"There is nothing like it on the market...no others are as encyclopedic...the writing is exemplary: simple, direct, and competent." --George W. Cobb, Professor Emeritus of Mathematics and Statistics, Mount Holyoke College

Written in a direct and clear manner, Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times presents a comprehensive guide to the history of mathematical statistics and details the major results and crucial developments over a 200-year period. Presented in chronological order, the book features an account of the classical and modern works that are essential to understanding the applications of mathematical statistics. Divided into three parts, the book begins with extensive coverage of the probabilistic works of Laplace, who laid much of the foundation for later developments in statistical theory. The second part then introduces twentieth-century statistical developments, including the work of Karl Pearson, Student, Fisher, and Neyman. Lastly, the author addresses post-Fisherian developments.
Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times also features:

* A detailed account of Galton's discovery of regression and correlation, as well as the subsequent development of Karl Pearson's X² and Student's t
* A comprehensive treatment of the permeating influence of Fisher in all aspects of modern statistics, beginning with his work in 1912
* Significant coverage of Neyman–Pearson theory, including a discussion of how it differs from Fisher's work
* Discussions of key historical developments, as well as the various disagreements, contrasting information, and alternative theories in the history of modern mathematical statistics, in an effort to provide a thorough historical treatment

Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times is an excellent reference for academicians with a mathematical background who are teaching or studying the history or philosophical controversies of mathematics and statistics. The book is also a useful guide for readers with a general interest in statistical inference.
Page count: 1213
Year of publication: 2016
COVER
TITLE PAGE
PREFACE
ACKNOWLEDGMENTS
INTRODUCTION: LANDMARKS IN PRE-LAPLACEAN STATISTICS
PART ONE: LAPLACE
1 THE LAPLACEAN REVOLUTION
1.1 PIERRE-SIMON DE LAPLACE (1749–1827)
1.2 LAPLACE'S WORK IN PROBABILITY AND STATISTICS
1.3 THE PRINCIPLE OF INDIFFERENCE
1.4 FOURIER TRANSFORMS, CHARACTERISTIC FUNCTIONS, AND CENTRAL LIMIT THEOREMS
1.5 LEAST SQUARES AND THE NORMAL DISTRIBUTION
PART TWO: FROM GALTON TO FISHER
2 GALTON, REGRESSION, AND CORRELATION
2.1 FRANCIS GALTON (1822–1911)
2.2 GENESIS OF REGRESSION AND CORRELATION
2.3 FURTHER DEVELOPMENTS AFTER GALTON
2.4 WORK ON CORRELATION AND THE BIVARIATE (AND MULTIVARIATE) NORMAL DISTRIBUTION BEFORE GALTON
3 KARL PEARSON’S CHI-SQUARED GOODNESS-OF-FIT TEST
3.1 KARL PEARSON (1857–1936)
3.2 ORIGIN OF PEARSON'S CHI-SQUARED
3.3 PEARSON'S ERROR AND CLASH WITH FISHER
3.4 THE CHI-SQUARED DISTRIBUTION BEFORE PEARSON
4 STUDENT’S t
4.1 WILLIAM SEALY GOSSET (1876–1937)
4.2 ORIGIN OF STUDENT’S TEST: THE 1908 PAPER
4.3 FURTHER DEVELOPMENTS
4.4 STUDENT ANTICIPATED
5 THE FISHERIAN LEGACY
5.1 RONALD AYLMER FISHER (1890–1962)
5.2 FISHER AND THE FOUNDATION OF ESTIMATION THEORY
5.3 FISHER AND SIGNIFICANCE TESTING
5.4 ANOVA AND THE DESIGN OF EXPERIMENTS
5.5 FISHER AND PROBABILITY
5.6 FISHER VERSUS NEYMAN–PEARSON: CLASH OF THE TITANS
5.7 MAXIMUM LIKELIHOOD BEFORE FISHER
5.8 SIGNIFICANCE TESTING BEFORE FISHER
PART THREE: FROM DARMOIS TO ROBBINS
6 BEYOND FISHER AND NEYMAN–PEARSON
6.1 EXTENSIONS TO THE THEORY OF ESTIMATION
6.2 ESTIMATION AND HYPOTHESIS TESTING UNDER A SINGLE FRAMEWORK: WALD'S STATISTICAL DECISION THEORY (1950)
6.3 THE BAYESIAN REVIVAL
REFERENCES
INDEX
END USER LICENSE AGREEMENT
Chapter 02
Table 2.1 Table of Attributes for Yule’s Coefficient of Association
Chapter 03
Table 3.1 Roulette and Coin-Tossing Results
Table 3.2 Weldon's Data On Frequencies of Dice Showing 5 or 6 Points When 12 Dice Are Cast 26,306 Times
Table 3.3 2 × 2 Table in Greenwood and Yule's Paper
Chapter 04
Table 4.1 Hours of Sleep Gained by the Use of Hyoscyamine Hydrobromide
Chapter 05
Table 5.1 The Pearson System of Curves
Table 5.2 Table Used by Fisher to Illustrate the Concept of Ancillarity
Table 5.3 Cross-Classification Between Two Categorical Variables
Table 5.4 Basu's Example of Nonunique Ancillary Statistics
Table 5.5 Maximum Likelihood Estimator T of θ
Table 5.6 The Six Ancillary Statistics Y1, …, Y6 in Basu's Example
Table 5.7 Correct ANOVA Table for Fisher and Mackenzie's Split–Split Plot Design
Table 5.8 Incorrect ANOVA Table from Fisher and Mackenzie's “Studies in crop variation. II. The manurial response of different potato varieties”
Table 5.9 Data for First ANOVA Example in Fisher's Statistical Methods for Research Workers
Table 5.10 Table for First ANOVA Example in Fisher's Statistical Methods for Research Workers
Table 5.11 First ANOVA Table from Statistical Methods for Research Workers for the Potato Example
Table 5.12 Second ANOVA Table from Statistical Methods for Research Workers for the Potato Example
Table 5.13 First ANOVA Table for Regression Example in Fisher's Statistical Methods for Research Workers
Table 5.14 Second ANOVA Table for Regression Example in Fisher's Statistical Methods for Research Workers
Table 5.15 ANOVA Table for Multiple Linear Regression
Table 5.16 Data for Completely Randomized Design
Table 5.17 One-way ANOVA for Completely Randomized Design
Table 5.18 Data for Randomized Block Experiment
Table 5.19 Two-way ANOVA for Randomized Block Design
Table 5.20 Data for Latin Square Design
Table 5.21 ANOVA Table for Latin Square
Table 5.22 Preliminary Yields of Tea Plots
Table 5.23 Experimental Yields of Tea Plots
Table 5.24 Analysis of Experimental Yields
Table 5.25 Analysis of Preliminary Yields
Table 5.26 Sum of Squares and Products (x and y are Deviations from Their Respective Row Means for “Rows” and Deviations from Their Respective Column Means for “Columns”)
Table 5.27 ANCOVA Table
Table 5.28 Student's ANOVA Table for the ABBA Design
Table 5.29 ABBA Design with 16 Strips and 12 Sections per Strip used by Barbacki and Fisher (1936, p. 190)
Table 5.30 Differences (A − B) for the Design in Table 5.29
Table 5.31 Differences (A − B − B + A) for one Particular Randomization of Half-Drill Strips
Table 5.32 ANOVA Table for ABBA Design
Table 5.33 Part of Table Used by Fisher to Illustrate Fiducial Intervals for ρ
Table 5.34 2 × 2 Table for Barnard's Example
Table 5.35 Two Uninformative Tables that are Used in Barnard's Unconditional Test
Table 5.36 Christenings in London for 82 Consecutive Years
Chapter 06
Table 6.1 Correspondence between Two-Person Game and Decision Problem
Table 6.2 Utilities (U1 … U4) of Outcomes for Two Possible Actions (α, β) Undertaken under Two Possible Conditions (p, not (~) p)a
Table 6.3 Utilities for a Particular Ethically Neutral Proposition p
Table 6.4 Utility Implication of Ramsey's Definition of Value Difference
Table 6.5 Calibrating a Subject's Utility Scale (First Estimate)
Table 6.6 Calibrating a Subject's Utility Scale (Second Estimate)
Table 6.7 Ramsey's Definition of the Subjective Probability P of a Proposition p (Which is not Necessarily Ethically Neutral)
Table 6.8 Ramsey's Definition of Conditional Probability
Table 6.9 Consequences in Egg Example Given by Savage (1954, p. 14)
Chapter 01
Figure 1.1 Pierre-Simon de Laplace (1749–1827).
Figure 1.2 First page of Laplace's “Mémoire sur les suites récurro-récurrentes” (Laplace, 1774b)
Figure 1.3 First page of Laplace's “Mémoire sur la probabilité des causes par les événements” (Laplace, 1774a)
Figure 1.4 Laplace's determination of an appropriate center for three observations a, b, and c
Figure 1.5 Laplace's curve of probability for n = 2 comets
Figure 1.6 Laplace's curve of probability for n = 3 comets
Figure 1.7 Laplace's probability curve for several observations made on V when the law of error (BLMN) is assumed to be known
Figure 1.8 Laplace's probability curve for several observations when the law of error (MRYN) is assumed to be unknown
Figure 1.9 The line AH divides the area under MHN into two equal halves
Figure 1.10 First page of “Mémoire sur les probabilités” (Laplace, 1781)
Figure 1.11 Marquis de Condorcet (1743–1794).
Figure 1.12 Title page of de Moivre's Miscellanea Analytica (de Moivre, 1730)
Figure 1.13 First page of Lagrange's “Sur une nouvelle espèce de calcul relatif à la différentiation et à l'intégration des quantités variables” (Lagrange, 1774)
Figure 1.14 First page of “Mémoire sur les approximations des formules qui sont fonctions de très grands nombres et sur leur application aux probabilités” (Laplace, 1810a)
Figure 1.15 First page of Bayes’ posthumous paper, “An Essay towards solving a problem in the doctrine of chances” (Bayes, 1764)
Figure 1.16 David Hume (1711–1776).
Figure 1.17 Three of the ways in which a chord can be randomly chosen on a circle
Figure 1.18 Henri Poincaré (1854–1912).
Figure 1.19 Edwin T. Jaynes (1922–1998).
Figure 1.20 Translational invariance in Bertrand's Paradox
Figure 1.21 Jean Baptiste Joseph Fourier (1768–1830).
Figure 1.22 First page of summary by Poisson of Fourier's “Mémoire sur la propagation de la chaleur dans les corps solides” (Fourier, 1808)
Figure 1.23 Joseph-Louis Lagrange (1736–1813).
Figure 1.24 First page of Lagrange's “Mémoire sur l'utilité de la méthode de prendre le milieu entre les résultats de plusieurs observations” (Lagrange, 1776)
Figure 1.25 Siméon Denis Poisson (1781–1840).
Figure 1.26 Aleksandr Mikhailovich Lyapunov (1857–1918).
Figure 1.27 Title page of Lyapunov's 1901 article “Nouvelle forme du théorème sur la limite de probabilité” (Lyapunov, 1901a)
Figure 1.28 Legendre's method of least squares, taken from the Nouvelles Méthodes (Legendre, 1805)
Figure 1.29 Robert Adrain (1775–1843).
Figure 1.30 First page of Robert Adrain's 1808 paper “Research concerning the probabilities of the errors which happen in making observations.” (Adrain, 1808)
Figure 1.31 Adrain's second derivation of the normal law
Figure 1.32 Carl Friedrich Gauss (1777–1855).
Figure 1.33 Gauss’ February 28, 1839, letter to Bessel (Gauss and Bessel, 1880)
Chapter 02
Figure 2.1 Francis Galton (1822–1911).
Figure 2.2 First page of Galton’s “Typical laws of heredity” (Galton, 1877)
Figure 2.3 The quincunx (left), the double quincunx (middle), and the convergent quincunx (right). Only the quincunx was actually constructed
Figure 2.4 First page of Galton’s 1885 Presidential Lecture to the Anthropology Section of the British Association (Galton, 1885a)
Figure 2.5 Galton’s table on stature (Galton, 1885b, p. 248)
Figure 2.6 Galton’s plot of offspring median heights (Y) against the midparental heights (X) (Galton, 1885b)
Figure 2.7 Galton’s discovery of the bivariate normal distribution (Galton, 1885b)
Figure 2.8 First page of Dickson’s analysis (Galton, 1886, p. 63)
Figure 2.9 First page of Galton’s 1888 paper “Co-relations and their measurements” (Galton, 1888)
Figure 2.10 Walter Frank Raphael Weldon (1860–1906).
Figure 2.11 Francis Ysidro Edgeworth (1845–1926).
Figure 2.12 First page of Edgeworth’s “Correlated averages” (Edgeworth, 1892)
Figure 2.13 First page of Pearson’s “Regression, heredity, and panmixia” (Pearson et al., 1896)
Figure 2.14 George Udny Yule (1871–1951).
Figure 2.15 Title page of first edition of Yule’s Introduction to the Theory of Statistics (Yule, 1911)
Figure 2.16 Plot of mean x-values (denoted by ×) at given y-values for a three-dimensional frequency surface.
Figure 2.17 Plane and table representation for Pearson’s measure of association for attributes (Pearson, 1900a, p. 2)
Figure 2.18 Fisher’s geometric derivation of the exact distribution of the correlation coefficient
Figure 2.19 Projection of Y-space onto X-space
Figure 2.20 Giovanni Antonio Amedeo Plana (1781–1864).
Figure 2.21 Auguste Bravais (1811–1863).
Figure 2.22 Extract from Bravais’ 1846 memoir (Bravais, 1846, p. 273)
Figure 2.23 Joseph Louis François Bertrand (1822–1900).
Chapter 03
Figure 3.1 Karl Pearson (1857–1936).
Figure 3.2 From Pearson's The Chances of Death (Pearson, 1897b, p. 13)
Figure 3.3 First page of Pearson's 1900 paper “On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling” (Pearson, 1900b)
Figure 3.4 Major Greenwood (1880–1949).
Figure 3.5 Irenée-Jules Bienaymé (1796–1878).
Figure 3.6 First page of Bienaymé's article “Sur la probabilité des erreurs d'aprés la méthode des moindres carrés” (Bienaymé, 1852)
Figure 3.7 Ernst Karl Abbe (1840–1905).
Figure 3.8 Title page of Abbe's dissertation (Abbe, 1863)
Figure 3.9 Friedrich Robert Helmert (1843–1917).
Figure 3.10 The transformations (p. 122) as first used by Helmert in his paper “Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehler direkter Beobachtungen gleicher Genauigkeit” (Helmert, 1876a) to obtain the distribution of [λλ]
Chapter 04
Figure 4.1 William Sealy Gosset (“Student”) (1876–1937).
Figure 4.2 First page of Student’s 1908 paper.
Figure 4.3 Fisher’s geometric derivation of the joint distribution of x̄ and s
Figure 4.4 Jacob Lüroth (1844–1910).
Figure 4.5 First page of Lüroth’s paper (Lüroth, 1876)
Chapter 05
Figure 5.1 Sir Ronald Aylmer Fisher (1890–1962).
Figure 5.2 Sir Arthur Stanley Eddington (1882–1944).
Figure 5.3 Sir David Roxbee Cox (b. 1924).
Figure 5.4 First page of Fisher and Mackenzie's “Studies in crop variation. II. The manurial response of different potato varieties.”
Figure 5.5 Half-drill strip, Knight's move, and chessboard designs. (a) Half-drill strip (ABBA) design for two treatments (the strips are along the rows and the ABBA “sandwiches” are along the columns), (b) Knight's move with five treatments and five replicates. The name Knight's move comes from the fact that each treatment is repeated by moving one down and two across. This particular design is also known as the Knut Vik Square in honor of the Norwegian Knut Vik who presented it in 1924, and (c) Beaven's chessboard for eight treatments.
Figure 5.6 Maurice Stevenson Bartlett (1910–2002).
Figure 5.7 Sir Harold Jeffreys (1891–1989).
Figure 5.8 Jerzy Neyman (1894–1981).
Figure 5.9 Egon Sharpe Pearson (1885–1980).
Figure 5.10 First page of Neyman and Pearson's 1928 paper
Figure 5.11 Neyman and Pearson's demonstration of the sufficiency of the condition p0 ≤ kp1
Figure 5.12 George Alfred Barnard (1915–2002).
Figure 5.13 Johann Heinrich Lambert (1728–1777).
Figure 5.14 Title page of Lambert's Photometria
Figure 5.15 Lambert's maximum likelihood procedure
Figure 5.16 Daniel Bernoulli
Figure 5.17 Original Latin version of Daniel Bernoulli's article “Diiudicatio maxime probabilis plurium obseruationum discrepantium atque verisimillima inductio inde formanda”
Figure 5.18 Semi‐circular distribution considered by Daniel Bernoulli in his article “Diiudicatio maxime probabilis plurium obseruationum discrepantium atque verisimillima inductio inde formanda”
Figure 5.19 John Arbuthnot
Figure 5.20 First page of Arbuthnot's 1710 memoir
Figure 5.21 Willem Jacob ‘s Gravesande
Figure 5.22 ‘s Gravesande's article on Arbuthnot's Problem, taken from the Oeuvres Philosophiques et Mathématiques de Mr G.F. ‘s Gravesande
Figure 5.23 Nicholas Bernoulli's comments on Arbuthnot's Problem, taken from the Oeuvres Philosophiques et Mathématiques de Mr G.F. ‘s Gravesande
Figure 5.24 Nicholas Bernoulli's classification of the terms before the (F + 1)th term in the binomial expansion of (M + F)ⁿ
Figure 5.25 French translated version of Daniel Bernoulli's prize‐winning article
Figure 5.26 Jean le Rond d'Alembert
Figure 5.27 Isaac Todhunter
Figure 5.28 First page of Michell's article “An inquiry into the probable parallax, and magnitude of the fixed stars, from the quantity of light which they afford us, and the particular circumstances of their situation”
Figure 5.29 Sir John Frederick William Herschel (1792–1871).
Figure 5.30 James David Forbes (1809–1868).
Chapter 06
Figure 6.1 Georges Darmois (1888–1960).
Figure 6.2 Edwin James George Pitman (1897–1993).
Figure 6.3 Maurice Fréchet (1878–1973). Wikimedia Commons (Public Domain), http://commons.wikimedia.org/wiki/File:Frechet.jpeg
Figure 6.4 Calyampudi Radhakrishna Rao (1920–). Wikimedia Commons (Licensed under the Creative Commons Attribution-Share Alike 4.0 International, 3.0 Unported, 2.5 Generic, 2.0 Generic and 1.0 Generic license), http://commons.wikimedia.org/wiki/File:Calyampudi_Radhakrishna_Rao_at_ISI_Chennai.JPG
Figure 6.5 Harald Cramér (1893–1985). Wikimedia Commons (Licensed under the Creative Commons Attribution-Share Alike 2.0 Germany license), http://commons.wikimedia.org/wiki/File:Harald_Cram%C3%A9r.jpeg
Figure 6.6 David Harold Blackwell (1919–2010). Wikimedia Commons (Licensed under the Creative Commons Attribution-Share Alike 2.0 Germany license), http://commons.wikimedia.org/wiki/File:David_Blackwell.jpg
Figure 6.7 Erich Leo Lehmann (1917–2009).
Figure 6.8 Henry Scheffé (1907–1977). Wikimedia Commons (Free Art License), http://commons.wikimedia.org/wiki/File:Henry_Scheffe.jpeg
Figure 6.9 Abraham Wald (1902–1950). Wikimedia Commons (Public Domain), http://en.wikipedia.org/wiki/File:Abraham_Wald_in_his_youth.jpg
Figure 6.10 Frank Plumpton Ramsey (1903–1930). Wikipedia, http://en.wikipedia.org/wiki/Frank_P._Ramsey
Figure 6.11 Bruno de Finetti (1906–1985).
Figure 6.12 Leonard Jimmie Savage (1917–1971). Book cover of “The Writings of Leonard Jimmie Savage – A Memorial Selection” by L.J. Savage (1981).
Figure 6.13 (a) Savage's second postulate. (b) Illustration of the second postulate: the acts f and g are modified to f′ and g′, respectively; since both pairs agree on ~B, preference between f and g (and f′ and g′) should be based on B, not on the utility values x and y
Figure 6.14 Herbert Ellis Robbins (1915–2001). Wikipedia, http://en.wikipedia.org/wiki/Herbert_Robbins
PRAKASH GORROOCHURN
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging in Publication data can be found on file at the Library of Congress.
ISBN: 9781119127925
To Nishi and Premal
This book describes the works of some of the more recent founders of the subject of mathematical statistics. By more recent, I have taken it to mean from the Laplacean to the immediate post-Fisherian period. The choice is, of course, entirely subjective, but there are certain strong reasons for starting a work on the history of modern mathematical statistics from Laplace. With the latter, a definite form was given to the discipline. Although Laplace’s levels of rigor were certainly not up to the standards expected today, he brought far more sophistication to statistics than his predecessors had. His systematic work on probability, error theory, and large sample methods laid the foundations for later investigations. One of his most important results, the Central Limit Theorem, was the prototype that was worked and improved upon by generations of succeeding statisticians.
This book may be viewed as a continuation of my previous book Classic Problems of Probability (Gorroochurn, 2012a). However, apart from the fact that this book deals with mathematical statistics rather than probability, I have now endeavored to go into a more detailed and complete coverage of the topics. That does not mean that I have treated every single historical topic. Rather, what I have treated, I have tried to do so in a thorough way. Thus, the reader who wishes to know exactly how Laplace first proved the Central Limit Theorem, how Fisher developed ANOVA, or how Robbins first developed the empirical Bayes method may find the book helpful. In my demonstration of the mathematical results, I have purposely used (as much as possible) the same notation as the original writers so that readers can have a much better feel for the original works. I have also included page numbers referring to the original derivations so readers can easily check against the original works. I really hope readers will be encouraged to read the original papers as these are often treasure troves of statistics in the making.
I also hope that readers will find the book a useful addition to the few books on the history of statistics that have hitherto been written. Two of the major books I ought to mention are Stigler’s The History of Statistics: The Measurement of Uncertainty Before 1900 (Stigler, 1986b) and Hald’s A History of Mathematical Statistics from 1750 to 1930.1 The first is a superlative and engaging essay on the history of statistics, but since it does not treat the post-1900 period, none of Fisher’s work is covered. The second book is a comprehensive treatment of the pre-1930 period, written in modern mathematical notation. However, it devotes so much space to Laplace and Gauss that, to my taste, the coverage of Fisher is not as complete as that of the other two mathematicians. Moreover, Hald’s book has no coverage of the Neyman–Pearson theory. This, of course, is no fault of Hald, since the book is perfectly suited for the aims the author had in writing it.
But why do I insist on the coverage of Fisher’s statistical work? My simple answer is as follows: virtually the whole of modern mathematical statistics sprang either from Fisher’s original work or from the extensions (and sometimes corrections) others made to his original work. Much of the modern statistical vocabulary, especially in the important field of estimation, originated from Fisher. The word “statistic” itself, in its technical sense, is Fisher’s creation. Almost single-handedly, Fisher introduced sufficiency, efficiency, consistency, likelihood, ANOVA, ANCOVA, and countless other statistical concepts and techniques. Many of the current branches of statistics, such as multivariate analysis and nonparametric statistics, had an important original contribution made by Fisher. Unfortunately, it was not possible for me to treat all the different branches of statistics: my main concern here is what may be termed parametric statistical inference.
While going through the book, readers will find that I often quote the original authors. My purpose in doing so is for readers not to be influenced by my own paraphrasing of the original author’s thoughts and intentions, but to hear it from the “horse’s mouth” itself. Thus, I believe there is a different effect on readers when I say “Fisher was somewhat indifferent to unbiasedness as a desideratum for an estimator because unbiased estimators are not invariant to transformations” than when Fisher himself said that “…lack of bias, which since it is not invariant for functional transformation of parameters has never had the least interest for me” (Bennett, 1990, p. 196).2 For the most part, though, my quotation of the original author has been accompanied by my own interpretation and comments, so that readers are not left in the dark as to what the original author really meant. The reader will find a similar approach in the late Lehmann’s book Fisher, Neyman, and the Creation of Classical Statistics (Lehmann, 2011). I hope this strategy will bring a sense of authenticity to the subject of history I am writing about.
This book may be divided into three parts. The first part deals mainly with Laplace’s contributions to mathematical statistics. In this part, readers will find a detailed analysis of Laplace’s papers and books. Readers will also learn about Laplace’s definition of probability, his philosophy of universal determinism and how it shaped his statistical research, his early preference for methods of inverse probability (based on the principle of indifference), his investigation of various laws of error, his powerful method of asymptotic approximation, his introduction of characteristic functions, his later use of methods based on direct probability, his proofs of the Central Limit Theorem, and his development of least squares. The first part also contains important related work of other scholars such as Fourier, Lagrange, Adrain, Poisson, Gauss, Hagen, and Lyapunov.
The second part of this book deals mainly with Galton’s discovery of regression and correlation, Pearson’s invention of the X² goodness-of-fit statistic, and Student’s innovation of the small-sample t-statistic, culminating in Fisher’s creation of new statistical methods. In the section devoted to Galton, readers will learn about Galton’s observation of an interesting phenomenon, which he thought was unidirectional and which he first termed reversion (a term he later changed to regression), and the first appearance of correlation. Extensions of Galton’s work by Weldon, Edgeworth, Yule, Pearson, and Fisher are also considered, as are related works prior to Galton, including those of Lagrange, Adrain, Gauss, Laplace, Plana, Bravais, and Bertrand. In the section devoted to Pearson, readers will learn how Pearson developed the X² goodness-of-fit statistic and clashed with Fisher when the latter pointed out an error in Pearson’s development. Derivations of the χ²-distribution prior to Pearson are also considered, including those of Bienaymé, Abbe, and Helmert. In the section devoted to Student, readers will learn about his original, but somewhat faulty, derivation of the t-statistic (which he called z and which was slightly different from today’s t) and Fisher’s later consolidation of Student’s result. In this section, previous derivations of the t-distribution, based on inverse probability, by Lüroth and Edgeworth are also examined. Finally, in the section devoted to Fisher, readers will find a detailed account of Fisher’s development of estimation theory, significance testing, ANOVA and related techniques, and fiducial probability. Also included in this section are Fisher’s (in)famous disputes with Jeffreys and Neyman–Pearson, and the use of maximum likelihood and significance testing prior to Fisher.
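As a side illustration (not taken from the book), the two statistics at the center of this part can be computed in a few lines. The data below are made up purely for demonstration; only the formulas — Pearson's X² = Σ(O − E)²/E and Student's one-sample t = (x̄ − μ₀)/(s/√n) — come from the material discussed above.

```python
# Minimal sketch of the two classic statistics discussed in Part Two.
# The data are hypothetical, chosen only to show the arithmetic.
from math import sqrt

def chi_squared(observed, expected):
    """Pearson's X^2 = sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def t_statistic(sample, mu0):
    """Student's one-sample t = (xbar - mu0) / (s / sqrt(n)), where s
    is the sample standard deviation with an (n - 1) denominator."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu0) / sqrt(s2 / n)

# 60 hypothetical rolls of a supposedly fair die: expected count 10 per face.
observed = [12, 8, 11, 9, 13, 7]
expected = [10] * 6
x2 = chi_squared(observed, expected)  # compared against chi^2 on 5 df

# Hypothetical gains (e.g., extra hours of sleep under a treatment);
# the null hypothesis is that the mean gain is zero.
gains = [1.2, 0.3, -0.4, 2.0, 0.7, 1.5]
t = t_statistic(gains, 0.0)  # compared against t on n - 1 = 5 df
```

The contrast the book draws is visible even here: X² measures how far a whole table of counts sits from its expected values, while t measures a single mean in units of its estimated standard error, which is precisely where Student's small-sample correction enters.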
The third and final part of this book deals with post-Fisherian developments. In this part, readers will learn about extensions to Fisher’s theory of estimation (by Darmois, Koopman, Pitman, Aitken, Fréchet, Cramér, Rao, Blackwell, Lehmann, Scheffé, and Basu), Wald’s powerful statistical decision theory, and the Bayesian revival ushered by Ramsey, de Finetti, Savage, and Robbins in the first half of the twentieth century.
A few of the sections in the book are non-historical in nature and have been denoted by an asterisk (*). Moreover, a handful of historical topics that are found in the “Further Extensions” sections have been given without detailed demonstration.
As in my previous book, I have strived for simplicity of exposition. I hope my efforts will inculcate love for statistics and its history in the reader.
Prakash Gorroochurn
[email protected]
22, 2015
1. Other books on the history of statistics include Hald (1990), Pearson (1978), Pearson and Kendall (1970), Kendall and Plackett (1977), Hald (2007), MacKenzie (1978), Benzecri (1982), Cowles (2001), Chatterjee (2003), Porter (1986), and Westergaard (1932).
2. See also p. 376 of the current book.
Warren Ewens and Bruce Levin each read large portions of the original manuscript. Both encouraged me throughout the writing of the book. Various sections of earlier drafts were read by Bernard Bru, Bin Cheng, Daniel Courgeau, Andrew Dale, Sukumar Desai, Stephen Fienberg, Hans Fischer, Dominique Fourdrinier, David Hitchcock, Vesa Kuusela, Eugenio Regazzini, Nancy Reid, Christian Robert, Thomas Severini, Glenn Shafer, Craig Smorynski, Aris Spanos, Veronica Vieland, Alan Welsh, and Sandy Zabell. Jessica Overbey, Bei Wang, and Julia Wrobel helped verify the references. Finally, Ji Liao, Lynn Petukhova, Jiashi Wang, Gary Yu, Wenbin Zhu, and many students from my probability class assisted with the proofreading. To all these people I express my deepest gratitude. All remaining errors and omissions are, of course, my sole responsibility.
Susanne Steitz-Filler, Sari Friedman, and Allison McGinniss from Wiley were all very helpful in realizing this project.
My final and continued gratitude goes to my mother and late father.
The word “statistics” is derived from the modern Latin statisticus (“state affairs”). Statisticus itself originates from the classical Latin status, from which the word “state” is derived. In the eighteenth century,1 the German political scientist Gottfried Achenwall (1719–1779) brought statistisch into general usage as the collection and evaluation of data relating to the functioning of the state. The English word “statistics” is thus derived from the German statistisch.
An earlier landmark had been the creation of the science of political arithmetic in England in the seventeenth century. Political arithmetic was a set of techniques of classification and calculation on data obtained from birth and death records, trade records, taxes, credit, and so on. It was initiated in England by John Graunt (1620–1674) and then further developed by William Petty (1623–1687). In the nineteenth century, political arithmetic developed into the field of statistics, which now dealt with the analysis of all kinds of data. Statistics gradually became an increasingly sophisticated discipline, mainly because of the powerful mathematical techniques of analysis that were infused into it.
The recognition that the data available to the statistician were often the result of chance mechanisms also meant that some notion of probability was essential both for the statistical analysis of data and the subsequent interpretation of the results. The calculus of probability had its origins well before the eighteenth century. In the sixteenth century, the physician and mathematician Gerolamo Cardano (1501–1575) made some forays into chance calculations, many of which were erroneous. His 15-page book entitled Liber de ludo aleae (Cardano, 1663) was written in the 1520s but published only in 1663. However, the official start of the calculus of probability took place in 1654 through the correspondence between Blaise Pascal (1623–1662) and Pierre de Fermat (1601–1665) concerning various games of chance, most notably the problem of points. Meanwhile, having heard of the exchange between the two Frenchmen, the Dutch mathematician Christiaan Huygens (1629–1695) wrote a small manual on probability, De Ratiociniis in ludo aleae (Huygens, 1657), which came out in 1657 as the first published book on probability. There then followed a brief period of inactivity in probability until Pierre Rémond de Montmort (1678–1719) published his book Essay d’Analyse sur les Jeux de Hazard in 1708 (Montmort, 1708). But the real breakthrough was to come through James Bernoulli’s (1654–1705) posthumous Ars Conjectandi (Bernoulli, 1713), where Bernoulli enunciated and rigorously proved the law of large numbers. This law took probability beyond mere games of chance and extended its applications to all kinds of world phenomena, such as births, deaths, accidents, and so on. The law of large numbers showed that, viewed microscopically (over short time intervals), measurable phenomena exhibited the utmost irregularity, but when viewed macroscopically (over an extended period of time), they all exhibited a deep underlying structure and constancy.
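A modern reader can see Bernoulli's law of large numbers at work in a few lines of simulation (an illustrative sketch, not part of the historical record; the success probability 0.3, the trial count, and the function name are arbitrary choices):

```python
import random

def running_frequency(p, n, seed=0):
    """Simulate n Bernoulli(p) trials and return the relative
    frequency of successes after each trial."""
    rng = random.Random(seed)
    successes = 0
    freqs = []
    for i in range(1, n + 1):
        successes += rng.random() < p
        freqs.append(successes / i)
    return freqs

freqs = running_frequency(p=0.3, n=100_000)
# Micro view: the early frequencies fluctuate wildly; macro view: the
# long-run frequency settles near the underlying probability p.
print(freqs[9], freqs[-1])
```

The first printed value (after only 10 trials) is typically far from 0.3, while the last (after 100,000 trials) is very close to it, mirroring Bernoulli's contrast between short-run irregularity and long-run constancy.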
It is no exaggeration then to say that Bernoulli’s Ars Conjectandi revolutionized the world of probability by showing that chance phenomena were indeed amenable to some form of rigorous treatment. The law of large numbers was to receive a further boost in 1730 through its refinement in the hands of Abraham de Moivre (1667–1754), resulting in the first derivation of the normal distribution.
In the meantime, two years before the release of the Ars Conjectandi, the Englishman John Arbuthnot (1667–1735) explicitly applied the calculus of probability to the problem of sex ratio in births and argued for divine providence. This was the first published test of a statistical hypothesis. Further works in demography were conducted by the Comte de Buffon (1707–1788), Daniel Bernoulli (1700–1782), and Jean le Rond d’Alembert (1717–1783).
Although Ars Conjectandi was duly recognized for its revolutionary value, James Bernoulli was not able to bring the book to its full completion before he passed away. One aspect of the problem not treated by Bernoulli was the issue of the probability of hypotheses (or causes), also known as inverse probability. This remained a thorny problem until it was addressed by Thomas Bayes (1701–1761) in the famous “An Essay towards solving a problem in the Doctrine of Chances” (Bayes, 1764). In the essay, again published posthumously, Bayes attacked the inverse problem addressed by Bernoulli. In the latter’s framework, the probability of an event was a known quantity; in the former’s scheme, the probability of an event was an unknown quantity and probabilistic statements were made on it through what is now known as Bayes’ theorem. The importance of this theorem cannot be overstated. But even as it was recognized for its revolutionary value, it also aroused controversy because of a particular assumption made in its implementation (concerning the prior distribution to be used).
In addition to the aforementioned works on probability, another major area of inquiry for the statistician was the study of errors made in observations and the particular laws to which such errors were subject. One of the first such studies was performed by the English mathematician Thomas Simpson (1710–1761), who assumed a triangular error distribution in some of his investigations. Other mathematicians involved in this field were Daniel Bernoulli, Joseph-Louis Lagrange (1736–1813), Carl Friedrich Gauss (1777–1855), and especially Adrien-Marie Legendre (1752–1833), who was the first to publish the method of least squares.
1. The early history of statistics is described in detail in the books by Pearson (1978) and Westergaard (1932).
Laplace was to France what Newton had been to England. Pierre-Simon de Laplace1 (Fig. 1.1) was born in Beaumont-en-Auge, Normandy, on March 23, 1749. He belonged to a bourgeois family. Laplace at first enrolled as a theology student at the University of Caen and seemed destined for the church. At the age of 16, he entered the College of Arts at the University of Caen for two years of philosophy before his degree in theology. There, he discovered not only the mathematical writings of such greats as Euler and Daniel Bernoulli, but also his own aptitude for mathematical analysis. He moved to Paris in 1769 and, through the patronage of d'Alembert, became Professor of Mathematics at the École Royale Militaire in 1771.
Figure 1.1 Pierre-Simon de Laplace (1749–1827).
Wikimedia Commons (Public Domain), http://commons.wikimedia.org/wiki/File:Pierre-Simon_Laplace.jpg
Laplace lived through tumultuous political times: the French Revolution broke out in 1789; Robespierre came to power in a coup in 1793; Louis XVI was executed that same year, and Robespierre himself was executed the next; Napoléon Bonaparte came to power in 1799 but fell in 1815, when the monarchy was restored under Louis XVIII. Laplace was made Minister of the Interior when Napoléon came to power but was dismissed after only six weeks for attempting to “carry the spirit of the infinitesimal into administration.”2
But Napoléon continued to retain Laplace's services in other capacities and bestowed several honors on him (senator and vice president of the senate in 1803, Count of the Empire in 1806, Order of the Reunion in 1813). Nevertheless, Laplace voted for Napoléon's deposition in 1814, was elected to the French Academy in 1816 under Louis XVIII, and was made a Marquis in 1817.
Thus, throughout these turbulent political periods, Laplace was able to adapt and even prosper, unlike many of his contemporaries, such as the Marquis de Condorcet and Antoine Lavoisier, both of whom perished during the Revolution. Laplace continued to publish seminal papers over several years, culminating in two major books, Mécanique Céleste (Laplace, 1799) and Théorie Analytique des Probabilités (Laplace, 1812). These were highly sophisticated works and were accompanied by two more accessible books, Exposition du Système du Monde (Laplace, 1796) and Essai Philosophique sur les Probabilités (Laplace, 1814), aimed at a much wider audience.
From the very start, Laplace's research branched into two main directions: applied probability and mathematical astronomy. However, underlying each branch, Laplace espoused one unifying philosophy, namely, universal determinism. This philosophy was vindicated to a great extent in so far as celestial mechanics was concerned. By using Newton's law of universal gravitation, Laplace was able to mathematically resolve the remaining anomalies in the theory of the Solar System. In particular, he triumphantly settled the issue of the “great inequality of Jupiter and Saturn.” In the Exposition du Système du Monde, we can read:
We shall see that this great law [of universal gravitation]…represents all celestial phenomena even in their minutest details, that there is not one single inequality of their motions which is not derived from it, with the most admirable precisions, and that it explains the cause of several singular motions, just perceived by astronomers, and which were too slow for them to recognize their law.
(Laplace, 1796, Vol. 2, pp. 2–3)
Laplace appealed to a “vast intelligence,” dubbed Laplace's demon, to explain his philosophy of universal determinism3:
All events, even those that on account of their smallness seem not to obey the great laws of nature, are as necessary a consequence of these laws as the revolutions of the sun. An intelligence which at a given instant would know all the forces that move matter, as well as the position and speed of each of its molecules; if on the other hand it was so vast as to analyse these data, it would contain in the same formula, the motion of the largest celestial bodies and that of the lightest atom. For such an intelligence, nothing would be irregular, and the curve described by a simple air or vapor molecule, would seem regulated as certainly as the orbit of the sun is for us.
(Laplace, 1812, p. 177)
However, we are told by Laplace, ignorance of the underlying laws makes us ascribe events to chance:
…But owing to our ignorance regarding the immensity of the data necessary to solve this great problem, and owing to the impossibility, given our weakness, to subject to calculation those data which are known to us, even though their numbers are quite limited; we attribute phenomena which seem to occur and succeed each other without any order, to variable or hidden causes, whose action has been designated by the word hazard, a word that is really only the expression of our ignorance.
(ibidem)
Probability is then a relative measure of our ignorance:
Probability is relative, in part to this ignorance, in part to our knowledge.
(ibidem)
It is perhaps no accident that Laplace's research into probability started in the early 1770s, for it was in this period that interest in probability was renewed among many mathematicians due to work in political arithmetic and astronomy (Bru, 2001b, p. 8379). Laplace's work in probability was truly revolutionary because his command of the powerful techniques of analysis enabled him to break new ground in virtually every aspect of the subject. The advances Laplace made in probability, and the extent to which he applied them, were truly unprecedented. Laplace thus reached the forefront of the probability scene during his own lifetime and commanded immense respect. He passed away in Paris on March 5, 1827, exactly 100 years after Newton's death.
Throughout his academic career, Laplace seldom got entangled in disputes with his contemporaries. One notable exception was his public dissent with Roger Boscovich (1711–1787) over the calculation of the path of a comet given three close observations. More details can be found in Gillispie (2000, Chapter 13) and Hahn (2005, pp. 67–68).
Laplace has often been accused of incorporating the works of others into his own without giving due credit. The situation was aptly described by Augustus De Morgan4 almost two centuries ago. The following extract is worth reading if only for its rhetorical value:
The French school of writers on mathematical subjects has for a long time been wedded to the reprehensible habit of omitting all notice of their predecessors, and Laplace is the most striking instance of this practice, which he carried to the utmost extent. In that part of the “Mecanique Celeste” in which he revels in the results of Lagrange, there is no mention of the name of the latter. The reader who has studied the works of preceding writers will find him, in the “Théorie des Probabilités,” anticipated by De Moivre, James Bernoulli, &c, on certain points. But there is not a hint that any one had previously given those results from which perhaps his sagacity led him to his own more general method. The reader of the “Mecanique Celeste” will find that, for any thing he can see to the contrary, Euler, Clairaut, D'Alembert, and above all Lagrange, need never have existed. The reader of the “Systême du Monde” finds Laplace referring to himself in almost every page, while now and then, perhaps not twenty times in all, his predecessors in theory are mentioned with a scanty reference to what they have done; while the names of observers, between whom and himself there could be no rivalry, occur in many places. To such an absurd pitch is this suppression carried, that even Taylor's name is not mentioned in connexion with his celebrated theorem; but Laplace gravely informs his readers, “Nous donnerons quelques théorêmes généraux qui nous seront utiles dans la suite,” those general theorems being known all over Europe by the names of Maclaurin, Taylor, and Lagrange. 
And even in his Theory of Probabilities Lagrange's theorem is only “la formule (p) du numéro 21 du second livre de la Mécanique Céleste.” It is true that at the end of the Mecanique Celéste he gives historical accounts, in a condensed form, of the discoveries of others; but these accounts never in any one instance answer the question—Which pages of the preceding part of the work contain the original matter of Laplace, and in which is he only following the track of his predecessor?
(De Morgan, 1839, Vol. XXX, p. 326)
Against such charges, recent writers like Stigler (1978) and Zabell (1988) have come to Laplace's defense on the grounds that the latter's citation rate was no worse than those of his contemporaries. That may be the case, but the two studies also show that the citation rates of Laplace and his contemporaries were all very low. This is hardly a practice that can be condoned, especially when we know that these mathematicians jealously guarded their own discoveries. Newton and Leibniz clashed fiercely over priority for the calculus, as did Gauss and Legendre over least squares, though to a lesser extent. If mathematicians were so concerned that their own priority over discoveries be acknowledged, then surely it was incumbent upon them to acknowledge the priority of others on work that was not their own.
This memoir (Laplace, 1774b) is among the first of Laplace's published works and also his first paper on probability (Fig. 1.2). Here, for the first time, Laplace enunciated the definition of probability, which he called a Principe (Principle):
The probability of an event is equal to the product of each favorable case by its probability divided by the product of each possible case by its probability, and if each case is equally likely, the probability of the event is equal to the number of favorable cases divided by the number of all possible cases.5
(Laplace, 1774b, OC 8, pp. 10–11)
The above is the classical (or mathematical) definition of probability that is still used today, although several other mathematicians provided similar definitions earlier. For example6:
Gerolamo Cardano's definition in Chapter 14 of the
Liber de ludo aleae
:
So there is one general rule, namely, that we should consider the whole circuit, and the number of those casts which represents in how many ways the favorable result can occur, and compare that number to the rest of the circuit, and according to that proportion should the mutual wagers be laid so that one may contend on equal terms.
(Cardano, 1663)
Gottfried Wilhelm Leibniz's definition in the
Théodicée
:
If a situation can lead to different advantageous results ruling out each other, the estimation of the expectation will be the sum of the possible advantages for the set of all these results, divided into the total number of results.
(Leibniz, 1710, 1969 edition, p. 161)
James (Jacob) Bernoulli's statement from the
Ars Conjectandi
:
… if complete and absolute certainty, which we represent by the letter a or by 1, is supposed, for the sake of argument, to be composed of five parts or probabilities, of which three argue for the existence or future existence of some outcome and the others argue against it, then that outcome will be said to have 3a/5 or 3/5 of certainty.
(Bernoulli, 1713, English edition, pp. 315–316)
Abraham de Moivre's definition from the
De Mensura Sortis
:
If p is the number of chances by which a certain event may happen, & q is the number of chances by which it may fail; the happenings as much as the failings have their degree of probability: But if all the chances by which the event may happen or fail were equally easy; the probability of happening to the probability of failing will be p to q.
(de Moivre, 1733, p. 215)
Figure 1.2 First page of Laplace's “Mémoire sur les suites récurro-récurrentes” (Laplace, 1774b)
Although Laplace's Principe was an objective definition, Laplace gave it a subjective overtone by later redefining mathematical probability as follows:
The probability of an event is thus just the ratio of the number of cases favorable to it, to the number of possible cases, when there is nothing to make us believe that one case should occur rather than any other.7
(Laplace, 1776b, OC 8, p. 146)
In the above, Laplace appealed to the principle of indifference,8 and his definition relates probability to our beliefs. It is thus a subjective interpretation of the classical definition of probability.
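The classical definition can be made concrete with a small enumeration; the two-dice example below is a modern illustration, not one of Laplace's:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely cases for a roll of two dice.
cases = list(product(range(1, 7), repeat=2))
favorable = [c for c in cases if sum(c) == 7]

# Classical definition: number of favorable cases / number of possible cases.
prob = Fraction(len(favorable), len(cases))
print(prob)  # 1/6
```

Because every case is equally likely, counting cases suffices; when the cases are not equally likely, the weighted form of Laplace's Principe applies instead.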
The “Mémoire sur la probabilité des causes par les événements” (Laplace, 1774a) (Fig. 1.3) is a landmark paper of Laplace's because it introduced most of the fundamental principles that he would adhere to for the rest of his career.9 Bayes’ theorem was stated, and inverse probability was used as a general method for dealing with all kinds of problems. The asymptotic method was introduced as a powerful tool for approximating certain types of integrals, and an inverse version of the Central Limit Theorem was also presented. Finally, the double exponential distribution was introduced as a general law of error. Laplace here presented many of the problems that he would later return to, each time refining and perfecting his previous solutions.
Figure 1.3 First page of Laplace's “Mémoire sur la probabilité des causes par les événements” (Laplace, 1774a)
In Article II of the memoir, Laplace distinguished between two classes of probability problems:
The uncertainty of human knowledge bears on events or the causes of events; if one is certain, for example, that a ballot contains only white and black tickets in a given ratio, and one asks the probability that a randomly chosen ticket will be white, the event is then uncertain, but the cause on which depends the existence of the probability, that is the ratio of white to black tickets, is known.
In the following problem: A ballot is assumed to contain a given number of white and black tickets in an unknown ratio, if one draws a white ticket, determine the probability that the ratio of white to black tickets in the ballot is p:q; the event is known and the cause unknown.
One can reduce to these two classes of problems all those that depend on the doctrine of chances.
(Laplace, 1774a, OC 8, p. 29)
In the above, Laplace distinguished between problems that require the calculation of direct probabilities and those that require the calculation of inverse probabilities. The latter depended on the powerful theorem first adduced by Bayes, which Laplace immediately enunciated as a Principe as follows:
PRINCIPE—If an event can be produced by a number n of different causes, the probabilities of the existence of these causes calculated from the event are to each other as the probabilities of the event calculated from the causes, and the probability of the existence of each cause is equal to the probability of the event calculated from that cause, divided by the sum of all the probabilities of the event calculated from each of the causes.10
(ibidem)
Laplace's first statement in the above can be written mathematically as follows: if $C_1, C_2, \ldots, C_n$ are n exhaustive events (“causes”) and E is another event, then

$$\frac{\Pr\{C_i \mid E\}}{\Pr\{C_j \mid E\}} = \frac{\Pr\{E \mid C_i\}}{\Pr\{E \mid C_j\}} \tag{1.1}$$

Equation (1.1) implies that

$$\Pr\{C_i \mid E\} = \frac{\Pr\{E \mid C_i\}}{\sum_{j=1}^{n} \Pr\{E \mid C_j\}} \tag{1.2}$$

Equation (1.2) is Laplace's second statement in the previous quotation. It is a restricted version of Bayes’ theorem because it assumes a discrete uniform prior, that is, each of the “causes” $C_1, C_2, \ldots, C_n$ is equally likely: $\Pr\{C_i\} = 1/n$ for $i = 1, 2, \ldots, n$.
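Laplace's restricted rule, with its uniform prior over the causes, can be sketched numerically; the two urn compositions below are hypothetical:

```python
from fractions import Fraction

# Two hypothetical "causes": urn A holds 3 white tickets out of 4,
# urn B holds 1 white out of 4. Event E: a white ticket is drawn.
# Laplace's Principe treats the causes as equally likely a priori.
likelihoods = {"A": Fraction(3, 4), "B": Fraction(1, 4)}

# Restricted Bayes' rule: posterior of each cause is its likelihood
# divided by the sum of the likelihoods over all causes.
total = sum(likelihoods.values())
posterior = {cause: lik / total for cause, lik in likelihoods.items()}
print(posterior["A"])  # 3/4
```

With a uniform prior the prior terms cancel, which is why only the likelihoods appear; the general rule weighs each likelihood by its prior.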
It should be noted that Laplace's enunciation of the theorem in Eq. (1.2) in 1774 made no mention of Bayes’ publication 10 years earlier (Bayes, 1764), and it is very likely that Laplace was unaware of the latter's work. However, the 1778 volume of the Histoire de l'Académie Royale des Sciences, which appeared in 1781, contained a summary by the Marquis de Condorcet (1743–1794) of Laplace's “Mémoire sur les Probabilités,” which also appeared in that volume (Laplace, 1781). Laplace's article made no mention of Bayes or Price,11 but Condorcet's summary explicitly acknowledged the two Englishmen:
These questions [on inverse probability] about which it seems that Messrs. Bernoulli and Moivre had thought, have been since then examined by Messrs. Bayes and Price; but they have limited themselves to exposing the principles that can be used to solve them. M. de Laplace has expanded on them….
(Condorcet, 1781, p. 43)
As for Laplace himself, his acknowledgment of Bayes’ priority on the theorem came much later in the Essai Philosophique Sur les Probabilités:
Bayes, in the Transactions philosophiques of the year 1763, sought directly the probability that the possibilities indicated by past experiences are comprised within given limits; and he has arrived at this in a refined and very ingenious manner, although a little perplexing. This subject is connected with the theory of the probability of causes and future events, concluded from events observed. Some years later I expounded the principles of this theory….
(Laplace, 1814, English edition, p. 189)
It was also in the Essai Philosophique that Laplace first gave the general (discrete) version of Bayes’ theorem:
The probability of the existence of anyone of these causes is then a fraction whose numerator is the probability of the event resulting from this cause and whose denominator is the sum of the similar probabilities relative to all the causes; if these various causes, considered a priori, are unequally probable it is necessary, in place of the probability of the event resulting from each cause, to employ the product of this probability by the possibility of the cause itself.
(ibid., pp. 15–16)
Equation (1.2) can thus be written in general form as

$$\Pr\{C_i \mid E\} = \frac{\Pr\{E \mid C_i\}\,\Pr\{C_i\}}{\sum_{j=1}^{n} \Pr\{E \mid C_j\}\,\Pr\{C_j\}} \tag{1.3}$$

which is the form in which Bayes’ theorem is used today. The continuous version of Eq. (1.3) may be written as

$$f(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\, f(\theta)}{\int f(\mathbf{x} \mid \theta)\, f(\theta)\, d\theta} \tag{1.4}$$

where θ is a parameter, $f(\theta)$ is the prior density of θ, $f(\mathbf{x} \mid \theta)$ is the joint density12 of the observations x, and $f(\theta \mid \mathbf{x})$ is the posterior density of θ. It is interesting that neither of the above two forms (and not even those assuming a uniform prior) by which we recognize Bayes’ theorem today can be found explicitly in Bayes’ paper.
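The continuous form of Bayes' theorem can be approximated numerically on a grid; the binomial likelihood (7 successes in 10 trials) and uniform prior below are illustrative choices, not drawn from Laplace:

```python
import math

# Grid approximation of the continuous Bayes' rule: binomial likelihood
# with a uniform prior on theta. The exact posterior is then Beta(8, 4),
# whose mean is 8/12.
n_grid = 10_000
thetas = [(i + 0.5) / n_grid for i in range(n_grid)]

def likelihood(theta, k=7, n=10):
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

unnorm = [likelihood(t) for t in thetas]    # f(x|theta) * f(theta), f(theta) = 1
evidence = sum(unnorm) / n_grid             # midpoint rule for the integral
posterior = [u / evidence for u in unnorm]  # posterior density on the grid

post_mean = sum(t * p for t, p in zip(thetas, posterior)) / n_grid
print(round(post_mean, 4))  # 0.6667
```

The denominator plays the role of the integral in the continuous formula; dividing by it makes the grid values a proper (approximate) density.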
Laplace almost always used (1.4) in the form