Classic Topics on the History of Modern Mathematical Statistics

Prakash Gorroochurn

Description

"There is nothing like it on the market...no others are as encyclopedic...the writing is exemplary: simple, direct, and competent." --George W. Cobb, Professor Emeritus of Mathematics and Statistics, Mount Holyoke College Written in a direct and clear manner, Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times presents a comprehensive guide to the history of mathematical statistics and details the major results and crucial developments over a 200-year period. Presented in chronological order, the book features an account of the classical and modern works that are essential to understanding the applications of mathematical statistics. Divided into three parts, the book begins with extensive coverage of the probabilistic works of Laplace, who laid much of the foundations of later developments in statistical theory. Subsequently, the second part introduces 20th century statistical developments including work from Karl Pearson, Student, Fisher, and Neyman. Lastly, the author addresses post-Fisherian developments. 
Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times also features: * A detailed account of Galton's discovery of regression and correlation as well as the subsequent development of Karl Pearson's X² and Student's t * A comprehensive treatment of the permeating influence of Fisher in all aspects of modern statistics beginning with his work in 1912 * Significant coverage of Neyman-Pearson theory, which includes a discussion of the differences to Fisher's works * Discussions on key historical developments as well as the various disagreements, contrasting information, and alternative theories in the history of modern mathematical statistics in an effort to provide a thorough historical treatment Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times is an excellent reference for academicians with a mathematical background who are teaching or studying the history or philosophical controversies of mathematics and statistics. The book is also a useful guide for readers with a general interest in statistical inference.

Page count: 1213

Publication year: 2016



Table of Contents

COVER

TITLE PAGE

PREFACE

ACKNOWLEDGMENTS

INTRODUCTION: LANDMARKS IN PRE-LAPLACEAN STATISTICS

PART ONE: LAPLACE

1 THE LAPLACEAN REVOLUTION

1.1 PIERRE-SIMON DE LAPLACE (1749–1827)

1.2 LAPLACE'S WORK IN PROBABILITY AND STATISTICS

1.3 THE PRINCIPLE OF INDIFFERENCE

1.4 FOURIER TRANSFORMS, CHARACTERISTIC FUNCTIONS, AND CENTRAL LIMIT THEOREMS

1.5 LEAST SQUARES AND THE NORMAL DISTRIBUTION

PART TWO: FROM GALTON TO FISHER

2 GALTON, REGRESSION, AND CORRELATION

2.1 FRANCIS GALTON (1822–1911)

2.2 GENESIS OF REGRESSION AND CORRELATION

2.3 FURTHER DEVELOPMENTS AFTER GALTON

2.4 WORK ON CORRELATION AND THE BIVARIATE (AND MULTIVARIATE) NORMAL DISTRIBUTION BEFORE GALTON

3 KARL PEARSON’S CHI-SQUARED GOODNESS-OF-FIT TEST

3.1 KARL PEARSON (1857–1936)

3.2 ORIGIN OF PEARSON'S CHI-SQUARED

3.3 PEARSON'S ERROR AND CLASH WITH FISHER

3.4 THE CHI-SQUARED DISTRIBUTION BEFORE PEARSON

4 STUDENT’S t

4.1 WILLIAM SEALY GOSSET (1876–1937)

4.2 ORIGIN OF STUDENT’S TEST: THE 1908 PAPER

4.3 FURTHER DEVELOPMENTS

4.4 STUDENT ANTICIPATED

5 THE FISHERIAN LEGACY

5.1 RONALD AYLMER FISHER (1890–1962)

5.2 FISHER AND THE FOUNDATION OF ESTIMATION THEORY

5.3 FISHER AND SIGNIFICANCE TESTING

5.4 ANOVA AND THE DESIGN OF EXPERIMENTS

5.5 FISHER AND PROBABILITY

5.6 FISHER VERSUS NEYMAN–PEARSON: CLASH OF THE TITANS

5.7 MAXIMUM LIKELIHOOD BEFORE FISHER

5.8 SIGNIFICANCE TESTING BEFORE FISHER

PART THREE: FROM DARMOIS TO ROBBINS

6 BEYOND FISHER AND NEYMAN–PEARSON

6.1 EXTENSIONS TO THE THEORY OF ESTIMATION

6.2 ESTIMATION AND HYPOTHESIS TESTING UNDER A SINGLE FRAMEWORK: WALD'S STATISTICAL DECISION THEORY (1950)

6.3 THE BAYESIAN REVIVAL

REFERENCES

INDEX

END USER LICENSE AGREEMENT

List of Tables

Chapter 02

Table 2.1 Table of Attributes for Yule’s Coefficient of Association

Chapter 03

Table 3.1 Roulette and Coin-Tossing Results

Table 3.2 Weldon's Data On Frequencies of Dice Showing 5 or 6 Points When 12 Dice Are Cast 26,306 Times

Table 3.3 2 × 2 Table in Greenwood and Yule's Paper

Chapter 04

Table 4.1 Hours of Sleep Gained by the Use of Hyoscyamine Hydrobromide

Chapter 05

Table 5.1 The Pearson System of Curves

Table 5.2 Table Used by Fisher to Illustrate the Concept of Ancillarity

Table 5.3 Cross-classification Between two Categorical Variables

Table 5.4 Basu's Example of Nonunique Ancillary Statistics

Table 5.5 Maximum Likelihood Estimator T of θ

Table 5.6 The Six Ancillary Statistics Y1, …, Y6 in Basu's Example

Table 5.7 Correct ANOVA Table for Fisher and Mackenzie's Split–Split Plot Design

Table 5.8 Incorrect ANOVA Table from Fisher and Mackenzie's “Studies in crop variation. II. The manurial response of different potato varieties”

Table 5.9 Data for First ANOVA Example in Fisher's Statistical Methods for Research Workers

Table 5.10 Table for First ANOVA Example in Fisher's Statistical Methods for Research Workers

Table 5.11 First ANOVA Table from Statistical Methods for Research Workers for the Potato Example

Table 5.12 Second ANOVA Table from Statistical Methods for Research Workers for the Potato Example

Table 5.13 First ANOVA Table for Regression Example in Fisher's Statistical Methods for Research Workers

Table 5.14 Second ANOVA Table for Regression Example in Fisher's Statistical Methods for Research Workers

Table 5.15 ANOVA Table for Multiple Linear Regression

Table 5.16 Data for Completely Randomized Design

Table 5.17 One-way ANOVA for Completely Randomized Design

Table 5.18 Data for Randomized Block Experiment

Table 5.19 Two-way ANOVA for Randomized Block Design

Table 5.20 Data for Latin Square Design

Table 5.21 ANOVA Table for Latin Square

Table 5.22 Preliminary Yields of Tea Plots

Table 5.23 Experimental Yields of Tea Plots

Table 5.24 Analysis of Experimental Yields

Table 5.25 Analysis of Preliminary Yields

Table 5.26 Sum of Squares and Products (x and y are Deviations from Their Respective Row Means for “Rows” and Deviations from Their Respective Column Means for “Columns”)

Table 5.27 ANCOVA Table

Table 5.28 Student's ANOVA Table for the ABBA Design

Table 5.29 ABBA Design with 16 Strips and 12 Sections per Strip used by Barbacki and Fisher (1936, p. 190)

Table 5.30 Differences (A − B) for the Design in Table 5.29

Table 5.31 Differences (A − B − B + A) for one Particular Randomization of Half-Drill Strips

Table 5.32 ANOVA Table for ABBA Design

Table 5.33 Part of Table Used by Fisher to Illustrate Fiducial Intervals for ρ

Table 5.34 2 × 2 Table for Barnard's Example

Table 5.35 Two Uninformative Tables that are Used in Barnard's Unconditional Test

Table 5.36 Christenings in London for 82 Consecutive Years

Chapter 06

Table 6.1 Correspondence between Two-Person Game and Decision Problem

Table 6.2 Utilities (U1 … U4) of Outcomes for Two Possible Actions (α, β) Undertaken under Two Possible Conditions (p, not (~) p)

Table 6.3 Utilities for a Particular Ethically Neutral Proposition p

Table 6.4 Utility Implication of Ramsey's Definition of Value Difference

Table 6.5 Calibrating a Subject's Utility Scale (First Estimate)

Table 6.6 Calibrating a Subject's Utility Scale (Second Estimate)

Table 6.7 Ramsey's Definition of the Subjective Probability P of a Proposition p (Which is not Necessarily Ethically Neutral)

Table 6.8 Ramsey's Definition of Conditional Probability

Table 6.9 Consequences in Egg Example Given by Savage (1954, p. 14)

List of Illustrations

Chapter 01

Figure 1.1 Pierre-Simon de Laplace (1749–1827).

Figure 1.2 First page of Laplace's “Mémoire sur les suites récurro-récurrentes” (Laplace, 1774b)

Figure 1.3 First page of Laplace's “Mémoire sur la probabilité des causes par les événements” (Laplace, 1774a)

Figure 1.4 Laplace's determination of an appropriate center for three observations a, b, and c

Figure 1.5 Laplace's curve of probability for n = 2 comets

Figure 1.6 Laplace's curve of probability for n = 3 comets

Figure 1.7 Laplace's probability curve for several observations made on V when the law of error (BLMN) is assumed to be known

Figure 1.8 Laplace's probability curve for several observations when the law of error (MRYN) is assumed to be unknown

Figure 1.9 The line AH divides the area under MHN into two equal halves

Figure 1.10 First page of “Mémoire sur les probabilités” (Laplace, 1781)

Figure 1.11 Marquis de Condorcet (1743–1794).

Figure 1.12 Title page of de Moivre's Miscellanea Analytica (de Moivre, 1730)

Figure 1.13 First page of Lagrange's “Sur une nouvelle espèce de calcul relatif à la différentiation et à l'intégration des quantités variables” (Lagrange, 1774)

Figure 1.14 First page of “Mémoire sur les approximations des formules qui sont fonctions de très grands nombres et sur leur application aux probabilités” (Laplace, 1810a)

Figure 1.15 First page of Bayes’ posthumous paper, “An Essay towards solving a problem in the doctrine of chances” (Bayes, 1764)

Figure 1.16 David Hume (1711–1776).

Figure 1.17 Three of the ways in which a chord can be randomly chosen on a circle

Figure 1.18 Henri Poincaré (1854–1912).

Figure 1.19 Edwin T. Jaynes (1922–1998).

Figure 1.20 Translational invariance in Bertrand's Paradox

Figure 1.21 Jean Baptiste Joseph Fourier (1768–1830).

Figure 1.22 First page of summary by Poisson of Fourier's “Mémoire sur la propagation de la chaleur dans les corps solides” (Fourier, 1808)

Figure 1.23 Joseph-Louis Lagrange (1736–1813).

Figure 1.24 First page of Lagrange's “Mémoire sur l'utilité de la méthode de prendre le milieu entre les résultats de plusieurs observations” (Lagrange, 1776)

Figure 1.25 Siméon Denis Poisson (1781–1840).

Figure 1.26 Aleksandr Mikhailovich Lyapunov (1857–1918).

Figure 1.27 Title page of Lyapunov's 1901 article “Nouvelle forme du théorème sur la limite de probabilité” (Lyapunov, 1901a)

Figure 1.28 Legendre's method of least squares, taken from the Nouvelles Méthodes (Legendre, 1805)

Figure 1.29 Robert Adrain (1775–1843).

Figure 1.30 First page of Robert Adrain's 1808 paper “Research concerning the probabilities of the errors which happen in making observations.” (Adrain, 1808)

Figure 1.31 Adrain's second derivation of the normal law

Figure 1.32 Carl Friedrich Gauss (1777–1855).

Figure 1.33 Gauss’ February 28, 1839, letter to Bessel (Gauss and Bessel, 1880)

Chapter 02

Figure 2.1 Francis Galton (1822–1911).

Figure 2.2 First page of Galton’s “Typical laws of heredity” (Galton, 1877)

Figure 2.3 The quincunx (left), the double quincunx (middle), and the convergent quincunx (right). Only the quincunx was actually constructed

Figure 2.4 First page of Galton’s 1885 Presidential Lecture to the Anthropology Section of the British Association (Galton, 1885a)

Figure 2.5 Galton’s table on stature (Galton, 1885b, p. 248)

Figure 2.6 Galton’s plot of offspring median heights (Y) against the midparental heights (X) (Galton, 1885b)

Figure 2.7 Galton’s discovery of the bivariate normal distribution (Galton, 1885b)

Figure 2.8 First page of Dickson’s analysis (Galton, 1886, p. 63)

Figure 2.9 First page of Galton’s 1888 paper “Co-relations and their measurements” (Galton, 1888)

Figure 2.10 Walter Frank Raphael Weldon (1860–1906).

Figure 2.11 Francis Ysidro Edgeworth (1845–1926).

Figure 2.12 First page of Edgeworth’s “Correlated averages” (Edgeworth, 1892)

Figure 2.13 First page of Pearson’s “Regression, heredity, and panmixia” (Pearson et al., 1896)

Figure 2.14 George Udny Yule (1871–1951).

Figure 2.15 Title page of first edition of Yule’s Introduction to the Theory of Statistics (Yule, 1911)

Figure 2.16 Plot of mean x-values (denoted by ×) at given y-values for a three-dimensional frequency surface.

Figure 2.17 Plane and table representation for Pearson’s measure of association for attributes (Pearson, 1900a, p. 2)

Figure 2.18 Fisher’s geometric derivation of the exact distribution of the correlation coefficient

Figure 2.19 Projection of Y-space onto X-space

Figure 2.20 Giovanni Antonio Amedeo Plana (1781–1864).

Figure 2.21 Auguste Bravais (1811–1863).

Figure 2.22 Extract from Bravais’ 1846 memoir (Bravais, 1846, p. 273)

Figure 2.23 Joseph Louis François Bertrand (1822–1900).

Chapter 03

Figure 3.1 Karl Pearson (1857–1936).

Figure 3.2 From Pearson's The Chances of Death (Pearson, 1897b, p. 13)

Figure 3.3 First page of Pearson's 1900 paper “On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling” (Pearson, 1900b)

Figure 3.4 Major Greenwood (1880–1949).

Figure 3.5 Irenée-Jules Bienaymé (1796–1878).

Figure 3.6 First page of Bienaymé's article “Sur la probabilité des erreurs d'aprés la méthode des moindres carrés” (Bienaymé, 1852)

Figure 3.7 Ernst Karl Abbe (1840–1905).

Figure 3.8 Title page of Abbe's dissertation (Abbe, 1863)

Figure 3.9 Friedrich Robert Helmert (1843–1917).

Figure 3.10 The transformations (p. 122) as first used by Helmert in his paper “Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehler direkter Beobachtungen gleicher Genauigkeit” (Helmert, 1876a) to obtain the distribution of [λλ]

Chapter 04

Figure 4.1 William Sealy Gosset (“Student”) (1876–1937).

Figure 4.2 First page of Student’s 1908 paper.

Figure 4.3 Fisher’s geometric derivation of the joint distribution of x̄ and s

Figure 4.4 Jacob Lüroth (1844–1910).

Figure 4.5 First page of Lüroth’s paper (Lüroth, 1876)

Chapter 05

Figure 5.1 Sir Ronald Aylmer Fisher (1890–1962).

Figure 5.2 Sir Arthur Stanley Eddington (1882–1944).

Figure 5.3 Sir David Roxbee Cox (b. 1924).

Figure 5.4 First page of Fisher and Mackenzie's “Studies in crop variation. II. The manurial response of different potato varieties.”

Figure 5.5 Half-drill strip, Knight's move, and chessboard designs. (a) Half-drill strip (ABBA) design for two treatments (the strips are along the rows and the ABBA “sandwiches” are along the columns). (b) Knight's move with five treatments and five replicates; the name comes from the fact that each treatment is repeated by moving one down and two across. This particular design is also known as the Knut Vik Square in honor of the Norwegian Knut Vik, who presented it in 1924. (c) Beaven's chessboard for eight treatments.

Figure 5.6 Maurice Stevenson Bartlett (1910–2002).

Figure 5.7 Sir Harold Jeffreys (1891–1989).

Figure 5.8 Jerzy Neyman (1894–1981).

Figure 5.9 Egon Sharpe Pearson (1885–1980).

Figure 5.10 First page of Neyman and Pearson's 1928 paper

Figure 5.11 Neyman and Pearson's demonstration of the sufficiency of the condition p0 ≤ kp1

Figure 5.12 George Alfred Barnard (1915–2002).

Figure 5.13 Johann Heinrich Lambert (1728–1777).

Figure 5.14 Title page of Lambert's Photometria

Figure 5.15 Lambert's maximum likelihood procedure

Figure 5.16 Daniel Bernoulli

Figure 5.17 Original Latin version of Daniel Bernoulli's article “Diiudicatio maxime probabilis plurium obseruationum discrepantium atque verisimillima inductio inde formanda”

Figure 5.18 Semi‐circular distribution considered by Daniel Bernoulli in his article “Diiudicatio maxime probabilis plurium obseruationum discrepantium atque verisimillima inductio inde formanda”

Figure 5.19 John Arbuthnot

Figure 5.20 First page of Arbuthnot's 1710 memoir

Figure 5.21 Willem Jacob ‘s Gravesande

Figure 5.22 ‘s Gravesande's article on Arbuthnot's Problem, taken from the Oeuvres Philosophiques et Mathématiques de Mr G.F. ‘s Gravesande

Figure 5.23 Nicholas Bernoulli's comments on Arbuthnot's Problem, taken from the Oeuvres Philosophiques et Mathématiques de Mr G.F. ‘s Gravesande

Figure 5.24 Nicholas Bernoulli's classification of the terms before the (F + 1)th term in the binomial expansion of (M + F)^n

Figure 5.25 French translated version of Daniel Bernoulli's prize‐winning article

Figure 5.26 Jean le Rond d'Alembert

Figure 5.27 Isaac Todhunter

Figure 5.28 First page of Michell's article “An inquiry into the probable parallax, and magnitude of the fixed stars, from the quantity of light which they afford us, and the particular circumstances of their situation”

Figure 5.29 Sir John Frederick William Herschel (1792–1871).

Figure 5.30 James David Forbes (1809–1868).

Chapter 06

Figure 6.1 Georges Darmois (1888–1960).

Figure 6.2 Edwin James George Pitman (1897–1993).

Figure 6.3 Maurice Fréchet (1878–1973). Wikimedia Commons (Public Domain), http://commons.wikimedia.org/wiki/File:Frechet.jpeg

Figure 6.4 Calyampudi Radhakrishna Rao (1920–). Wikimedia Commons (Licensed under the Creative Commons Attribution-Share Alike 4.0 International, 3.0 Unported, 2.5 Generic, 2.0 Generic and 1.0 Generic license), http://commons.wikimedia.org/wiki/File:Calyampudi_Radhakrishna_Rao_at_ISI_Chennai.JPG

Figure 6.5 Harald Cramér (1893–1985). Wikimedia Commons (Licensed under the Creative Commons Attribution-Share Alike 2.0 Germany license), http://commons.wikimedia.org/wiki/File:Harald_Cram%C3%A9r.jpeg

Figure 6.6 David Harold Blackwell (1919–2010). Wikimedia Commons (Licensed under the Creative Commons Attribution-Share Alike 2.0 Germany license), http://commons.wikimedia.org/wiki/File:David_Blackwell.jpg

Figure 6.7 Erich Leo Lehmann (1917–2009).

Figure 6.8 Henry Scheffé (1907–1977). Wikimedia Commons (Free Art License), http://commons.wikimedia.org/wiki/File:Henry_Scheffe.jpeg

Figure 6.9 Abraham Wald (1902–1950). Wikimedia Commons (Public Domain), http://en.wikipedia.org/wiki/File:Abraham_Wald_in_his_youth.jpg

Figure 6.10 Frank Plumpton Ramsey (1903–1930). Wikipedia, http://en.wikipedia.org/wiki/Frank_P._Ramsey

Figure 6.11 Bruno de Finetti (1906–1985).

Figure 6.12 Leonard Jimmie Savage (1917–1971). Book cover of “The Writings of Leonard Jimmie Savage – A Memorial Selection” by L.J. Savage (1981).

Figure 6.13 (a) Savage's second postulate. (b) Illustration of the second postulate: the acts f and g are modified to f′ and g′, respectively; since both pairs agree on ~B, preference between f and g (and f′ and g′) should be based on B, not on the utility values x and y

Figure 6.14 Herbert Ellis Robbins (1915–2001). Wikipedia, http://en.wikipedia.org/wiki/Herbert_Robbins


CLASSIC TOPICS ON THE HISTORY OF MODERN MATHEMATICAL STATISTICS

From Laplace to More Recent Times

 

PRAKASH GORROOCHURN

Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY

 

 

 

 

 

 

 

 

 

 

 

Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging in Publication data can be found on file at the Library of Congress.

ISBN: 9781119127925

 

 

To Nishi and Premal

PREFACE

This book describes the works of some of the more recent founders of the subject of mathematical statistics. By more recent, I have taken it to mean the period from Laplace to the immediate post-Fisherian era. The choice is, of course, entirely subjective, but there are strong reasons for starting a work on the history of modern mathematical statistics with Laplace: it was with him that the discipline was given a definite form. Although Laplace’s levels of rigor were certainly not up to the standards expected today, he brought more sophistication to statistics than his predecessors had. His systematic work on probability, error theory, and large-sample methods laid the foundations for later investigations. One of his most important results, the Central Limit Theorem, was the prototype that was worked on and improved upon by generations of succeeding statisticians.

This book may be viewed as a continuation of my previous book Classic Problems of Probability (Gorroochurn, 2012a). However, apart from the fact that this book deals with mathematical statistics rather than probability, I have now endeavored to go into a more detailed and complete coverage of the topics. That does not mean that I have treated every single historical topic. Rather, what I have treated, I have tried to do so in a thorough way. Thus, the reader who wishes to know exactly how Laplace first proved the Central Limit Theorem, how Fisher developed ANOVA, or how Robbins first developed the empirical Bayes method may find the book helpful. In my demonstration of the mathematical results, I have purposely used (as much as possible) the same notation as the original writers so that readers can have a much better feel for the original works. I have also included page numbers referring to the original derivations so readers can easily check against the original works. I really hope readers will be encouraged to read the original papers as these are often treasure troves of statistics in the making.

I also hope that readers will find the book a useful addition to the few books on the history of statistics that have hitherto been written. Two of the major books I ought to mention are Stigler’s The History of Statistics: The Measurement of Uncertainty Before 1900 (Stigler, 1986b) and Hald’s A History of Mathematical Statistics from 1750 to 1930.1 The first is a superlative and engaging essay on the history of statistics, but since it does not treat the post-1900 period, none of Fisher’s work is covered. The second book is a comprehensive treatment of the pre-1930 period, written in modern mathematical notation. However, it devotes so much space to Laplace and Gauss that, to my taste, the coverage of Fisher is not as complete as that of the other two mathematicians. Moreover, Hald’s book has no coverage of the Neyman–Pearson theory. This, of course, is no fault of Hald, since the book is perfectly suited for the aims the author had in writing it.

But why do I insist on the coverage of Fisher’s statistical work? My simple answer is as follows: virtually the whole of modern mathematical statistics sprang either from Fisher’s original work or from the extensions (and sometimes corrections) others made to it. Much of the modern statistical vocabulary, especially in the important field of estimation, originated with Fisher. The word “statistic” itself, in its technical sense, is Fisher’s creation. Almost single-handedly, Fisher introduced sufficiency, efficiency, consistency, likelihood, ANOVA, ANCOVA, and countless other statistical concepts and techniques. Many of the current branches of statistics, such as multivariate analysis and nonparametric statistics, received important original contributions from Fisher. Unfortunately, it was not possible for me to treat all the different branches of statistics: my main concern here is what may be termed parametric statistical inference.

While going through the book, readers will find that I often quote the original authors. My purpose in doing so is for readers not to be influenced by my own paraphrasing of the original author’s thoughts and intentions, but to hear it from the “horse’s mouth” itself. Thus, I believe there is a different effect on readers when I say “Fisher was somewhat indifferent to unbiasedness as a desideratum for an estimator because unbiased estimators are not invariant to transformations” than when Fisher himself said that “…lack of bias, which since it is not invariant for functional transformation of parameters has never had the least interest for me” (Bennett, 1990, p. 196).2 For the most part, though, my quotation of the original author has been accompanied by my own interpretation and comments, so that readers are not left in the dark as to what the original author really meant. The reader will find a similar approach in the late Lehmann’s book Fisher, Neyman, and the Creation of Classical Statistics (Lehmann, 2011). I hope this strategy will bring a sense of authenticity to the subject of history I am writing about.

This book may be divided into three parts. The first part deals mainly with Laplace’s contributions to mathematical statistics. In this part, readers will find a detailed analysis of Laplace’s papers and books. Readers will also learn about Laplace’s definition of probability, his philosophy of universal determinism and how it shaped his statistical research, his early preference for methods of inverse probability (based on the principle of indifference), his investigation of various laws of error, his powerful method of asymptotic approximation, his introduction of characteristic functions, his later use of methods based on direct probability, his proofs of the Central Limit Theorem, and his development of least squares. The first part also contains important related work of other scholars such as Fourier, Lagrange, Adrain, Poisson, Gauss, Hagen, and Lyapunov.

The second part of this book deals mainly with Galton’s discovery of regression and correlation, Pearson’s invention of the X² goodness-of-fit statistic, and Student’s innovation of the small-sample t-statistic, culminating in Fisher’s creation of new statistical methods. In the section devoted to Galton, readers will learn about Galton’s observation of an interesting phenomenon, which he thought was unidirectional and which he first termed reversion (and later changed to regression), and the first appearance of correlation. Extensions of Galton’s work by Weldon, Edgeworth, Yule, Pearson, and Fisher are also considered, as are related works prior to Galton, including those of Lagrange, Adrain, Gauss, Laplace, Plana, Bravais, and Bertrand. In the section devoted to Pearson, readers will learn how Pearson developed the X² goodness-of-fit statistic and clashed with Fisher when the latter pointed out an error in Pearson’s development. Derivations of the χ²-distribution prior to Pearson are also considered, including those of Bienaymé, Abbe, and Helmert. In the section devoted to Student, readers will learn about his original, but somewhat faulty, derivation of the t-statistic (which he called z and which was slightly different from today’s t) and Fisher’s later consolidation of Student’s result. In this section, previous derivations of the t-distribution, based on inverse probability, by Lüroth and Edgeworth are also examined. Finally, in the section devoted to Fisher, readers will find a detailed account of Fisher’s development of estimation theory, significance testing, ANOVA and related techniques, and fiducial probability. Also included in this section are Fisher’s (in)famous disputes with Jeffreys and Neyman–Pearson, and the use of maximum likelihood and significance testing prior to Fisher.

The third and final part of this book deals with post-Fisherian developments. In this part, readers will learn about extensions to Fisher’s theory of estimation (by Darmois, Koopman, Pitman, Aitken, Fréchet, Cramér, Rao, Blackwell, Lehmann, Scheffé, and Basu), Wald’s powerful statistical decision theory, and the Bayesian revival ushered in by Ramsey, de Finetti, Savage, and Robbins in the first half of the twentieth century.

A few of the sections in the book are non-historical in nature and have been denoted by an asterisk (*). Moreover, a handful of historical topics that are found in the “Further Extensions” sections have been given without detailed demonstration.

As in my previous book, I have strived for simplicity of exposition. I hope my efforts will inculcate love for statistics and its history in the reader.

Prakash [email protected] 22, 2015

Notes

1

Other books on the history of statistics include Hald (1990), Pearson (1978), Pearson and Kendall (1970), Kendall and Plackett (1977), Hald (2007), MacKenzie (1978), Benzecri (1982), Cowles (2001), Chatterjee (2003), Porter (1986), and Westergaard (1932).

2

See also p. 376 of the current book.

ACKNOWLEDGMENTS

Warren Ewens and Bruce Levin each read large portions of the original manuscript. Both encouraged me throughout the writing of the book. Various sections of earlier drafts were read by Bernard Bru, Bin Cheng, Daniel Courgeau, Andrew Dale, Sukumar Desai, Stephen Fienberg, Hans Fischer, Dominique Fourdrinier, David Hitchcock, Vesa Kuusela, Eugenio Regazzini, Nancy Reid, Christian Robert, Thomas Severini, Glenn Shafer, Craig Smorynski, Aris Spanos, Veronica Vieland, Alan Welsh, and Sandy Zabell. Jessica Overbey, Bei Wang, and Julia Wrobel helped verify the references. Finally, Ji Liao, Lynn Petukhova, Jiashi Wang, Gary Yu, Wenbin Zhu, and many students from my probability class assisted with the proofreading. To all these people I express my deepest gratitude. All remaining errors and omissions are, of course, my sole responsibility.

Susanne Steitz-Filler, Sari Friedman, and Allison McGinniss from Wiley were all very helpful in realizing this project.

My final and continued gratitude goes to my mother and late father.

INTRODUCTION: LANDMARKS IN PRE-LAPLACEAN STATISTICS

The word “statistics” is derived from the modern Latin statisticus (“state affairs”). Statisticus itself originates from the classic Latin status, from which the word “state” is derived. In the eighteenth century,1 the German political scientist Gottfried Achenwall (1719–1779) brought statistisch into general usage as the collection and evaluation of data relating to the functioning of the state. The English word “statistics” is thus derived from the German statistisch.

A second landmark, which in fact preceded Achenwall, was the creation of the science of political arithmetic in England in the seventeenth century. Political arithmetic was a set of techniques of classification and calculation on data obtained from birth and death records, trade records, taxes, credit, and so on. It was initiated in England by John Graunt (1620–1674) and then further developed by William Petty (1623–1687). In the nineteenth century, political arithmetic developed into the field of statistics, now dealing with the analysis of all kinds of data. Statistics gradually became an increasingly sophisticated discipline, mainly because of the powerful mathematical techniques of analysis that were infused into it.

The recognition that the data available to the statistician were often the result of chance mechanisms also meant that some notion of probability was essential both for the statistical analysis of data and the subsequent interpretation of the results. The calculus of probability had its origins well before the eighteenth century. In the sixteenth century, the physician and mathematician Gerolamo Cardano (1501–1575) made some forays into chance calculations, many of which were erroneous. His 15-page book entitled Liber de ludo aleae (Cardano, 1663) was written in the 1520s but published only in 1663. However, the official start of the calculus of probability took place in 1654 through the correspondence between Blaise Pascal (1623–1662) and Pierre de Fermat (1601–1665) concerning various games of chance, most notably the problem of points. Meanwhile, having heard of the exchange between the two Frenchmen, the Dutch mathematician Christiaan Huygens (1629–1695) wrote a small manual on probability, De Ratiociniis in ludo aleae (Huygens, 1657), which came out in 1657 as the first published book on probability. Thereupon followed a period of relative inactivity in probability, until Pierre Rémond de Montmort (1678–1719) published his book Essay d’Analyse sur les Jeux de Hazard in 1708 (Montmort, 1708). But the real breakthrough was to come through James Bernoulli’s (1654–1705) posthumous Ars Conjectandi (Bernoulli, 1713), where Bernoulli enunciated and rigorously proved the law of large numbers. This law took probability from mere games of chance and extended its applications to all kinds of world phenomena, such as births, deaths, accidents, and so on. The law of large numbers showed that, viewed microscopically (over short time intervals), measurable phenomena exhibited the utmost irregularity, but when viewed macroscopically (over an extended period of time), they all exhibited a deep underlying structure and constancy.
It is no exaggeration then to say that Bernoulli’s Ars Conjectandi revolutionized the world of probability by showing that chance phenomena were indeed amenable to some form of rigorous treatment. The law of large numbers was to receive a further boost in 1730 through its refinement in the hands of Abraham de Moivre (1667–1754), resulting in the first derivation of the normal distribution.
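The contrast between short-run irregularity and long-run constancy that Bernoulli’s law describes can be illustrated with a short simulation (a sketch for illustration only; the function name, seed, and sample size are my own choices, not from the text):

```python
import random

def running_means(p, n, seed=1):
    """Running relative frequencies of success in n Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = 0
    means = []
    for i in range(1, n + 1):
        successes += rng.random() < p   # one chance event (a birth, a coin toss, ...)
        means.append(successes / i)
    return means

means = running_means(p=0.5, n=10_000)
# Over short stretches the frequencies fluctuate irregularly,
# but by the end they settle near the underlying probability p.
print(means[9], means[-1])
```

By the law of large numbers, the final running mean is, with overwhelming probability, close to p, whereas the early values may be far from it.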

In the meantime, two years before the release of the Ars Conjectandi, the Englishman John Arbuthnot (1667–1735) explicitly applied the calculus of probability to the problem of sex ratio in births and argued for divine providence. This was the first published test of a statistical hypothesis. Further works in demography were conducted by the Comte de Buffon (1707–1788), Daniel Bernoulli (1700–1782), and Jean le Rond d’Alembert (1717–1783).

Although Ars Conjectandi was duly recognized for its revolutionary value, James Bernoulli was not able to bring the book to its full completion before he passed away. One aspect of the problem not treated by Bernoulli was the issue of the probability of hypotheses (or causes), also known as inverse probability. This remained a thorny problem until it was addressed by Thomas Bayes (1701–1761) in the famous “An Essay towards solving a problem in the Doctrine of Chances” (Bayes, 1764). In the essay, again published posthumously, Bayes attacked the inverse problem addressed by Bernoulli. In the latter’s framework, the probability of an event was a known quantity; in the former’s scheme, the probability of an event was an unknown quantity and probabilistic statements were made on it through what is now known as Bayes’ theorem. The importance of this theorem cannot be overstated. But even as it was recognized for its revolutionary value, it also aroused controversy because of a particular assumption made in its implementation (concerning the prior distribution to be used).

In addition to the aforementioned works on probability, another major area of inquiry for the statistician was the investigation of errors made in observations and the particular laws such errors were subject to. One of the first such studies was performed by the English mathematician Thomas Simpson (1710–1761), who assumed a triangular error distribution in some of his investigations. Other mathematicians involved in this field were Daniel Bernoulli, Joseph-Louis Lagrange (1736–1813), Carl Friedrich Gauss (1777–1855), and especially Adrien-Marie Legendre (1752–1833), who was the first to publish the method of least squares.

Note

1

The early history of statistics is described in detail in the books by Pearson (1978) and Westergaard (1932).

PART ONE: LAPLACE

1 THE LAPLACEAN REVOLUTION

1.1 PIERRE-SIMON DE LAPLACE (1749–1827)

Laplace was to France what Newton had been to England. Pierre-Simon de Laplace1 (Fig. 1.1) was born in Beaumont-en-Auge, Normandy, on March 23, 1749. He belonged to a bourgeois family. Laplace at first enrolled as a theology student at the University of Caen and seemed destined for the church. At the age of 16, he entered the College of Arts at the University of Caen for two years of philosophy before beginning his degree in theology. There, he discovered not only the mathematical writings of such greats as Euler and Daniel Bernoulli, but also his own aptitude for mathematical analysis. He moved to Paris in 1769 and, through the patronage of d'Alembert, became Professor of Mathematics at the École Royale Militaire in 1771.

Figure 1.1 Pierre-Simon de Laplace (1749–1827).

Wikimedia Commons (Public Domain), http://commons.wikimedia.org/wiki/File:Pierre-Simon_Laplace.jpg

Laplace lived through tumultuous political times: the French Revolution took place in 1789; Robespierre came to power in a coup in 1793; Louis XVI was executed in the same year, followed by Robespierre himself the next year; and Napoléon Bonaparte came to power in 1799 but fell in 1815, when the monarchy was restored under Louis XVIII. Laplace was made Minister of the Interior when Napoléon came to power but was dismissed after only six weeks for attempting to “carry the spirit of the infinitesimal into administration.”2

But Napoléon continued to retain the services of Laplace in other capacities and bestowed several honors on him (senator and vice president of the senate in 1803, Count of the Empire in 1806, Order of the Reunion in 1813). Nevertheless, Laplace voted Napoléon out in 1814, was elected to the French Academy in 1816 under Louis XVIII, and was then made a Marquis in 1817.

Thus, throughout these turbulent political periods, Laplace was able to adapt and even prosper, unlike many of his contemporaries, such as the Marquis de Condorcet and Antoine Lavoisier, who both perished during the Revolution. Laplace continued to publish seminal papers over several years, culminating in the two major books, Mécanique Céleste (Laplace, 1799) and Théorie Analytique des Probabilités (Laplace, 1812). These were highly sophisticated works and were accompanied by two more accessible books, the Exposition du Système du Monde (Laplace, 1796) and the Essai Philosophique sur les Probabilités (Laplace, 1814), aimed at a much wider audience.

From the very start, Laplace's research branched into two main directions: applied probability and mathematical astronomy. However, underlying each branch, Laplace espoused one unifying philosophy, namely, universal determinism. This philosophy was vindicated to a great extent in so far as celestial mechanics was concerned. By using Newton's law of universal gravitation, Laplace was able to mathematically resolve the remaining anomalies in the theory of the Solar System. In particular, he triumphantly settled the issue of the “great inequality of Jupiter and Saturn.” In the Exposition du Système du Monde, we can read:

We shall see that this great law [of universal gravitation]…represents all celestial phenomena even in their minutest details, that there is not one single inequality of their motions which is not derived from it, with the most admirable precisions, and that it explains the cause of several singular motions, just perceived by astronomers, and which were too slow for them to recognize their law.

(Laplace, 1796, Vol. 2, pp. 2–3)

Laplace appealed to a “vast intelligence,” dubbed Laplace's demon, to explain his philosophy of universal determinism3:

All events, even those that on account of their smallness seem not to obey the great laws of nature, are as necessary a consequence of these laws as the revolutions of the sun. An intelligence which at a given instant would know all the forces that move matter, as well as the position and speed of each of its molecules; if on the other hand it was so vast as to analyse these data, it would contain in the same formula, the motion of the largest celestial bodies and that of the lightest atom. For such an intelligence, nothing would be irregular, and the curve described by a simple air or vapor molecule, would seem regulated as certainly as the orbit of the sun is for us.

(Laplace, 1812, p. 177)

However, we are told by Laplace, ignorance of the underlying laws makes us ascribe events to chance:

…But owing to our ignorance regarding the immensity of the data necessary to solve this great problem, and owing to the impossibility, given our weakness, to subject to calculation those data which are known to us, even though their numbers are quite limited; we attribute phenomena which seem to occur and succeed each other without any order, to variable or hidden causes, whose action has been designated by the word hazard, a word that is really only the expression of our ignorance.

(ibidem)

Probability is then a relative measure of our ignorance:

Probability is relative, in part to this ignorance, in part to our knowledge.

(ibidem)

It is perhaps no accident that Laplace's research into probability started in the early 1770s, for it was in this period that interest in probability was renewed among many mathematicians due to work in political arithmetic and astronomy (Bru, 2001b, p. 8379). Laplace's work in probability was truly revolutionary because his command of the powerful techniques of analysis enabled him to break new ground in virtually every aspect of the subject. The advances Laplace made in probability and the extent to which he applied them were truly unprecedented. While he was still alive, Laplace thus reached the forefront of the probability scene and commanded immense respect. Laplace passed away in Paris on March 5, 1827, almost exactly 100 years after Newton's death.

Throughout his academic career, Laplace seldom got entangled in disputes with his contemporaries. One notable exception was his public dissent with Roger Boscovich (1711–1787) over the calculation of the path of a comet given three close observations. More details can be found in Gillispie (2000, Chapter 13) and Hahn (2005, pp. 67–68).

Laplace has often been accused of incorporating the works of others into his own without giving due credit. The situation was aptly described by Augustus De Morgan4 nearly two centuries ago. The following extract is worth reading if only for its rhetorical value:

The French school of writers on mathematical subjects has for a long time been wedded to the reprehensible habit of omitting all notice of their predecessors, and Laplace is the most striking instance of this practice, which he carried to the utmost extent. In that part of the “Mecanique Celeste” in which he revels in the results of Lagrange, there is no mention of the name of the latter. The reader who has studied the works of preceding writers will find him, in the “Théorie des Probabilités,” anticipated by De Moivre, James Bernoulli, &c, on certain points. But there is not a hint that any one had previously given those results from which perhaps his sagacity led him to his own more general method. The reader of the “Mecanique Celeste” will find that, for any thing he can see to the contrary, Euler, Clairaut, D'Alembert, and above all Lagrange, need never have existed. The reader of the “Systême du Monde” finds Laplace referring to himself in almost every page, while now and then, perhaps not twenty times in all, his predecessors in theory are mentioned with a scanty reference to what they have done; while the names of observers, between whom and himself there could be no rivalry, occur in many places. To such an absurd pitch is this suppression carried, that even Taylor's name is not mentioned in connexion with his celebrated theorem; but Laplace gravely informs his readers, “Nous donnerons quelques théorêmes généraux qui nous seront utiles dans la suite,” those general theorems being known all over Europe by the names of Maclaurin, Taylor, and Lagrange. 
And even in his Theory of Probabilities Lagrange's theorem is only “la formule (p) du numéro 21 du second livre de la Mécanique Céleste.” It is true that at the end of the Mecanique Celéste he gives historical accounts, in a condensed form, of the discoveries of others; but these accounts never in any one instance answer the question—Which pages of the preceding part of the work contain the original matter of Laplace, and in which is he only following the track of his predecessor?

(De Morgan, 1839, Vol. XXX, p. 326)

Against such charges, recent writers like Stigler (1978) and Zabell (1988) have come to Laplace's defense on the grounds that the latter's citation rate was no worse than those of his contemporaries. That might be the case, but the two studies also show that the citation rates of Laplace as well as his contemporaries were all very low. This is hardly a practice that can be condoned, especially when we know these mathematicians jealously guarded their own discoveries. Newton and Leibniz clashed fiercely over priority for the calculus, as did Gauss and Legendre over least squares, though to a lesser extent. If mathematicians were so concerned that their priority over discoveries be acknowledged, then surely it was incumbent upon them to acknowledge the priority of others on work that was not their own.

1.2 LAPLACE'S WORK IN PROBABILITY AND STATISTICS

1.2.1 “Mémoire sur les suites récurro-récurrentes” (1774): Definition of Probability

This memoir (Laplace, 1774b) is among the first of Laplace's published works and also his first paper on probability (Fig. 1.2). Here, for the first time, Laplace enunciated the definition of probability, which he called a Principe (Principle):

The probability of an event is equal to the product of each favorable case by its probability divided by the product of each possible case by its probability, and if each case is equally likely, the probability of the event is equal to the number of favorable cases divided by the number of all possible cases.5

(Laplace, 1774b, OC 8, pp. 10–11)

The above is the classical (or mathematical) definition of probability that is still used today, although several other mathematicians provided similar definitions earlier. For example6:

Gerolamo Cardano’s definition in Chapter 14 of the Liber de ludo aleae:

So there is one general rule, namely, that we should consider the whole circuit, and the number of those casts which represents in how many ways the favorable result can occur, and compare that number to the rest of the circuit, and according to that proportion should the mutual wagers be laid so that one may contend on equal terms.

(Cardano, 1663)

Gottfried Wilhelm Leibniz’s definition in the Théodicée:

If a situation can lead to different advantageous results ruling out each other, the estimation of the expectation will be the sum of the possible advantages for the set of all these results, divided into the total number of results.

(Leibniz, 1710, 1969 edition, p. 161)

James (Jacob) Bernoulli’s statement from the Ars Conjectandi:

… if complete and absolute certainty, which we represent by the letter a or by 1, is supposed, for the sake of argument, to be composed of five parts or probabilities, of which three argue for the existence or future existence of some outcome and the others argue against it, then that outcome will be said to have 3a/5 or 3/5 of certainty.

(Bernoulli, 1713, English edition, pp. 315–316)

Abraham de Moivre’s definition from the De Mensura Sortis:

If p is the number of chances by which a certain event may happen, & q is the number of chances by which it may fail; the happenings as much as the failings have their degree of probability: But if all the chances by which the event may happen or fail were equally easy; the probability of happening to the probability of failing will be p to q.

(de Moivre, 1733, p. 215)

Figure 1.2 First page of Laplace's “Mémoire sur les suites récurro-récurrentes” (Laplace, 1774b)

Although Laplace's Principe was an objective definition, Laplace gave it a subjective overtone by later redefining mathematical probability as follows:

The probability of an event is thus just the ratio of the number of cases favorable to it, to the number of possible cases, when there is nothing to make us believe that one case should occur rather than any other.7

(Laplace, 1776b, OC 8, p. 146)

In the above, Laplace appealed to the principle of indifference8 and his definition of probability relates to our beliefs. It is thus a subjective interpretation of the classical definition of probability.
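The classical definition (favorable cases divided by equally likely possible cases) can be sketched in a few lines of code (an illustration only; the function name and the two-dice example are mine, not Laplace’s):

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, is_favorable):
    """Classical definition: favorable cases over all equally likely possible cases."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(favorable, len(outcomes))

# All 36 equally likely results of throwing two dice.
two_dice = product(range(1, 7), repeat=2)
p = classical_probability(two_dice, lambda o: o[0] + o[1] == 7)
print(p)  # 1/6
```

Using exact fractions keeps the ratio of counts exact, in the spirit of the classical definition, rather than introducing floating-point rounding.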

1.2.2 “Mémoire sur la probabilité des causes par les événements” (1774)

1.2.2.1 Bayes’ Theorem

The “Mémoire sur la probabilité des causes par les événements” (Laplace, 1774a) (Fig. 1.3) is a landmark paper of Laplace because it introduced most of the fundamental principles that he first used and would stick to for the rest of his career.9 Bayes’ theorem was stated and inverse probability was used as a general method for dealing with all kinds of problems. The asymptotic method was introduced as a powerful tool for approximating certain types of integrals, and an inverse version of the Central Limit Theorem was also presented. Finally, the double exponential distribution was introduced as a general law of error. Laplace here presented many of the problems that he would later come back to, each time refining and perfecting his previous solutions.

Figure 1.3 First page of Laplace's “Mémoire sur la probabilité des causes par les événements” (Laplace, 1774a)

In Article II of the memoir, Laplace distinguished between two classes of probability problems:

The uncertainty of human knowledge bears on events or the causes of events; if one is certain, for example, that a ballot contains only white and black tickets in a given ratio, and one asks the probability that a randomly chosen ticket will be white, the event is then uncertain, but the cause on which depends the existence of the probability, that is the ratio of white to black tickets, is known.

In the following problem: A ballot is assumed to contain a given number of white and black tickets in an unknown ratio, if one draws a white ticket, determine the probability that the ratio of white to black tickets in the ballot is p:q; the event is known and the cause unknown.

One can reduce to these two classes of problems all those that depend on the doctrine of chances.

(Laplace, 1774a, OC 8, p. 29)

In the above, Laplace distinguished between problems that require the calculation of direct probabilities and those that require the calculation of inverse probabilities. The latter depended on the powerful theorem first adduced by Bayes and which Laplace immediately enunciated as a Principe as follows:

PRINCIPE—If an event can be produced by a number n of different causes, the probabilities of the existence of these causes calculated from the event are to each other as the probabilities of the event calculated from the causes, and the probability of the existence of each cause is equal to the probability of the event calculated from that cause, divided by the sum of all the probabilities of the event calculated from each of the causes.10

(ibidem)

Laplace's first statement in the above can be written mathematically as follows: if C1, C2, …, Cn are n exhaustive events (“causes”) and E is another event, then

(1.1)  Pr(Ci | E) / Pr(Cj | E) = Pr(E | Ci) / Pr(E | Cj),  for i, j = 1, 2, …, n

Equation (1.1) implies that

(1.2)  Pr(Ci | E) = Pr(E | Ci) / {Pr(E | C1) + Pr(E | C2) + ⋯ + Pr(E | Cn)}

Equation (1.2) is Laplace's second statement in the previous quotation. It is a restricted version of Bayes’ theorem because it assumes a discrete uniform prior, that is, each of the “causes” C1, C2, …, Cn is equally likely: Pr(Ci) = 1/n for i = 1, 2, …, n.

It should be noted that Laplace's enunciation of the theorem in Eq. (1.2) in 1774 made no mention of Bayes’ publication 10 years earlier (Bayes, 1764), and it is very likely that Laplace was unaware of the latter's work. However, the 1778 volume of the Histoire de l'Académie Royale des Sciences, which appeared in 1781, contained a summary by the Marquis de Condorcet (1743–1794) of Laplace's “Mémoire sur les Probabilités,” which also appeared in that volume (Laplace, 1781). Laplace's article made no mention of Bayes or Price,11 but Condorcet's summary explicitly acknowledged the two Englishmen:

These questions [on inverse probability] about which it seems that Messrs. Bernoulli and Moivre had thought, have been since then examined by Messrs. Bayes and Price; but they have limited themselves to exposing the principles that can be used to solve them. M. de Laplace has expanded on them….

(Condorcet, 1781, p. 43)

As for Laplace himself, his acknowledgment of Bayes’ priority on the theorem came much later in the Essai Philosophique Sur les Probabilités:

Bayes, in the Transactions philosophiques of the year 1763, sought directly the probability that the possibilities indicated by past experiences are comprised within given limits; and he has arrived at this in a refined and very ingenious manner, although a little perplexing. This subject is connected with the theory of the probability of causes and future events, concluded from events observed. Some years later I expounded the principles of this theory….

(Laplace, 1814, English edition, p. 189)

It is also in the Essai Philosophique that Laplace first gave the general (discrete) version of Bayes’ theorem:

The probability of the existence of any one of these causes is then a fraction whose numerator is the probability of the event resulting from this cause and whose denominator is the sum of the similar probabilities relative to all the causes; if these various causes, considered a priori, are unequally probable it is necessary, in place of the probability of the event resulting from each cause, to employ the product of this probability by the possibility of the cause itself.

(ibid., pp. 15–16)

Equation (1.2) can thus be written in general form as

(1.3)  Pr(Ci | E) = Pr(E | Ci) Pr(Ci) / {Pr(E | C1) Pr(C1) + Pr(E | C2) Pr(C2) + ⋯ + Pr(E | Cn) Pr(Cn)}

which is the form in which Bayes’ theorem is used today. The continuous version of Eq. (1.3) may be written as

(1.4)  f(θ | x) = f(x | θ) f(θ) / ∫ f(x | θ) f(θ) dθ

where θ is a parameter, f(θ) is the prior density of θ, f(x | θ) is the joint density12 of the observations x, and f(θ | x) is the posterior density of θ. It is interesting that neither of the above two forms (and not even those assuming a uniform prior) by which we recognize Bayes’ theorem today can be found explicitly in Bayes’ paper.
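The discrete form of the theorem can be sketched numerically as follows (a hedged illustration; the function name and the urn probabilities are hypothetical, not taken from Laplace):

```python
def posterior(priors, likelihoods):
    """Discrete Bayes' theorem: posterior probabilities proportional to likelihood x prior."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # the denominator: sum over all causes
    return [j / total for j in joint]

# Hypothetical example: two equally likely "causes" (urn compositions);
# the likelihoods are the chances of drawing a white ticket under each cause.
priors = [0.5, 0.5]          # uniform prior, as in the restricted version
likelihoods = [0.75, 0.25]   # Pr(white | cause)
print(posterior(priors, likelihoods))  # [0.75, 0.25]
```

With a uniform prior the posterior reduces to each likelihood divided by their sum, which is exactly the restricted version Laplace stated in 1774; with unequal priors the weighting by Pr(Ci) matters.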

Laplace almost always used (1.4) in the form