An original account of willful ignorance and how this principle relates to modern probability and statistical methods
Through a series of colorful stories about great thinkers and the problems they chose to solve, the author traces the historical evolution of probability and explains how statistical methods have helped to propel scientific research. However, the past success of statistics has depended on vast, deliberate simplifications amounting to willful ignorance, and this very success now threatens future advances in medicine, the social sciences, and other fields. Limitations of existing methods result in frequent reversals of scientific findings and recommendations, to the consternation of both scientists and the lay public.
Willful Ignorance: The Mismeasure of Uncertainty exposes the fallacy of regarding probability as the full measure of our uncertainty. The book explains how statistical methodology, though enormously productive and influential over the past century, is approaching a crisis. The deep and troubling divides between qualitative and quantitative modes of research, and between research and practice, are reflections of this underlying problem. The author outlines a path toward the re-engineering of data analysis to help close these gaps and accelerate scientific discovery.
Willful Ignorance: The Mismeasure of Uncertainty presents essential information and novel ideas that should be of interest to anyone concerned about the future of scientific research. The book is especially pertinent for professionals in statistics and related fields, including practicing and research clinicians, biomedical and social science researchers, business leaders, and policy-makers.
Page count: 808
Year of publication: 2014
HERBERT I. WEISBERG
Correlation Research, Inc.
Needham, MA
Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data is available.
ISBN: 9780470890448
I have known that thing the Greeks knew not–uncertainty. … Mine is a dizzying country in which the Lottery is a major element of reality.
Jorge Luis Borges1
This fundamental requirement for the applicability to individual cases of the concept of classical probability shows clearly the role of subjective ignorance as well as that of objective knowledge in a typical probability statement.
Ronald Aylmer Fisher2
To a stranger, the probability that I shall send a letter to the post unstamped may be derived from the statistics of the Post Office; for me those figures would have but the slightest bearing on the question.
John Maynard Keynes3
1. Jorge Luis Borges (1941). La Loteria en Babilonia (The Lottery in Babylon) in Ficciones (1944). Buenos Aires: Editorial Sur; Translated by Anthony Bonner and published as Ficciones (1962). New York: Grove Press.
2. Ronald A. Fisher (1959b). Statistical Methods and Scientific Inference, 2nd ed. Edinburgh: Oliver and Boyd, p. 33.
3. Keynes, John Maynard (1921). A Treatise on Probability. London: Macmillan, p. 71.
PREFACE
NOTE
ACKNOWLEDGMENTS
1 THE OPPOSITE OF CERTAINTY
Two Dead Ends
Analytical Engines
What is Probability?
Uncertainty
Willful Ignorance
Toward a New Science
Notes
2 A QUIET REVOLUTION
Thinking the Unthinkable
Inventing Probability
Statistics
The Taming of Chance
The Ignorance Fallacy
The Dilemma of Science
Notes
3 A MATTER OF CHANCE
Origins
The Famous Correspondence
What Did Not Happen Next
Against The Odds
Notes
4 HARDLY TOUCHED UPON
The Mathematics of Chance
Empirical Frequencies
A Quantum of Certainty
Notes
5 A MATHEMATICIAN OF BASEL
Publication at Last
The Art of Conjecturing
A Tragic Ending
Notes
6 A DEFECT OF CHARACTER
Man Without a Country
A Fraction of Chances
Notes
7 CLASSICAL PROBABILITY
Revolutionary Reverends
From Chances to Probability
Notes
8 BABEL
The Great Unraveling
Probability as a Relative Frequency
Probability as a Logical Relationship
Probability as a Subjective Assessment
Probability as a Propensity
Notes
9 PROBABILITY AND REALITY
The Razor's Edge
What Fisher Knew
What Reference Class?
A Postulate of Ignorance
Laplace's Error
Notes
10 THE DECISION FACTORY
Beyond Moral Certainty
Decisions, Decisions
Machine-Made Knowledge
Notes
11 THE LOTTERY IN SCIENCE
Scientific Progress
Fooled by Causality
Statistics for Humans: Bias or Ambiguity?
Regression toward the Mean
Notes
12 TRUST, BUT VERIFY
A New Problem
Trust, …
… But Verify
The Future
Mindful Ignorance
Notes
APPENDIX: THE PASCAL–FERMAT CORRESPONDENCE OF 1654
Notes
BIBLIOGRAPHY
INDEX
END USER LICENSE AGREEMENT
Chapter 3
Table 3.1
Table 3.2
Table 3.3
Table 3.4
Chapter 4
Table 4.1
Table 4.2
Chapter 5
Table 5.1
Chapter 8
Table 8.1
Table 8.2
Chapter 9
Table 9.1
Table 9.2
Table 9.3
Chapter 10
Table 10.1
Table 10.2
Chapter 11
Table 11.1
Appendix
Table A.1
Table A.2
Chapter 1
Figure 1.1
The two dimensions of uncertainty: ambiguity and doubt.
Chapter 3
Figure 3.1
Illustration of the Huygens rationale for the mathematical expectation. The value of a gamble with possible outcomes 7 and 3 equaled the sum of two bets whose values were known based on previously established principles.
Chapter 6
Figure 6.1
The normal distribution (bell-shaped curve).
Figure 6.2
An example of de Moivre's approximation to the binomial distribution. The area under the curve approximates the probability that the number of successes in 100 trials is between 40 and 60 when the probability on each trial is 0.50.
Chapter 7
Figure 7.1
Example of the type of problem solved by Bayes's Theorem.
Figure 7.2
Essential logic of the solution provided by Bayes's Theorem.
Figure 7.3
Bayes's famous “billiard-table” model: The position of the horizontal line marked by the black ball's landing spot is analogous to the unknown probability on each Bernoulli trial. A “success” is deemed to occur each time one of the white balls crosses the horizontal line.
Appendix
Figure A.1
Schematic of the problem of the points: player A lacks 1 point to win and player B lacks 2.
Figure A.2
Schematic of the problem of the points showing all possible outcomes when player A lacks 1 point to win and player B lacks 2.
Figure A.3
Pascal's evaluation of the unknown expectation in the problem of the points when player A lacks 1 point to win and player B lacks 2.
Figure A.4
Schematic of the problem of the points when player A lacks 1 point to win and player B lacks 3.
Figure A.5
Schematic of the problem of the points showing all possible outcomes when player A lacks 1 point to win and player B lacks 3.
Figure A.6
Fermat's evaluation of the problem of the points when player A lacks 1 point to win and player B lacks 3.
PREFACE
The History of Science has suffered greatly from the use by teachers of second-hand material, and the consequent obliteration of the circumstances and the intellectual atmosphere in which the great discoveries of the past were made.
R. A. Fisher1
Sir Ronald A. Fisher, the founder of modern statistics, was certainly correct to point out how much is lost by abstracting major scientific developments from the context in which they evolved. However, it is clearly impractical for all but a few specialists to delve into original source material, especially when it is technical (or in Latin). In this book, I have attempted to convey some of the “circumstances and intellectual atmosphere” that have led to our modern idea of probability. I believe this is important for two reasons. First, to really appreciate what probability is all about, we must understand the process by which it has come about. Second, to transcend the limitations our current conception imposes on us, we must demystify probability by recognizing its inadequacy as the sole yardstick of uncertainty.
Willful Ignorance: The Mismeasure of Uncertainty can be regarded as two books in one. On one hand, it is a history of a big idea: how we have come to think about uncertainty. On the other, it is a prescription for change, especially with regard to how we perform research in the biomedical and social sciences. Modern probability and statistics are the outgrowth of a convoluted process that began over three centuries ago. This evolution has sharpened, but also narrowed, how we have come to reason about uncertainty.
Willful ignorance entails simplifying our understanding in order to quantify our uncertainty as mathematical probability. Probability theory will no doubt continue to serve us well, but only when it satisfies Einstein's famous maxim to “make everything as simple as possible but not simpler.” I believe that in many cases, we now deploy probability in a way that is simpler than it needs to be. The mesh through which probability often filters our knowledge may be too coarse. To reengineer probability for the future, we must account for at least some of the complexity that is now being ignored.
I have tried to tell the story of probability in 12 chapters. Chapter 1 presents the problem that needs to be addressed: the dilemma faced by modern research methodology. Chapter 2 is a whirlwind tour of the book's main themes. After these two introductory chapters, the next five are rich in historical detail, covering the period from 1654 to around 1800, during which mathematical probability developed. Readers who are more interested in current issues than in history might wish to skip ahead to Chapter 12, in which I propose a “solution,” before circling back to the historical chapters.
Chapter 8 is a mix of history and philosophy, sketching the diversity of interpretations that have been attached to the basic concept of probability. In Chapter 9, with help primarily from Fisher, I attempt to cut through the massive confusion that still exists about probability. Chapter 10 discusses the origins of modern statistical methodology in the twentieth century, and its impact on scientific research. In Chapter 11, I explore how mathematical probability has come to dominate and in certain respects limit our thinking about uncertainty. The final chapter offers a suggestion for adapting statistical methodology to a new world of greatly expanded data and computational resources.
Previous historical writing about probability has focused almost exclusively on the mathematical development of the subject. From this point of view, the story is one of steady progress leading to a mature intellectual achievement. The basic principles of probability and statistics are well established. Remaining advances will be mainly technical, extending applications by building on solid foundations. The fundamental creative work is behind us; the interesting times are over.
There is, however, an all but forgotten flip side of the story. This non-mathematical aspect pertains to a fundamental question: what is probability? If we interpret probability as a measure of uncertainty in its broadest sense, what do we really mean by probability? This conceptual, or philosophical, conundrum was effectively put aside many decades ago as an unnecessary distraction, or even impediment, to scientific progress. It was never resolved, leaving the future (us) with an intellectual debt that would eventually come due.
As a result, we have inherited a serious problem. The main symptoms of this problem are confusion and stagnation in the biomedical and social sciences. There is enormously more research than ever before, but precious little useful insight being generated. Most important, there is a serious disconnect between quantitative research methodology and clinical practice. I believe that our stunted understanding of uncertainty is in many ways responsible for this gap.
I have proposed that willful ignorance is the central concept that underlies mathematical probability. In a nutshell, the idea is that to deal effectively with an uncertain situation, we must filter out, or ignore, much of what we know about it. In short, we must simplify our conceptions by reducing ambiguity. In fact, being able to frame a mathematical probability implies that we have found some way to resolve the ambiguity to our satisfaction. Attempting to resolve ambiguity fruitfully is an essential aspect of scientific research. However, it always comes at a cost: we purchase clarity and precision at the expense of creativity and possibility.
For most scientists today, ambiguity is regarded as the enemy, to be overcome at all cost. But remaining open-minded in the face of ambiguity can ultimately generate deeper insights, while prematurely eliminating ambiguity can lead to intellectual sterility. Mathematical probability as we know it is an invention, a device to aid our thinking. It is powerful, but not natural or inevitable, and did not even exist in finished form until the eighteenth century.
The evolution of probability involved contributions by many brilliant individuals. To aid in keeping track of the important historical figures and key events, I have provided a timeline. This timeline is introduced at the beginning, and lists all of the major “landmarks” (mostly important publications) in the development of the concept of probability. At several points in later chapters, a streamlined version of the timeline has been inserted to indicate exactly when the events described in the text occurred. Of course, the question of which landmarks to include in the timeline can be debated. Scholars who are knowledgeable about the history of probability may disagree with my selections, and indeed I have second-guessed myself.
Please keep in mind that my focus is not on the mathematics of probability theory, but on the conception of probability as the measure of our uncertainty. In this light, it is noteworthy that my timeline ends in 1959. Perhaps that is because I lack perspective on the more recent contributions of my contemporaries. However, I believe that it underscores the lack of any profound advances in thinking about the quantification of uncertainty. There have certainly been some impressive attempts to broaden the mathematics of probability (e.g., belief functions, fuzzy logic), but none of these has (yet) entered the mainstream of scientific thought.
Some readers may feel that I have given short shrift to certain important topics that would seem to be relevant. For example, I had originally intended to say much more about the early history of statistics. However, I found that broadening the scope to deal extensively with statistical developments was too daunting and not directly germane to my task. I have similarly chosen not to deal with the theory of risk and decision-making directly. Mathematical probability is central to these disciplines, but entails aspects of economics, finance, and psychology that lie outside my main concerns. I would be delighted if someone else deems it worthwhile to explore the implications of willful ignorance for decision-making.
The time is ripe for a renewal of interest in the philosophical and psychological aspects of uncertainty quantification. These have been virtually ignored for half a century in the mistaken belief that our current version of probability is a finished product that is fully adequate for every purpose. This may have been true in the relatively data-poor twentieth century, but no longer. We need to learn how willful ignorance can be better applied for a more data-rich world. If this book can help stimulate a much-needed conversation on this issue, I will consider the effort in writing it to have been very worthwhile.
1. R. A. Fisher's introduction, written in 1955, to Gregor Mendel's papers in Bennett, J. H. (ed.) (1965). Experiments in Plant Hybridisation: Gregor Mendel. Edinburgh: Oliver & Boyd, p. 6.
ACKNOWLEDGMENTS
Researching and writing this book has been a long journey of exploration and self-discovery. It was on one hand a foray into the realm of historical research, which was new to me. On the other hand, it was a lengthy meditation on questions that had puzzled me throughout my career. I feel extremely fortunate to have had this opportunity. A major part of this good fortune is owed to many family members, friends, and colleagues who have made it possible by supporting me in various ways.
First I would like to thank my wife Nina, who has encouraged me to “follow my bliss” down whatever paths it might lead me. She gave me valuable feedback to help keep me on track when I needed a big-picture perspective. Finally, she has lifted from me the onerous administrative burden of obtaining permission when necessary for the numerous quotations included in the book. I would also like to thank my sons Alex and Dan Weisberg for their encouragement and feedback on various parts of earlier drafts.
Along the way, I have received helpful advice and suggestions from many colleagues and friends: Barbara Beatty, Art Dempster, Richard Derrig, Rich Goldstein, Jim Guszcza, Mike Meyer, Howard Morris, Jarvis Kellogg, Victor Pontes, Sam Ratick, Peter Rousmaniere, and Joel Salon. Dan, Mike, Howie, and Joel deserve special commendations for their thoughtful and diplomatic suggestions on the nearly complete manuscript.
For helping me to develop the general conception of this unusual book, and for his patience and many practical suggestions as this project evolved, I am grateful once again to Steve Quigley, my editor at Wiley. Robin Zucker, for all her talent, hard work, and patience in working with me on the graphics, deserves my sincere appreciation. In particular, her evocative cover design and the historical timelines truly reflect the spirit of the book and improve its quality. I would also like to pay homage to the memory of Stephen J. Gould, whose classic The Mismeasure of Man inspired my subtitle and was an exemplar of what popular science writing can aspire to be.
Ten years ago, this book could not have been written (at least by me). The cost and effort involved in obtaining access to original source material would have been prohibitive. Today, thanks to the internet, these limitations have been largely alleviated. Two resources have been especially valuable to me in my research. One is Google Books, which allowed free access to photocopies of many original publications. The second is the incredibly useful and comprehensive website: Sources on the History of Probability and Statistics, maintained by Prof. Richard J. Pulskamp of Xavier University at http://www.cs.xu.edu/math/Sources/. He has assembled numerous original documents pertaining to the history of probability and statistics, including many of his own translations into English of documents originally in other languages.
Grateful acknowledgment is made to the American Philosophical Society for quotations from:
Gillispie, Charles Coulston (1972). Probability and politics: Laplace, Condorcet, and Turgot. Proceedings of the American Philosophical Society, 16: 1–20.
Grateful acknowledgment is made to the Oxford University Press on behalf of the British Society for the Philosophy of Science for the quotation from:
Popper, Karl R. (1959). The propensity interpretation of probability. The British Journal for the Philosophy of Science, 10: 25–42.
Grateful acknowledgment is made to Dover Publications, Inc. for the quotations from:
David, Florence N. (1962). Games, Gods and Gambling: A History of Probability and Statistical Ideas. London: Charles Griffin & Company.
Von Mises, Richard (1928). Probability, Statistics, and Truth. New York: Dover, 1981 [Translated by Hilda Geiringer from the 3rd edition of the German original].
Grateful acknowledgment is made to the Johns Hopkins University Press for the following material:
Bernoulli, Jacob (2006). Translated with an introduction and notes by Edith Dudley Sylla in 2006 as The Art of Conjecturing, together with Letter to a Friend on Sets in Court Tennis. Baltimore, MD: Johns Hopkins University Press. pp. 34, 37, 134, 139–140, 142, 193, 251, 305, 315–322, 325–330. Reprinted with permission from Johns Hopkins University Press.
Franklin, James (2001). The Science of Conjecture: Evidence and Probability before Pascal. Baltimore, MD: Johns Hopkins University Press. pp. 74–76, 234–235, 275, 366. Reprinted with permission from Johns Hopkins University Press.
I would particularly like to express my appreciation and admiration to Edith Dudley Sylla and James Franklin for their scholarship pertaining to the prehistory of modern probability. They have brought to light much previously obscure information that is essential for placing modern probability in perspective.
HERBERT I. WEISBERG
1 THE OPPOSITE OF CERTAINTY
During the past century, research in the medical, social, and economic sciences has led to major improvements in longevity and living conditions. Statistical methods grounded in the mathematics of probability have played a major role in much of this progress. Our confidence in these quantitative tools has grown, along with our ability to wield them with great proficiency. We have an enormous investment of tangible and intellectual capital in scientific research that is predicated on this framework. We assume that the statistical methods as applied in the past so successfully will continue to be productive. Yet, something is amiss.
New findings often contradict previously accepted theories. Faith in the ability of science to provide reliable answers is being steadily eroded, as expert opinion on many critical issues flip-flops. Scientists in some fields seriously debate whether a majority of their published research findings are ultimately overturned1; the term “decline effect” has been coined to describe how even strongly positive results often fade over time in the light of subsequent study2; revelations of errors, and even fraud, in the findings published in prestigious scientific journals are becoming more common.3 Instead of achieving greater certainty, we seem to be moving backwards. What is going on?
Consider efforts to help disadvantaged children through early childhood educational intervention. Beginning around 1970, the U.S. government sponsored several major programs to help overcome social and economic disadvantage. The most famous of these, Project Head Start, aimed to close the perceived gap in cognitive development between richer and poorer children that was already evident in kindergarten. The aims of this program were admirable and the rationale compelling. However, policy debates about the efficacy and cost of this initiative have gone on for four decades, with no resolution in sight. Research on the impact of Head Start has been extensive and costly, but answers are few and equivocal.
Medical research is often held up as the paragon of statistical research methodology. Evidence-based medicine, based on randomized clinical trials, can provide proof of the effectiveness and safety of various drugs and other therapies. But cracks are appearing even in this apparently solid foundation. Low-dose aspirin for prevention of heart attacks was gospel for years but is now being questioned. Perhaps the benefits are smaller, and the risks greater, than we previously believed. Hormone replacement therapy for postmenopausal women was considered almost miraculous until a decade ago, when a landmark study overturned previous findings. Not a year goes by without some new recommendation regarding whether, how, and by whom hormone replacement should be used.
These are not isolated instances. The ideal of science is an evolution of useful theory coupled with improved practice, as new research builds upon and refines previous findings. Each individual study should be a piece of a larger puzzle to which it contributes. Instead, research in the biomedical and social sciences is rarely cumulative, and each research paper tends to stand alone. We fill millions of pages in scientific journals with “statistically significant” results that add little to our store of practical knowledge and often cannot be replicated. Practitioners, whose clinical judgment should be informed by hard data, gain little that is truly useful to them.
If I am correct in observing that scientific research has contributed so little to our understanding of “what works” in areas like education, health care, and economic development, it is important to ask why this is the case. I believe that much of the problem lies with our research methodology. At one end of the spectrum, we have what can be called the quantitative approach, grounded in modern probability-based statistical methods. At the other extreme are researchers who support a radically different paradigm, one that is primarily qualitative and more subjective. This school of thought emphasizes the use of case studies and in-depth participatory observation to understand the dynamics of complex causal processes.
Both statistical and qualitative approaches have important contributions to make. However, researchers in either of these traditions tend to view those in the other with suspicion, like warriors in two opposing camps peering across a great divide. Nowadays, the statistical types dominate, because methods based on probability and statistics virtually define our standard of what is deemed “scientific.” The perspective of qualitative researchers is much closer to that of clinicians but lacks the authority that the objectivity of statistics seems to provide.
Sadly, each side in this fruitless debate is stuck in a mindset that is too restricted to address the kinds of problems we face. Conventional statistical methods make it difficult to think seriously about causal processes underlying observable data. Qualitative researchers, on the other hand, tend to underestimate the value of statistical generalizations based on patterns of data. One approach willfully ignores all salient distinctions among individuals, while the other drowns in infinite complexity.
The resulting intellectual gridlock is especially unfortunate as we enter an era in which the potential to organize and analyze data is expanding exponentially. We already have the ability to assemble databases in ways that could not even be imagined when the modern statistical paradigm was formulated. Innovative statistical analyses that transcend twentieth-century data limitations are possible if we can summon the will and imagination to fully embrace the opportunities presented by new technology.
Unfortunately, as statistical methodology has matured, it has grown more timid. For many, the concept of scientific method has been restricted to a narrow range of approved techniques, often applied mechanically. The result is to limit the scope of individual creativity and inspiration in a futile attempt to attain virtual certainty. Already in 1962, the iconoclastic statistical genius John Tukey counseled that data analysts “must be willing to err moderately often in order that inadequate evidence shall more often suggest the right answer.”4
Instead, to achieve an illusory pseudo-certainty, we dutifully perform the ritual of computing a significance level or confidence interval, having forgotten the original purposes and assumptions underlying such techniques. This “technology” for interpreting evidence and generating conclusions has come to replace expert judgment to a large extent. Scientists no longer trust their own intuition and judgment enough to risk modest failure in the quest for great success. As a result, we are raising a generation of young researchers who are highly adept technically but have, in many cases, forgotten how to think for themselves.
The dream of “automating” the human sciences by substituting calculation for intuition arose about two centuries ago. Adolphe Quetelet's famous treatise on his statistically based “social physics” was published in 1835, and Siméon Poisson's masterwork on probability theory and judgments in civil and criminal matters appeared in 1837.5, 6 It is perhaps not coincidental that in 1834 Charles Babbage first began to design a mechanical computer, which he called an analytical engine.7 Optimism about the potential ability of mathematical analysis, and especially the theory of probability, to resolve various medical, social, and economic problems was at its zenith.
Shortly after this historical moment, the tide turned. The attempt to supplant human judgment by automated procedures was criticized as hopelessly naïve. Reliance on mathematical probability and statistical methods to deal with such subtle issues went out of favor. The philosopher John Stuart Mill termed such uses of mathematical probability “the real opprobrium of mathematics.”8 The famous physiologist Claude Bernard objected that “statistics teach absolutely nothing about the mode of action of medicine nor the mechanics of cure” in any particular patient.9 Probability was again relegated to a modest supporting role, suitable for augmenting our reasoning. Acquiring and evaluating relevant information, and reaching final conclusions and decisions remained human prerogatives.
Early in the twentieth century, the balance between judgment and calculation began to shift once again. Gradually, mathematical probability and statistical methods based on it came to be regarded as more objective, reliable, and generally “scientific” than human theorizing and subjective weighing of evidence. Supported by rapidly developing computational capabilities, probability and statistics were increasingly viewed as methods to generate definitive solutions and decisions. Conversely, human intuition began to be seen as an outmoded and flawed aspect of scientific investigation.
Instead of serving as an adjunct to scientific reasoning, statistical methods today are widely perceived as a corrective to the many cognitive biases that often lead us astray. In particular, our naïve tendencies to misinterpret and overreact to limited data must be countered by a better understanding of probability and statistics. Thus, the genie that was put back in the bottle after 1837 has emerged in a new and more sophisticated guise. Poisson's ambition of rationalizing such activities as medical research and social policy development is alive and well. Mathematical probability, implemented by modern analytical engines, is widely perceived to be capable of providing scientific evidence-based answers to guide us in such matters.
Regrettably, modern science has bought into the misconception that probability and statistics can arbitrate truth. Evidence that is “tainted” by personal intuition and judgment is often denigrated as merely descriptive or “anecdotal.” This radical change in perspective has come about because probability appears capable of objectively quantifying our uncertainty in the same unambiguous way as measurement techniques in the physical sciences. But this is illusory:
Uncertain situations call for probability theory and statistics, the mathematics of uncertainty. Since it was precisely in those areas where uncertainty was greatest that the burden of judgment was heaviest, statistical tools seemed ideally suited to the task of ridding first the sciences and then daily life of personal discretion, with its pejorative associations of the arbitrary, the idiosyncratic, and the subjective. Our contemporary notion of objectivity, defined largely by the absence of these elements, owes a great deal to the dream of mechanized inference. It is therefore not surprising that the statistical techniques that aspire to mechanize inference should have taken on a normative character. Whereas probability theory once aimed to describe judgment, statistical inference now aims to replace it in the name of objectivity. … Of course, this escape from judgment is an illusion. … No amount of mathematical legerdemain can transform uncertainty into certainty, although much of the appeal of statistical inference techniques stems from just such great expectations. These expectations are fed … above all by the hope of avoiding the oppressive responsibilities that every exercise of personal judgment entails.10
Probability by its very nature entails ambiguity and subjectivity. Embedded within every probability statement are unexamined simplifications and assumptions. We can think of probability as a kind of devil's bargain. We gain practical advantages by accepting its terms but unwittingly cede control over something fundamental. What we obtain is a special kind of knowledge; what we give up is conceptual understanding. In short, by willingly remaining ignorant, in a particular sense, we may acquire a form of useful knowledge. This is the essential paradox of probability.
Among practical scientists nowadays, the true meaning of probability is almost never discussed. This is really quite remarkable! The proper interpretation of mathematical probability within scientific discourse was a hotly debated topic for over two centuries. In particular, questions about the adequacy of mathematical probability to represent fully our uncertainty were deemed important. Recently, however, there has been virtually no serious consideration of this critical issue.
As late as the 1920s, a variety of philosophical ideas about probability and uncertainty were still in the air. The central importance of probability theory in a general sense was recognized by all. However, there was wide disagreement over how the basic concept of probability should be defined, interpreted, and applied. Most notably, in 1921 two famous economists independently published influential treatises that drew attention to an important theoretical distinction. Both suggested that the conventional concept of mathematical probability is incomplete.
In his classic, Risk, Uncertainty and Profit, economist Frank Knight described the kind of uncertainty associated with ordinary probability by the term risk.11 The amount of risk can be deduced from mathematical theory (as in a game of chance) or calculated by observing many outcomes of similar events, as done, for example, by an insurance company. However, Knight was principally concerned with probabilities that pertain to another level of uncertainty. He had particularly in mind a typical business decision faced by an entrepreneur. The probability that a specified outcome will result from a certain action is ordinarily based on subjective judgment, taking into account all available evidence.
According to Knight, such a probability may be entirely intuitive. There may be no way, even in principle, to verify this probability by reference to a hypothetical reference class of similar situations. In this sense, the probability is completely subjective, an idea that was shared by some of his contemporaries. However, Knight went further by suggesting that this subjective probability also carries with it some sense of how much confidence in this estimate is actually entertained. So, in an imprecise but very important way, the numerical measure of probability is only a part of the full uncertainty assessment. “The action which follows upon an opinion depends as much upon the confidence in that opinion as upon the favorableness of the opinion itself.” This broader but vaguer conception has come to be called Knightian uncertainty.
Knightian uncertainty was greeted by economists as a new and radical concept, but was in fact some very old wine being unwittingly rebottled. One of the few with even an inkling of probability's long and tortuous history was John Maynard Keynes. Long before he was a famous economist,12 Keynes authored A Treatise on Probability, completed just before World War I, but not published until 1921. In this work, he probed the limits of ordinary probability theory as a vehicle for expressing our uncertainty. Like Knight, Keynes understood that some “probabilities” were of a different character from those assumed in the usual theory of probability. In fact, he conceived of probability quite generally as a measure of rational belief predicated on some particular body of evidence.
In this sense, there is no such thing as a unique probability, since the evidence available can vary over time or across individuals. Moreover, sometimes the evidence is too weak to support a firm numerical probability; our level of uncertainty may be better represented as entirely or partly qualitative. For example, my judgment about the outcome of the next U.S. presidential election might be that a Democrat is somewhat less likely than a Republican to win, but I cannot reduce this feeling to a single number between zero and one. Or, I may have no idea at all, so I may plead complete ignorance. Such notions of a non-numerical degree of belief, or even of complete ignorance (for lack of any relevant evidence), have no place in modern probability theory.
The mathematical probability of an event is often described in terms of the odds at which we should be willing to bet for or against its occurrence. For example, suppose my probability that the next president will be a Democrat is 40%, or 2/5. Then for me, the fair odds at which to bet on this outcome would be 3:2. So I will gain 3 dollars for every 2 dollars wagered if a Democrat actually wins, but lose my 2-dollar stake if a Republican wins. However, a full description of my uncertainty might also reflect how confident I would be about these odds. To force my expression of uncertainty into a precise specification of betting odds, as if I must lay a wager, may be artificially constraining.
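The arithmetic behind this example can be checked with a short sketch. The function names below are illustrative and not part of any standard notation; the only assumption is that a bet is "fair" when its expected gain is exactly zero.

```python
from fractions import Fraction

def fair_odds_against(p):
    """Return the fair odds against an event of probability p as (win, stake).

    Odds of win:stake mean the bettor gains `win` for every `stake` wagered
    if the event occurs, and loses the stake otherwise.
    """
    p = Fraction(p)
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    odds = (1 - p) / p  # ratio of the chance of losing to the chance of winning
    return odds.numerator, odds.denominator

def expected_gain(p, win, stake):
    """Expected gain of a bet that pays `win` with probability p and loses `stake` otherwise."""
    p = Fraction(p)
    return p * win - (1 - p) * stake

p_democrat = Fraction(2, 5)  # the 40% probability used in the text
win, stake = fair_odds_against(p_democrat)
print(f"{win}:{stake}")                         # 3:2
print(expected_gain(p_democrat, win, stake))    # 0
```

The single number 2/5 fixes the odds completely. Nothing in the calculation records how firmly that number is held, which is exactly the element of uncertainty the betting description leaves out.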
Knight and Keynes were among a minority who perceived that uncertainty embodies something more than mere “risk.” They understood that uncertainty is inherently ambiguous in ways that often preclude complete representation as a simple number between zero and one. William Byers eloquently articulates in The Blind Spot how such ambiguity can often prove highly generative and how attempts to resolve it completely or prematurely have costs.13
As a prime example, Byers discusses how the ancient proto-concept of “quantity” evolved over time into our current conception of numbers:
A unidirectional flow of ideas is at best a reconstruction. It is useful and interesting but it misses something. It inevitably takes the present situation to be definitive. It tends to show how our present knowledge is superior in every way to the knowledge of the Greeks, for example. In so doing, it ignores the possibility that the Greeks knew things we do not know, that we have forgotten or suppressed. It seems heretical to suggest, but is nevertheless conceivable, that the Greek conception of quantity was in a certain way richer than our own, that their conception of number was deeper than ours. It was richer in the sense that a metaphor can be rich—because it comes with a large set of connoted meanings. It may well be that historical progress in mathematics is in part due to the process of abstraction, which inevitably involves narrowing the focus of attention to precisely those properties of the situation that one finds most immediately relevant. This is the way I shall view the history of mathematics and science—as a process of continual development that involves gain and loss, not as the triumphant march toward some final and ultimate theory.
In very much the same way, our modern idea of probability emerged from earlier concepts that were in some respects richer, and perhaps deeper.14
When Knight and Keynes wrote, the modern interpretation of probability had already almost completely crystallized. Shortly afterwards, further “progress” took the form of “narrowing the focus” even more. Although the broader issues addressed by Knight and Keynes were ignored, there remained (and still remains) one significant philosophical issue. Should probability be construed as essentially subjective or objective in nature? Is probability purely an aspect of personal thought and belief or an aspect of the external world? I will suggest that this is a false dichotomy that must be transcended.
As statistical methods became more prominent in scientific investigations, objectivity became paramount. It became widely accepted that science must not reflect any subjective considerations. Rather, it must deal with things that we can measure and count objectively. Thus, mathematical probability, interpreted as the frequency with which observable events occur, became the yardstick for measurement in the context of scientific research. This link to empirical reality created a false sense of objectivity that continues to pervade our research methodology today, although a more subjective interpretation has recently made some limited inroads.
From our modern viewpoint, there appears to be a sharp distinction between the subjective and objective interpretations of probability. However, to the originators of mathematical probability, these two connotations were merged in a way that can seem rather muddled to us. Were they confused, or do we fail to grasp something meaningful for them now lost to us? Is there, as Byers intimates, a “transcendental” perspective from which this distinction would no longer seem meaningful? If so, it might point the way toward a resolution of the conflict between apparently opposing ways of thinking about science. That, in turn, could help bridge the gap between scientific research and clinical practice.
At the core of science is the desire for greater certainty in a highly unpredictable world. Probability is often defined as a measure of our degree of certainty, but what is certainty? If I am certain that a particular event will occur, what does that mean? For concreteness, suppose I have just enrolled in a course on a subject with which I am not very familiar. For the moment, let us assume I am absolutely certain of being able to pass this course.
Dictionary definitions of the word “certain” contain phrases like “completely confident” and “without any doubt.” But what conditions would allow us to be in such a state of supreme confidence? Obviously, our knowledge of the situation or circumstances must be adequate for us to believe that the event must occur. My certainty about passing the course would rest on a matrix of information and beliefs that justify (for me) the necessary confidence.
Now, suppose that, on the contrary, I am not certain that I can pass this course. Clearly, this implies that I am lacking in certainty, but what does this mean? I would submit that uncertainty has two quite different connotations, or aspects. On one hand, my uncertainty can arise from doubt. So, the opposite of being sure of passing the course is being extremely doubtful. On the other hand, being uncertain could also mean that I just do not know whether I will be able to pass the course. I may suffer from confusion, because the situation facing me seems ambiguous. So the opposite of being highly confident would be something like having no idea, being literally “clueless.”
The situation can be described graphically as in Figure 1.1. We can conceptualize our degree of uncertainty as the resultant of two psychological “forces.” The horizontal axis represents our degree of doubt and the vertical axis, the degree of ambiguity we perceive. In general, certainty corresponds to the absence of both doubt and ambiguity. Our degree of uncertainty increases according to the “amounts” of doubt and ambiguity.
FIGURE 1.1 The two dimensions of uncertainty: ambiguity and doubt.
To be more specific, ambiguity pertains generally to the clarity with which the situation of interest is being conceptualized. How sure am I about the mental category, or classification, in which to place what I perceive to be happening? In order to exercise my judgment about what is likely to occur, I must have a sense of the relevant features of the situation. So, reducing uncertainty by resolving ambiguity, at least to some extent, seems to be a necessary prerequisite for assessing doubt. However, the relationship between ambiguity and doubt can be complex and dynamic. We certainly cannot expect to eliminate ambiguity completely before framing a probability.
In relation to probability, there is an important difference between these two dimensions of uncertainty. It seems natural to think of a degree of doubt as a quantity. We can, for example, say that our doubt that the Chicago Cubs will win the World Series this year is greater than our doubt that the New York Yankees will be the champions. We may even be able to assign a numerical value to our degree of doubtfulness. Ambiguity, on the other hand, seems to be essentially qualitative. It is hard to articulate what might be meant by a “degree of ambiguity.”
Probability in our modern mathematical sense is concerned exclusively with the doubt component of uncertainty. For us, the probability of a certain event is assigned a value of 1.0, or 100%. At the opposite end of the spectrum, an event that is deemed impossible, or virtually impossible, has a probability value of 0.0, or 0%. When we say that something, such as passing a course, has a probability of 95%, we mean that there exists a small degree of doubt that it will actually occur. Conversely, a probability of 5% implies a very strong doubt. In order to achieve such mathematical precision, we must suppress some subtlety or complexity that creates ambiguity. That way, our uncertainty can be ranged along a single dimension; our degree of confidence that the event will happen becomes identical to our lack of confidence that it will not happen.
As I will be explaining, this modern mathematical conception of probability emerged quite recently, about three centuries ago. Prior to its invention, there had existed for thousands of years earlier concepts of probability that were, indeed, “in a certain way richer than our own.” Especially important, these archaic ideas about uncertainty encompassed both dimensions of uncertainty, often without clearly distinguishing between them. Dealing with uncertainty implicitly entailed two challenges: attempting to resolve the ambiguity and to evaluate the doubtfulness of what we know. By essentially ignoring ambiguity in order to quantify doubt, we have obtained the substantial benefits of mathematical probability.
Since the 1920s, the equally important issue of ambiguity has been left outside the pale of scientific (and most philosophical) thinking about probability. The concerns raised by Keynes, Knight, and others back then were never addressed. In essence, they perceived the dangers in reducing probability to a technology for measuring doubt that ignores ambiguity. Failing to address this issue has led to the moribund state in which many areas of science now find themselves.
Suppose you are an emergency-room physician confronted by a new patient who displays an unusual constellation of symptoms. Rapid action is required, as the patient's condition is life-threatening. You are uncertain about the appropriate course of treatment. Your task is twofold: resolve your confusion about what type of illness you are observing and decide on the optimal therapy to adopt.
The diagnosis aims to eliminate, or at least minimize, any ambiguity pertaining to the patient's condition and circumstances. The physician's methodology may include a patient history, a physical examination, and a variety of clinical testing procedures. All of the resulting information is evaluated and integrated subjectively by the physician and possibly other specialist colleagues. The usual outcome is a classification of the patient into a specific disease category, along with any qualifying details (e.g., disease duration and severity, concomitant medications, allergies) that may be relevant to various potential treatment options. The process of attempting to resolve ambiguity in this situation, or in general, draws mainly on the clinician's expertise and knowledge. It entails logic and judgment applied to the array of evidence available.
Once the diagnosis is determined, however, the situation changes. The focus shifts to the selection of a treatment approach. The ambiguity about what is happening has been largely resolved. The remaining task is to choose from among the different therapeutic candidates. Putting aside the issue of side effects, the therapy offering the best chance of a cure will be selected. Not that long ago, this too was settled mainly by appealing to the presumed expertise of the clinician (doctor, psychiatrist, social worker, teacher, etc.). Not any longer.
Since the 1950s, research to evaluate alternative treatment modalities has become increasingly standardized and objective. So-called evidence-based medicine depends heavily on statistical theory for the design, conduct, and analysis of research. This technology appears to generate knowledge that is demonstrably reliable because human subjectivity and fallibility have been eliminated from the process. Central to the modern research enterprise is probability theory. Probability defines the terms within which questions and answers are framed. Moreover, rather than merely advising the clinician, evidence-based recommendations derived from statistics are intended to represent the “optimal” decision.15
When these new statistical methods were originally introduced, they promised to ameliorate serious problems that were then widespread, such as exaggerated claims of efficacy and outright quackery. However, no one could have imagined the extent to which these safeguards would eventually come to define our standard of what constitutes respectable science. Statistical methods are now virtually the only way to conduct research in many fields, especially those that study human beings. What has resulted is a profound disconnect between clinical and statistical perceptions in many instances.
Research focuses on what is likely to happen “on the average” in certain specified circumstances. What, for example, is the effect on the mortality rate for middle-aged men who adopt a low-dose aspirin regimen? However, the clinician's concern is her particular patient. What will happen to Sam Smith if he starts on an aspirin regimen tomorrow? So, she may balk at mechanically following some general guidelines that are alleged to be statistically optimal:
Each of us is unique in the interplay of genetic makeup and environment. The path to maintaining or regaining health is not the same for everyone. Choices in this gray zone are frequently not simple or obvious. For that reason, medicine involves personalized and nuanced decision making by both the patient and doctor. … Although presented as scientific, formulas that reduce the experience of illness to numbers are flawed and artificial. Yet insurers and government officials are pressuring physicians and hospitals to standardize care using such formulas. Policy planners and even some doctors have declared that the art of medicine is passé, that care should be delivered in an industrialized fashion with nurses and doctors following operating manuals.16
In a real sense, clinicians and researchers tend to inhabit different conceptual worlds. The clinician is sensitive to the ambiguities of the “gray zone” in which difficult decisions must be made. She is in a land where the uncertainty is mainly of the “what is really going on here?” kind. For the researcher, on the other hand, the world must look black and white, so that the rules of probability math can be applied. This ambiguity blindness has become absolutely necessary. Without it, as we will see, the elaborate machinery of statistical methodology would come to a grinding halt. Consequently, there is no middle road between the clinical and statistical perspectives.
To be clearer on this point, let us hark back to our hypothetical problem of medical treatment. Suppose you have discovered the cause of the patient's symptoms, a rare type of virulent bacterial infection. Your problem now is to select which antibiotic to try first. There are three possibilities, each of which you have prescribed in the past many times. Your decision will hinge primarily on the probability of achieving a cure for this patient. We are accustomed to thinking that there actually exists, in some objective sense, a true probability that applies to this patient. In fact, there is no such probability out there!
A probability is a mental construct. In this sense, it is entirely subjective, or personal, in nature. However, probability must also have something to do with observations in the outside world. Indeed, an important (perhaps the only) relevant source of evidence may be a statistical rate of cure that you can find in the medical literature. Surely, these rates (percentages) can be interpreted as probabilities, or at least as approximations to them. Moreover, because these statistics are objective and precise, they are ordinarily expected to trump any subjective considerations.
The problem is that the “objective” probability may not be applicable to your particular patient. You may have specific knowledge and insight that influence your level of ambiguity or of doubt. For example, you might know that Sam Smith tends to comply poorly with complicated instructions for taking medicine properly. So, the statistically indicated treatment modality might not work as well for him as for the typical subject in the clinical studies. Ideally, you would possess some system for rationally taking account of all factors, both qualitative and quantitative, that seem relevant. However, the statistically based probability is not open to debate or refinement in any way. That is because probability by its very nature entails willful ignorance.
My term willful ignorance refers to the inescapable fact that probabilities are not geared directly to individuals. An assessment of probability can of course be applied to any particular individual, but that is a matter of judgment. By choosing a statistically based probability, you effectively regard this individual as a random member of the population from which the statistics were derived. In other words, you ignore any distinguishing features of the individual or his circumstances that might modify the probability.
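A small numerical sketch may make this concrete. The counts below are hypothetical, invented purely for illustration and not drawn from any study. They show how a single published cure rate can conceal very different rates for patients who do and do not comply with the regimen.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration only (not from any study):
# (patients cured, patients treated) in each compliance group.
trial = {
    "good compliance": (72, 80),
    "poor compliance": (8, 20),
}

def pooled_rate(groups):
    """Cure rate when every patient is treated as a random member of the population."""
    cured = sum(c for c, _ in groups.values())
    treated = sum(n for _, n in groups.values())
    return Fraction(cured, treated)

def group_rate(groups, name):
    """Cure rate within one named group."""
    cured, treated = groups[name]
    return Fraction(cured, treated)

print(float(pooled_rate(trial)))                    # 0.8
print(float(group_rate(trial, "good compliance")))  # 0.9
print(float(group_rate(trial, "poor compliance")))  # 0.4
```

Under these assumed counts, quoting 80% for a patient known to comply poorly means setting aside the one feature that separates him from the typical study subject. Whether to do so remains a matter of judgment; the calculation itself cannot decide it.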
Relying uncritically on statistics for answers has become so second-nature to us that we have forgotten how recent and revolutionary this way of thinking really is. That is the crux of the problems we now face. Fortunately, there is a path out of the stagnation that plagues our research currently, and it is surprisingly simple, in theory. Unfortunately, practical implementation of this idea will require a seismic shift in behavior. In a nutshell, we must learn to become more mindful in applying probability-based statistical methods and criteria.
Mindfulness can be described as a way of perceiving and behaving that is characterized by openness, creativity, and flexibility. Psychologist Ellen Langer has suggested several qualities that tend to characterize a mindful person:
The ability to create new categories
Openness to new information
Awareness of multiple perspectives
A focus on process more than outcome
A basic respect for intuition.
These reflect precisely the attitude of a scientist who is motivated primarily by potential opportunities to advance human knowledge. Such an individual thrives on ambiguity, because it offers a wealth of possibilities to be explored.
In contrast, statistical methodology as it is applied today does not encourage these attributes. Rather, it has become mindless in its mechanical emphasis on prespecified hypotheses about average effects and formal testing procedures. It is no wonder that clinicians and qualitatively oriented researchers are uncomfortable with such unnatural modes of thinking:
Just as mindlessness is the rigid reliance on old categories, mindfulness means the continual creation of new ones. Categorizing and recategorizing, labeling and relabeling as one masters the world are processes natural to children. They are an adaptive and necessary part of surviving in the world.
These dynamic processes are equally essential in scientific research to cope with and resolve ambiguity. By relying so heavily on statistical procedures based on probability theory, we effectively sweep ambiguity under the rug. This is a fundamental problem, because the essence of probability is quantification of doubt, which requires ambiguity about categories and labels to be willfully ignored. So, the problem of ambiguity cannot truly be evaded within the framework of probability, only sidestepped. A better approach is to broaden our understanding of uncertainty in order to resolve ambiguity more productively.
Am I arguing that mathematical probability and statistical methods should be avoided? Far from it. We will need these tools to address the problems of a much more data-rich future. However, our research methodology needs somehow to make more room for mindfulness, even though that will entail confronting ambiguity as well as doubt. Doing so may require us to cultivate a greater degree of tolerance for error, but this is unavoidable. Being wrong, as Kathryn Schulz reminds us, is normal; the problem is how we deal with this ever-present possibility.17
We must avoid worrying about mistakes to the point of stifling creativity. After all, it took Einstein 10 years and countless false leads to arrive at the general theory of relativity.18 It is OK to be wrong, as long as you are in the mode of continually testing and revising your theories in the light of evidence. In research that relies on the analysis of statistical data, that means placing more emphasis on successful replication of findings. By maintaining a balance between theoretical speculation and empirical evidence, we can increase the chances of generating knowledge that will make sense to both the researcher and the clinician.
Accomplishing this necessary evolution of methodology will entail both technical challenges and a major alteration of our scientific culture and incentives. Mathematical probability and statistical analysis will continue to play important roles in the future of research. But these tools must continue to develop in ways that take fuller advantage of the emerging opportunities. To conclude, I offer a personal anecdote that I have often used to exemplify the kind of mindful statistical analysis that will be necessary:
