Psychological Science Under Scrutiny
Description

Psychological Science Under Scrutiny explores a range of contemporary challenges to the assumptions and methodologies of psychology, in order to encourage debate and ground the discipline in solid science. 

  • Discusses the pointed challenges posed by critics to the field of psychological research, which have given pause to psychological researchers across a broad spectrum of sub-fields
  • Argues that those conducting psychological research need to fundamentally change the way they think about data and results, in order to ensure that psychology has a firm basis in empirical science
  • Places the recent challenges discussed into a broad historical and conceptual perspective, and considers their implications for the future of psychological methodology and research
  • Challenges discussed include confirmation bias, the effects of grant pressure, false-positive findings, overestimating the efficacy of medications, and high correlations in functional brain imaging
  • Chapters are authored by internationally recognized experts in their fields, and are written with a minimum of specialized terminology to ensure accessibility to students and lay readers


Page count: 880

Publication year: 2017




Table of Contents

Cover

Title Page

List of Contributors

Introduction

Psychological Science Under Scrutiny

The Replication Crisis

Other Challenges to Psychological Science

Media Attention

Scrutiny of Psychological Science as Healthy Self‐Criticism

This Book

Format and Table of Contents

References

Part I: Cross‐Cutting Challenges to Psychological Science

1 Maximizing the Reproducibility of Your Research

Project Planning

Project Implementation

Data Analysis

Reporting

Programmatic Strategies

Implementing These Practices: An Illustration with the Open Science Framework

Conclusion

References

2 Powering Reproducible Research

Background

False Positives and False Negatives

Type S and Type M Errors

Consequences of Low Power

Why Does Low Power Persist?

Possible Solutions

References

3 Psychological Science’s Aversion to the Null, and Why Many of the Things You Think Are True, Aren’t

Falsification and Null Results

Statistical Power

The File‐Drawer Problem

Moving From Protoscience to Science

References

4 False Negatives

Introduction

Dialectics of Allegedly Correct and Wrong Decisions in Statistical Hypothesis Testing

Concluding Remarks

Acknowledgment

References

5 Toward Transparent Reporting of Psychological Science

Historical Context

Disclosure Problem in Psychology

Why We Seek Transparent Reporting

Predicted Effects of (Non)compliance

Psychological and Institutional Obstacles to Transparent Reporting

Potential Remedies

Relation to Other Extant Challenges

Final Thoughts

References

6 Decline Effects

Four Types of Decline Effects

Separate Reflections on Unconventional Sources of Decline Effects by Schooler and Protzko

Meta‐Science and the Empirical Unpacking of the Decline Effect

Acknowledgments

References

7 Reverse Inference

Thinking is Inferring

The Basics of Reasoning

Inferences in Science

Reverse Inference Unleashed

Conclusion

References

8 The Need for Bayesian Hypothesis Testing in Psychological Science

The Logic of p-values: Fisher's Disjunction

The Illogic of p-values

Generalizing Logic: The Bayesian Perspective

A Concrete Example: Results from AUFP Re‐examined

The Bayesian Remedy

Concluding Comments

Acknowledgement

References

Part II: Domain‐Specific Challenges to Psychological Science

9 The (Partial but) Real Crisis in Social Psychology

The (Partial but) Real Crisis in Social Psychology Defined and Illustrated

Why the (Partial but) Real Crisis in Social Psychology?

Five Recommendations for Ending the Partial but Real Crisis

References

10 Popularity as a Poor Proxy for Utility

Introduction

Creating the Implicit Prejudice Meme

Deconstructing the Implicit Prejudice Meme

What Is Implicit Prejudice, and Why Don’t Its Measures Agree?

Conclusion

References

11 Suspiciously High Correlations in Brain Imaging Research

Challenges of fMRI Analysis

Non‐independence/Circularity

Associated Problems that Emerged in the Fallout

Conclusions

Appendix A: A Quick Tour of fMRI Analysis

Appendix B: Power Calculations with False Discovery Rate Correction

References

12 Critical Issues in Genetic Association Studies

Introduction

Molecular Genetic Considerations for Genetic Association Studies

Genetic Association Study Designs

Leveraging Strength in Numbers: Multi‐site Studies of Genetic Association

Consortium‐based Mega‐analysis

The Issue of Heterogeneity in Multi‐site Studies

Genetic Association Studies in Research and Application: Hyperbole and Hope

The Implication of Issues in Genetic Association in the Study of Gene–Environment Interaction

The Future of Genetic Association Studies

Short‐term Benefits of Genetic Association Studies

Long‐Term Goals and Considerations for Genetic Association Studies

References

13 Is the Efficacy of “Antidepressant” Medications Overrated?

Fluoxetine: Creation of a Blockbuster “Antidepressant”

The Efficacy of “Antidepressant” Medications

Conclusion

References

14 Pitfalls in Parapsychological Research

Introduction

The Origins of Parapsychology

From Psychical Research to Parapsychology

Elusiveness of the Results

Parapsychology: The Elusiveness Continues

The Contemporary Scene: Conflicting Claims about Replicability

Retrospective vs. Prospective Replication

Bem’s Precognition Experiments: Confounding Exploratory with Confirmatory Research

The Overarching Pitfalls of Parapsychological Research

References

Part III: Psychological and Institutional Obstacles to High‐Quality Psychological Science

15 Blind Analysis as a Correction for Confirmatory Bias in Physics and in Psychology

Biases in the Research Process

Corrective Practices in Psychology

Blind Analysis in Physics

Applying Blind Analysis to Psychology

Discussion

Appendix

References

16 Allegiance Effects in Clinical Psychology Research and Practice

The Allegiance Effect

Allegiance Effects in Psychotherapy Research

Allegiance Effects in Assessment Research

Allegiance Effects in Allied Fields

Allegiance Effects among Practicing Clinicians

Proposed Explanations for Allegiance Effects

Proposed Remedies

Conclusion

References

17 We Can Do Better than Fads

Recognizing Fads

Intense Peer Pressure

Restriction of Range of Questions

Costs of Fads

Quality of Work

Encouraging Career Choices on the Basis of Fashion Rather than Passion

Fueling What May Be a Foolish Fad

Choking Off Important Areas of, or Approaches to, Research

Methods Rather than Substance

Fundamental Values

An Alternative Approach

References

Afterword: Crisis? What Crisis?

Whose Crisis?

The Limits of Replication

Our Subjects Are WEIRD

References

Index

End User License Agreement

List of Tables

Chapter 01

Table 1.1 Increasing the reproducibility of psychological research across the research lifecycle.

Chapter 02

Table 2.1 Beliefs about sample size and statistical power.

Chapter 03

Table 3.1 Probabilities of correct and incorrect decisions.

Chapter 08

Table 8.1 Bayes factors for different priors. BF01 = 1/BF10.

Chapter 12

Table 12.1 Practical considerations for conducting or evaluating a genetic association study.

Chapter 13

Table 13.1 Problematic design features in antidepressant trials conducted by pharmaceutical companies.

Table 13.2 Mean improvement (weighted for sample size) for fluoxetine and placebo in trials submitted to the FDA and published versions of the FDA trials.

Table 13.3 Problematic reporting features in antidepressant trials conducted by pharmaceutical companies.

Chapter 15

Table 15.1 Simulation 1: F Statistics.

Table 15.2 Simulation 2: F Statistics.

List of Illustrations

Chapter 02

Figure 2.1 The winner's curse. Note: Suppose the true population effect is μ, but the study testing for that effect is small and lacks power. Only those findings that by chance happen to be much greater than the true effect will pass the threshold for statistical significance (α = 0.05). This is often referred to as winner's curse; the scientists whose study yields results that pass the threshold for statistical significance have "won" by finding evidence of an effect, but are also "cursed" as their results are a gross overestimation of the true population effect. This effect inflation is also referred to as a Type M (magnitude) error.

Chapter 03

Figure 3.1 Illustration of parameters affecting power. (a) Power for population effect size δ = 0.7 and sample sizes of n1 = n2 = 20. (b) Effect of greater population effect size (δ = 1) on power. (c) Effect of increasing sample size to n1 = n2 = 50 on power.

Chapter 04

Figure 4.1 Terminological conventions borrowed from signal detection theory.

Figure 4.2 Graphical illustration of the signal‐detection framework from which the terms “false positive” and “false negative” are borrowed.

Figure 4.3 Impressive hits concerning a specific rule (inner circle) can prevent researchers from discovering that the data reflect a more general rule (outer circles).

Chapter 06

Figure 6.1 Changes in the heritability of intelligence for yearly cohorts of almost every male in Norway when they are 18 years old.

Chapter 08

Figure 8.1 A trio of p-values, showing that the diagnosticity of a significant result hinges on the specification of the alternative hypothesis. Top panels: a significant result that is ambiguous; middle panels: a significant result that is moderately informative; bottom panels: a significant result that is evidence in favor of the null hypothesis. The left column shows the population distribution under H1, and the right column shows the two relevant sampling distributions (i.e., one under H0, the other under H1) of the test statistic for the difference between 25 participants viewing AUFP stimuli and 25 participants viewing non-AUFP stimuli.

Figure 8.2 A boxing analogy of the p-value. By considering only the state of boxer H0, the Fisherian referee makes an irrational decision.

Chapter 11

Figure 11.1 The set of correlations surveyed by Vul et al. (2009a), showing how the absolute correlation (y) varies with sample size of the study (x), along with the marginal histograms of both sample size and absolute correlation. Individual observations are color-coded by whether a request for information from the authors revealed the analysis to be independent (black), non-independent (red), or if no response was obtained (blue). The vast majority of the surprisingly high correlations (r > 0.7) were obtained by a non-independent analysis procedure that is guaranteed to inflate effect size, especially when sample sizes are small.

Figure 11.2 Illustration of the non-independence error. The sampling distribution of the sample correlation varies with the sample size (columns) and the true underlying population correlation (rows; illustrated as a solid black dot on each histogram). A statistical significance threshold (here we use the common cluster-height threshold of p < 0.001), however, yields a constant critical correlation value for every sample size (black lines). The average sample correlations that pass the significance threshold (open circles) are much higher than their true population correlations unless the statistical power of the threshold is high (meaning that most of the sampling distribution is larger than the threshold, as in the case of n = 64, r = 0.75). Consequently, the selected sample correlations are very likely to be much higher than the true population correlations.

Figure 11.3 The mechanism of bias in non-independent analyses. Even in the presence of non-zero true correlations (x axis), the sample correlations (y) selected as exceeding a particular threshold are systematically overestimated. With a sample size of 16, the minimum sample correlation to pass a common p < 0.001 whole-brain threshold is quite large, ensuring that all observed correlations will be large, even if their true population correlations are small.

Figure 11.4 What is the expected value of the peak correlation reported from an analysis? The expected maximum correlation (y) increases with the number of independent brain regions it is chosen from (x), yielding large overestimates of the true correlation, regardless of its value (colors). Lines reflect the expectation, while shaded regions show the 90% interval. These calculations used a sample size of 16 subjects.

Figure 11.5 How does multiple comparisons correction influence the bias from non-independent analyses? We simulated how the absolute selected sample correlation (y) relates to the absolute true underlying correlation (x) for different numbers of subjects (8, 16, and 32), as we varied the statistical threshold (between p < 10^0 and p < 10^−5, with larger circles indicating more stringent thresholds). For each threshold, we show both the average, and the 90% interval of selected and true correlations. Bias (discrepancy between selected and true correlation – y-distance above the diagonal identity line) is smaller under larger sample sizes, but increases systematically as the statistical threshold becomes more conservative. (The distribution of population correlations is pictured above in gray; this distribution captures the common assumption that there are many small correlations, and few large ones, in the brain; formally, this is obtained via a truncated normal distribution with a mean of 0 and a standard deviation of 1/3 on the Fisher z' transforms of the population correlations.)

Figure 11.6 The influence of statistical power on overestimation from non-independent analyses. (Left) Average selected correlation (x) under different true population correlations (y); each point represents a particular sample size, with the color corresponding to the statistical power of a p < 0.001 threshold with that sample size and true population correlation. Although the relationship is not numerically uniform across population correlations, in all cases, less power means greater overestimation. (Middle) Magnitude of overestimation of the coefficient of determination (r²): the difference between the selected sample r² and the population ρ² decreases with the power of the test. (Right) Collapsing over true population correlations, statistical power (x) seems to impose an upper bound on the magnitude of overestimation, such that the maximum observed overestimate decreases as power increases.

Figure 11.7 The importance of adequate multiple comparisons correction. As the number of independent brain regions in a whole-brain analysis increases (x), the probability of falsely detecting a correlation (or any other signal) increases if the statistical threshold is held constant. The common p < 0.001 threshold is sufficient to correct for 50 multiple comparisons to the α = 0.05 level, but will yield more than 60% false positives if there are 1000 voxels in the whole-brain analysis.

Figure 11.8 The correlations surveyed in Vul et al. (2009a), plotted as a function of the number of subjects, and the (absolute) reported correlation. Color corresponds to the (uncorrected) p-value of the correlation, and lines indicate the critical correlation values at different α levels. While the reported correlations are large, they are not very significant, especially when considering that many of them arose from whole-brain analyses that would require multiple comparisons correction.

Figure 11.9 Statistical power (y) for Bonferroni-corrected correlation tests as a function of population correlation (panels), sample size (lines), and the number of independent correlations in the analysis (x). A small population correlation (ρ = 0.25; left) yields low power even with few independent correlations. In contrast, large correlations (ρ = 0.75; right) can be tested with high power with just 16 subjects, provided that the analysis considers only one correlation; however, a whole-brain analysis with 1000 correlations requires twice as many subjects to achieve the same level of power. A test for an optimistic but plausible population correlation (ρ = 0.5; middle) requires nearly 100 subjects to achieve a high level of power in a whole-brain analysis.

Figure 11.10 (a) The histogram of sample sizes from the studies surveyed in Vul et al. (2009a), color coded to match the colors in Figure 11.9. (b) Histograms of the power these studies will have to detect a population correlation of 0.5 or 0.75, either with a single measured correlation, or with a 1000-voxel whole-brain analysis. The sample sizes used in these studies offer a lot of power for detecting an implausibly large population correlation in a univariate analysis (ρ = 0.75, one region), but all have less than 20% power to detect a plausible (ρ = 0.5) correlation in a whole-brain analysis.

Figure 11.11 Sample size required (y) to achieve a certain level of power (x) as a function of the population correlation (panels), and the number of Bonferroni-corrected comparisons (brain regions). A realistically small population correlation (ρ = 0.25) will require hundreds of subjects in a whole-brain analysis (e.g., 1000 voxels) to achieve adequate power. However, even optimistic but plausible population correlations (ρ = 0.5) require many more subjects than are commonly run in whole-brain across-subject correlation studies.

Figure 11.12 Statistical power (y) for FDR-corrected correlation tests as a function of population correlation (panels), sample size (lines), and the proportion of voxels in the whole brain that contain the effect (x). A small population correlation (ρ = 0.25; left) yields low power even when nearly 30% of brain voxels have this signal. In contrast, large correlations (ρ = 0.75; right) can be tested with high power with just 16 subjects, provided that 30% of the voxels contain the effect; however, if only 1/1000 voxels carry the signal, then twice as many subjects are needed to achieve the same level of power. A test for an optimistic, but plausible, population correlation (ρ = 0.5; middle) that is highly localized (occurring in 1/1000 voxels of the brain) requires nearly 100 subjects to achieve a high level of power.

Figure 11.13 Histograms of the power the studies surveyed by Vul et al. (2009a) will have to detect different population correlations using FDR correction (for ρ = 0.5 and ρ = 0.75, under different prevalence rates of the effect among tested voxels). 36% of the sample sizes used in these studies offer a lot of power for detecting an implausibly large and dense population correlation (ρ = 0.75, prevalence = 10%); but all have less than 30% power to detect a plausible (ρ = 0.5) correlation with a prevalence of 1%; and less than 10% power if the prevalence is 1/1000.

Figure 11.14 Sample size required (y) to achieve a certain level of power (x) as a function of the population correlation (panels), and the proportion of signal-carrying voxels in the FDR-corrected analysis. A realistically small population correlation (ρ = 0.25) will require hundreds of subjects to achieve adequate power. However, even optimistic but plausible population correlations (ρ = 0.5) will require many more subjects than are commonly run in whole-brain across-subject correlation studies, if true effects are as sparse as reported results suggest.

Chapter 13

Figure 13.1 Selective and multiple publication of fluoxetine trials submitted to the FDA.

Chapter 15

Figure 15.1 Reported estimates of various physical parameters by year of publication.

Figure 15.2 Results of Conley et al. (2006, Figure 6). Grey contours represent the 68.3%, 95.4%, and 99.7% confidence regions.

Figure 15.3 Six sets of blinded means perturbed by “cell scrambling.” (By chance, Set 1 is identical to the unblinded raw data.)

Figure 15.4 Raw means from Simulation 1 (left panel) and raw means perturbed by noise (right panel).

Figure 15.5 Raw means from Simulation 1 (left panel) and raw means perturbed by cell‐specific bias (right panel).

Figure 15.6 Raw means from Simulation 1 (left panel) and raw means perturbed by both noise and cell‐specific bias (right panel).

Figure 15.7 Raw means from Simulation 1 (left panel) and blinded means perturbed by “row scrambling” (right panel).

Figure 15.8 Raw means from Simulation 2 (left panel) and blinded means perturbed by cell‐specific bias (right panel).

Guide

Cover

Table of Contents

Begin Reading


Psychological Science Under Scrutiny

Recent Challenges and Proposed Solutions

 

 

Edited by

 

Scott O. Lilienfeld and Irwin D. Waldman

 

 

 

 

 

 

 

This edition first published 2017. © 2017 John Wiley & Sons, Inc.

Registered Office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Offices: 350 Main Street, Malden, MA 02148-5020, USA; 9600 Garsington Road, Oxford, OX4 2DQ, UK; The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, for customer services, and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley‐blackwell.

The right of Scott O. Lilienfeld and Irwin D. Waldman to be identified as the authors of the editorial material in this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging‐in‐Publication Data

Names: Lilienfeld, Scott O., 1960– editor. | Waldman, Irwin D., editor.
Title: Psychological science under scrutiny / edited by Scott O. Lilienfeld, Irwin D. Waldman.
Description: Hoboken : Wiley, [2017] | Includes bibliographical references and index.
Identifiers: LCCN 2016031603 (print) | LCCN 2016036478 (ebook) | ISBN 9781118661079 (pbk.) | ISBN 9781118661086 (pdf) | ISBN 9781118661048 (epub)
Subjects: LCSH: Psychology.
Classification: LCC BF121 .P76217 2017 (print) | LCC BF121 (ebook) | DDC 150.72–dc23
LC record available at https://lccn.loc.gov/2016031603

A catalogue record for this book is available from the British Library.

Cover image: © Takashi Kitajima/Getty Images, Inc.

List of Contributors

Amy Adkins, Virginia Commonwealth University

Paul Bloom, Yale University

Marcus T. Boccaccini, Sam Houston State University

Katherine S. Button, University of Bath

Brett J. Deacon, University of Wollongong, School of Psychology, New South Wales, Australia

Danielle Dick, Virginia Commonwealth University

Christopher J. Ferguson, Stetson University

Klaus Fiedler, University of Heidelberg

Moritz Heene, Ludwig Maximilian University of Munich

Ray Hyman, University of Oregon

Leslie K. John, Harvard University

Joachim I. Krueger, Brown University

Etienne P. LeBel, Montclair State University

Scott O. Lilienfeld, Emory University

Alexander Ly, University of Amsterdam

Robert J. MacCoun, Stanford University

David Marcus, Washington State University

Dora Matzke, University of Amsterdam

Gregory Mitchell, University of Virginia

Richard D. Morey, University of Groningen

Marcus R. Munafò, MRC Integrative Epidemiology Unit, UK Centre for Tobacco and Alcohol Studies, and School of Experimental Psychology, University of Bristol, United Kingdom

Daniel C. Murrie, University of Virginia

Harold Pashler, UCSD Psychology

Saul Perlmutter, University of California at Berkeley

Anthony R. Pratkanis, University of California, Santa Cruz

Elizabeth Prom-Wormley, Virginia Commonwealth University

John Protzko, University of California, Santa Barbara

Jeffrey N. Rouder, University of Missouri

Jonathan W. Schooler, University of California, Santa Barbara

Malte Schott, University of Heidelberg

Glen I. Spielmans, Metropolitan State University, Department of Psychology, Minnesota; University of Wisconsin, Department of Counseling Psychology, Wisconsin

Helen Steingroever, University of Amsterdam

Robert J. Sternberg, Cornell University, Ithaca, New York

Philip E. Tetlock, University of Pennsylvania

Josine Verhagen, University of Amsterdam

Edward Vul, UCSD Psychology

Eric-Jan Wagenmakers, University of Amsterdam

Irwin D. Waldman, Emory University

Introduction: Psychological Science in Perspective

Scott O. Lilienfeld and Irwin D. Waldman

The essence of science, including psychological science, is ruthless and relentless self‐criticism. At its best, psychological science subjects cherished claims to searching scrutiny. Conclusions that survive close examination are provisionally retained; those that do not are modified or jettisoned. In this way, psychological science, like other sciences, is ultimately self‐correcting and progressive.

Some authors (e.g., Berezow, 2015; Hartsfield, 2015) have questioned this upbeat appraisal, and have argued without qualification that psychology is not a science. This pronouncement neglects the crucial point that science is not a body of knowledge; it is an approach to acquiring and evaluating knowledge. Specifically, it is an approach that strives to reduce error by implementing methodological safeguards, such as randomization to experimental conditions, the use of blinded observations, and sophisticated statistical analyses, thereby yielding a closer approximation to reality (Lilienfeld, 2010). By these standards, much of contemporary psychology is every bit as scientific as traditional “hard” sciences, such as chemistry and physics.

By availing itself of these bulwarks against error, psychological science has been quite successful across myriad domains. Moreover, it has spawned numerous discoveries of both theoretical and practical importance. To take merely a handful of salient examples, psychological science has helped us to better understand the basic mechanisms of learning, the nature of memory, the structure of emotion, the nature of individual differences in cognitive abilities, and the correlates and causes of many mental disorders (Hunt, 2009). Moreover, some psychological findings, such as classical conditioning, visual afterimages, the serial position effect in memory, the impact of peers on conformity, and the effects of prolonged exposure on pathological anxiety, are just as replicable as those in the hard sciences (Meehl, 1986). Psychological science has also borne fruit in such real‐world applications as aptitude testing, political polling, behavioral medicine, advertising, eyewitness testimony, the design of airplane cockpits, automobile safety, techniques for teaching language to children with intellectual disability, the reduction of prejudice in classrooms, and evidence‐based psychotherapies that have alleviated the suffering of tens of thousands of individuals with mood, anxiety, eating, sleep, and substance disorders (Zimbardo, 2004). There is ample reason to be proud of psychological science and its accomplishments.

Psychological Science Under Scrutiny

Nonetheless, over the past decade, and the past several years in particular, the prevailing narrative of psychological science as a progressive discipline that is characterized by replicable findings has been cast into serious doubt (Yong, 2012). More broadly, the commonly accepted methodologies of psychological science have come under withering attack from both within and outside the profession. Many of the pointed challenges posed by critics have given pause to psychological researchers across a broad spectrum of subfields, including experimental social psychology, cognitive psychology, functional brain imaging, molecular behavioral and psychiatric genetics, the validity of projective techniques (such as the Rorschach inkblot test), psychotherapy outcome research, and eyewitness memory. These scholars have argued that psychological findings are considerably less trustworthy than many of us, the two editors of this book included, have long presumed. This edited book imparts the story of these recent critical appraisals of “business as usual” across a broad spectrum of domains of psychological science. It also describes what our field has learned from these critiques, and how psychological science can improve in response to them.

Much of the impetus behind these recent criticisms stems from the influential work of Stanford University epidemiologist John Ioannidis, whose 2005 article, "Why Most Published Research Findings Are False," has engendered considerable self-reflection in medicine and related fields (as of this writing, this article has been cited over 4,000 times by other scholars). According to Ioannidis's (2005) eye-opening analysis, approximately 40% of published findings in medicine are incorrect or substantially overestimated in magnitude. Whether this percentage is itself overestimated remains a lively topic of debate (e.g., Goodman & Greenland, 2007), but there can be little doubt that many widely ballyhooed medical findings may be less robust than commonly assumed. As one striking example, when the biotechnology firm Amgen recently attempted to replicate 53 "landmark" published findings on cancer treatment, it failed in 47 cases (Begley & Ellis, 2012). Although Ioannidis and several other authors have directed their broadsides toward medicine and science more generally, most critics have increasingly aimed their arrows at psychological science.

Indeed, in the pages of our field’s most prestigious journals, including Psychological Science, Perspectives on Psychological Science, Psychological Methods, American Psychologist, and the Journal of Personality and Social Psychology, scholars across diverse subdisciplines have maintained that the standard approaches adopted in published psychological investigations tend to yield a disconcertingly large number of false positive findings (e.g., Pashler & Wagenmakers, 2012). Among other things, these researchers have observed that psychological investigators sometimes confuse exploratory (hypothesis generation) with confirmatory (hypothesis testing) modes of data analysis, thereby inflating the risk of erroneous conclusions.

Exploratory data analysis, although enormously useful for certain purposes (Tukey, 1977), can lend itself to a host of abuses. In particular, critics have raised legitimate concerns regarding HARKing (hypothesizing after results are known), which refers to the tendency to portray post-hoc conclusions as a priori hypotheses (Kerr, 1998), and p-hacking, which refers to a family of practices that can push findings that were initially statistically nonsignificant below the conventional threshold of statistical significance (typically p = 0.05; Lindsay, 2015; Simonsohn, Nelson, & Simmons, 2014). These worrisome but often largely overlooked practices are both prevalent and detrimental to the progress of psychological science. p-hacking practices include exclusion of outliers, transformation of distributions, combining subgroups, "cherry-picking" of positive findings within studies (more technically termed outcome reporting bias; Chan, Krleža-Jerić, Schmid, & Altman, 2004), splitting analyses by demographic groups (e.g., males versus females), and repeatedly commencing and halting data collection until the obtained p-value drops below the 0.05 level (optional starting and stopping points; Gilovich, 1991). Some of these practices, such as excluding outliers or transforming distributions, are often entirely appropriate in exploratory research, as they can point investigators toward fruitful questions to be pursued in future research. Nevertheless, these practices can become exceedingly problematic when they are conducted on a post-hoc basis but are reported in published articles as though they were planned.
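
To make the last of these practices concrete, the following minimal simulation is a sketch with illustrative, assumed parameters (it is not taken from the book); it shows how undisclosed optional stopping can inflate the false-positive rate well beyond the nominal 5% even when no true effect exists.

```python
# Minimal sketch (illustrative parameters): how "optional stopping" --
# repeatedly testing and adding participants until p < .05 -- inflates the
# false-positive rate when the true effect is exactly zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def optional_stopping_trial(start_n=10, max_n=100, step=5, alpha=0.05):
    """One simulated two-group study that peeks after every `step` added
    participants per group and stops as soon as p < alpha."""
    a = list(rng.normal(0, 1, start_n))
    b = list(rng.normal(0, 1, start_n))   # same population: no true effect
    while len(a) <= max_n:
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True                    # "significant" finding
        a.extend(rng.normal(0, 1, step))
        b.extend(rng.normal(0, 1, step))
    return False

sims = 2000
false_positives = sum(optional_stopping_trial() for _ in range(sims))
print(f"False-positive rate with optional stopping: {false_positives / sims:.2%}")
# Typically well above the nominal 5%, which is why undisclosed peeking
# counts as a form of p-hacking.
```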

Needless to say, p‐hacking can result in pronounced overestimates of the prevalence of statistically significant effects in given fields, as well as substantially inflated estimates of the average effect size in these fields. p‐hacking practices within psychology and allied disciplines may also help to account for the curious finding that the proportion of positive findings in psychology and psychiatry – approximately 90% – apparently exceeds that in all other domains of science (Fanelli, 2010). Indeed, given that the average statistical power of studies in psychology is low – just over 40% by some estimates (Cohen, 1962; Rossi, 1990; Sedlmeier & Gigerenzer, 1989) – this remarkably high percentage is almost surely “too good to be true.” That is, the proportion of statistically significant findings in psychology appears to be considerably larger than would be expected given the modal statistical power of investigations, again raising the specter of false positive findings.
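
A rough back-of-the-envelope calculation (the power, alpha, and base-rate values below are assumptions chosen purely for illustration) shows why a roughly 90% positive-result rate is hard to reconcile with roughly 40% average power.

```python
# Illustrative arithmetic: expected share of significant results in a
# literature with assumed average power and assumed base rates of true effects.
alpha = 0.05          # Type I error rate
power = 0.40          # rough modal power often cited for psychology
for prop_true in (0.25, 0.50, 0.75, 1.00):   # share of tested hypotheses that are true
    expected_sig = prop_true * power + (1 - prop_true) * alpha
    print(f"{prop_true:.0%} true hypotheses -> {expected_sig:.0%} significant expected")
# Even if every tested hypothesis were true, only about 40% of studies should
# come out significant -- far below the ~90% positive-result rate in the literature.
```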

Over the past several years, considerable attention has also been accorded to the decline effect (the “law of initial results”), the apparent tendency of effects reported by investigators in initial studies of a given phenomenon to diminish or even disappear over time (Schooler, 2011). For example, some evidence suggests that the well‐known “bystander nonintervention” effect, whereby people are less likely to intervene in emergencies when others are present, has mysteriously shrunk in magnitude over the past few decades (Fischer et al., 2011). Recent findings similarly raise the possibility that the efficacy of cognitive behavioral therapy for major depressive disorder has been decreasing over time (Johnsen & Friborg, 2015). If the decline effect is pervasive in psychology, it would imply that many well‐accepted psychological findings and conclusions are likely to decay.

The Replication Crisis

It can be devilishly difficult to ascertain whether a statistically significant result is genuine. As a consequence, readers of the psychological literature are often left to wonder which results to trust and which to ignore. Perhaps the most crucial criterion for evaluating the robustness of psychological findings is replication, especially when conducted by independent investigative teams (Lindsay, 2015; Lykken, 1968). As the philosopher of science Sir Karl Popper (1959) observed, “non‐reproducible single occurrences are of no significance to science” (p. 66).

Over the past decade, numerous scholars have raised concerns regarding the replicability of psychological findings, with many referring to the present state of tumult as the “replication crisis” (Bartlett, 2014). Admittedly, the term “crisis” may be an overstatement, as the magnitude of the replicability problem across domains of psychology is largely unknown. At the same time, it has become increasingly evident that psychologists can no longer take the replicability of their findings for granted (Asendorpf et al., 2013).

Because they are rarely perceived as "exciting" or "sexy," replication efforts have been greatly undervalued in most domains of psychology. Furthermore, many premier psychology journals have been loath to publish either successful or unsuccessful replications, even though these studies should typically be accorded at least as much weight as the original investigations, and perhaps even more. Indeed, until relatively recently, some scholars doubted or even dismissed the importance of replication. For example, in the immediate wake of a controversial article by Bem (2011) purporting to uncover evidence for precognition, a form of extrasensory perception, the prominent journal that published the findings (the Journal of Personality and Social Psychology) initially refused to consider failed replications for potential publication (e.g., Ritchie, Wiseman, & French, 2012).

Fortunately, this attitude has begun to recede, and the Journal of Personality and Social Psychology has since abandoned its “no replication” policy in response to a flurry of criticism. Still, as psychologists have belatedly begun to take stock of the replicability of their findings, their efforts have met with an unwelcome surprise: Many of their studies are considerably less replicable than was initially assumed. For example, a widely publicized collaborative effort by the Open Science Collaboration (2015) to directly replicate 100 published findings in social and cognitive psychology revealed that only 39% of the studies were subjectively rated as having replicated the original results (but see Gilbert, King, Pettigrew, & Wilson, 2016, for an alternative view). As the authors themselves wisely noted, these replication failures do not necessarily imply that the original findings were false; moreover, as they observed, there is no single metric for gauging replication success (although in their analyses, replicability was relatively low, regardless of which metric they used). Nevertheless, these sobering findings suggest that psychologists and consumers of psychological research, including the news media, should place considerably less weight than they currently do on unreplicated findings (Waldman & Lilienfeld, 2015).

Controversies surrounding the insufficient attention accorded to replication have recently spilled over into lively and at times contentious blog discussions in social psychology, where widely cited research on the effects of priming on nonverbal behavior, such as that conducted by Yale psychologist John Bargh and others (see Bargh, Chen, & Burrows, 1996), has not withstood scrutiny by independent investigators (Doyen, Klein, Pichon, & Cleeremans, 2012). Although some psychological researchers have dismissed large‐scale replication efforts as possessing little or no scientific value (e.g., Mitchell, 2014), this defensive reaction is unwarranted. Only by ascertaining whether their findings survive multiple direct replication efforts can psychologists hope to ascertain their veracity (Simons, 2014).

The principal takeaway lesson from the recent debates is not that most psychological findings are unreplicable. Rather, it is that we need to fundamentally change the way we think about psychological data and results. Rather than conceptualizing each new study as a source of settled conclusions, we need to conceptualize its findings as merely one data point in a large population of potential studies, many or most of which have yet to be conducted (Waldman & Lilienfeld, 2015). We need to think meta‐analytically, even when we are not conducting formal meta‐analyses.

Another crucial bottom-line lesson is that higher statistical power is necessary to boost the replicability of psychological science (Asendorpf et al., 2013; Tajika, Ogawa, Takeshima, Hayasaka, & Furukawa, 2015). Prominent statistically oriented psychologists have long lamented the low statistical power of most studies in their discipline (Cohen, 1962), but to little avail (Sedlmeier & Gigerenzer, 1989). Virtually all psychological researchers recognize that low statistical power is tied to a higher likelihood of false negative results. Unfortunately, many of these same researchers also erroneously assume that if a finding is statistically significant even with a small sample size, it is especially likely to be robust and replicable (indeed, we continue to hear this view espoused by a number of our academic colleagues). In fact, the opposite is true (Button et al., 2013; Walum, Waldman, & Young, 2015). Because of a statistical phenomenon known as the winner's curse, results from underpowered studies that manage to attain statistical significance are less likely to be genuine, because their effects must be overestimated (i.e., positively biased) in order to achieve statistical significance. Moreover, even when genuine, their effect sizes are likely to be overestimated. The more underpowered the study, the greater the likelihood of such bias.
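
The winner's curse can be made tangible with a short simulation sketch (the parameters are assumed for illustration and are not drawn from the cited studies): among underpowered studies that happen to reach significance, the average estimated effect is several times the true effect.

```python
# Minimal simulation sketch (assumed parameters) of the winner's curse:
# in an underpowered design, the studies that reach p < .05 necessarily
# overestimate the true effect size (a Type M, or magnitude, error).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, sims = 0.2, 20, 5000        # small true effect, small samples
significant_estimates = []
for _ in range(sims):
    treat = rng.normal(true_d, 1, n)
    control = rng.normal(0, 1, n)
    if stats.ttest_ind(treat, control).pvalue < 0.05:
        pooled_sd = np.sqrt((treat.var(ddof=1) + control.var(ddof=1)) / 2)
        significant_estimates.append((treat.mean() - control.mean()) / pooled_sd)

print(f"True effect size (Cohen's d): {true_d}")
print(f"Mean estimate among significant studies: {np.mean(significant_estimates):.2f}")
# The "published" (significant) studies report an effect several times larger
# than the true one.
```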

Psychology is hardly alone in its replicability challenges, as the Amgen episode we have already mentioned amply underscores. There is little evidence that replicability is substantially lower in subfields of psychology than it is in other scientific domains, including particle physics (Hedges, 1987; cf. Hartsfield, 2015). This point appears to have been lost on a number of observers. For example, in response to the publication of the recent Open Science Collaboration (2015) replicability findings, a recent president of the American Psychiatric Association and influential author, Jeffrey Lieberman, tweeted that psychology is “in shambles” (see McKay & Lilienfeld, 2015). Ironically, soon after this tweet appeared, an article reporting comparable replicability problems in psychiatry appeared in print (Tajika et al., 2015). In their review, the authors examined 83 widely cited articles in psychiatry journals that had reported results for specific interventions. Of the studies reported therein, 40 had never been subjected to replication attempts, 11 were contradicted by later findings, and 16 reported substantially smaller effect sizes than in the original study; only 16 of the original studies were successfully replicated. Clearly, replicability is a concern for science at large, not merely psychological science.

Other Challenges to Psychological Science

The challenges to psychological science do not end there. A growing cadre of scholars has argued that the “file drawer problem,” the tendency of negative studies to remain selectively unpublished (Rosenthal, 1979), poses a serious threat to the integrity of conclusions in psychology and other sciences (Franco, Malhotra, & Simonovits, 2014). Such publication bias may exacerbate the problem of false positives generated by HARKing, p‐hacking, and other problematic research practices. Although a host of helpful statistical approaches, such as funnel plots of effect sizes (Duval & Tweedie, 2000), exist for estimating the impact of publication biases on psychological conclusions, none is free of limitations. To address the file drawer problem and other forms of publication bias (e.g., outcome reporting bias), a number of researchers have proposed that the raw data from published psychological studies be placed in publicly available registries for re‐analyses by independent scholars (e.g., Asendorpf et al., 2013; Ioannidis, Munafo, Fusar‐Poli, Nosek, & David, 2014). Many of these researchers have further suggested that investigators’ hypotheses be pre‐registered, thereby minimizing the likelihood of outcome reporting bias. Nevertheless, these proposed remedies have met with vocal opposition in some quarters.
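
A brief simulation sketch (again with assumed, illustrative parameters) shows the file-drawer problem in miniature: when only significant studies are "published," the published average overestimates the true effect, producing the asymmetry that funnel plots are designed to detect.

```python
# Illustrative sketch of the file-drawer problem: nonsignificant studies stay
# in the drawer, so the naive average of published effect sizes is inflated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_d, n, n_studies = 0.15, 30, 2000
all_d, published_d = [], []
for _ in range(n_studies):
    treat, control = rng.normal(true_d, 1, n), rng.normal(0, 1, n)
    pooled_sd = np.sqrt((treat.var(ddof=1) + control.var(ddof=1)) / 2)
    d = (treat.mean() - control.mean()) / pooled_sd
    all_d.append(d)
    if stats.ttest_ind(treat, control).pvalue < 0.05:   # only "positive" studies are published
        published_d.append(d)

print(f"Mean effect across ALL studies:       {np.mean(all_d):.2f}")
print(f"Mean effect across PUBLISHED studies: {np.mean(published_d):.2f}")
# The published subset clusters far above the true effect, which is the
# asymmetry that funnel-plot methods attempt to diagnose and correct.
```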

In other cases, critics have contended that psychological researchers frequently neglect to account for the a priori plausibility of their theories when appraising their likelihood. According to these critics, investigations in certain domains, such as parapsychology (the study of extrasensory perception and related paranormal phenomena), should be held to much higher evidentiary standards than those in other fields, because the claims advanced by researchers in the former fields run strongly counter to well‐established scientific conclusions. Many of these critics have lobbied for a heightened emphasis on “Bayesian” approaches to data analysis, which consider the initial scientific plausibility of findings when evaluating their probability (Wagenmakers, Borsboom, Wetzel, & van der Maas, 2011).
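
The following worked example (with hypothetical numbers) illustrates the Bayesian point: the same strength of evidence moves belief very differently depending on the prior plausibility of the hypothesis, which is why findings in fields such as parapsychology arguably demand stronger evidence.

```python
# Hedged sketch with hypothetical numbers: how prior plausibility interacts
# with the evidence (expressed as a Bayes factor) to yield a posterior belief.
def posterior_prob(prior_prob, bayes_factor):
    """Posterior P(H1 | data) given prior P(H1) and BF10 = P(data|H1)/P(data|H0)."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

bf = 5.0  # moderately strong evidence for H1, taken on its own
for prior in (0.5, 0.05, 0.001):   # ordinary claim vs. extraordinary claim (e.g., precognition)
    print(f"prior = {prior:>5}:  posterior = {posterior_prob(prior, bf):.3f}")
# With a prior of 0.5 the hypothesis becomes fairly likely; with a prior of
# 0.001, the same data leave it very improbable.
```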

In addition, over the past decade or so, a growing chorus of scholars has insisted that many well‐accepted psychological and psychiatric findings, such as those concerning stereotype threat, implicit prejudice, unconscious priming, psychopharmacology (e.g., the efficacy of antidepressant medications relative to placebos), and psychotherapy outcome research, have been substantially overhyped. For example, in the domain of psychotherapy research, some meta‐analyses suggest that a substantial proportion of the variability in client outcomes is attributable to allegiance effects – that is, the extent to which investigators conducting the studies are partial to the intervention in question (Luborsky et al., 1999).

Finally, over the past few years, several high-profile examples of definitive or probable data fabrication, falsification, and other questionable research practices (e.g., presenting exploratory analyses as confirmatory, omitting mention of relevant dependent variables that yielded nonsignificant findings) have raised troubling questions concerning psychology's capacity to police itself (John, Loewenstein, & Prelec, 2012). More recent evidence points to a nontrivial prevalence of statistical reporting errors in major psychological journals (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2015). Perhaps not surprisingly, the distribution of these errors appears to be nonrandom, and conveniently tends to favor the authors' hypotheses.

Media Attention

Many of the recent challenges to psychological science have begun to catch the public eye. In a widely discussed 2011 article in the New York Times ("Fraud Case Seen as Red Flag for Psychology Research"), science journalist Benedict Carey discussed the shocking case of eminent Dutch social psychologist and journal editor Diederik Stapel, much of whose research was discovered to have been blatantly and brazenly fabricated. Carey (2011) wrote that:

Experts say the case exposes deep flaws in the way science is done in a field, psychology, that has only recently earned a fragile respectability. … The scandal, involving about a decade of work, is the latest in a string of embarrassments in a field that critics and statisticians say badly needs to overhaul how it treats research results. In recent years, psychologists have reported a raft of findings on race biases, brain imaging and even extrasensory perception that have not stood up to scrutiny.

Only the year before, now‐disgraced journalist Jonah Lehrer penned a provocative piece in The New Yorker magazine on the decline effect entitled “The Truth Wears Off: Is There Something Wrong with the Scientific Method?” In this article, Lehrer (2010) highlighted the baffling tendency of findings in psychology and several other scientific fields to shrink in size over time. According to Lehrer, the decline effect may point to a fundamental flaw in how researchers in psychology and allied disciplines analyze and interpret data.

Other articles questioning the standard methods of psychology (in such subdisciplines as neuroimaging and parapsychology) have recently appeared in such high‐profile outlets as the Atlantic, Newsweek, Scientific American, and Seed magazines. In 2015, the results of the Open Science Collaboration, which as noted earlier revealed that the lion’s share of 100 published studies on social and cognitive psychology did not survive attempts at direct replication, received prominent coverage in the New York Times, Wall Street Journal, and other major venues. Finally, controversies regarding the potential overhyping of psychiatric medications, stemming in part from the alleged suppression of negative findings by drug companies, have received national coverage on 60 Minutes and other media outlets. Needless to say, this media coverage has not only given psychology and psychiatry something of a “black eye” in the view of much of the general public (Ferguson, 2015), but has led many scholars inside and outside of psychology to ask whether the status quo methodologies of our field are ripe for reexamination.

Scrutiny of Psychological Science as Healthy Self‐Criticism

In contrast to many outspoken critics, we regard many or most of the recent questions raised about the modal methodological approaches of psychological science as signs of the health of our discipline. Some readers may find this statement to be surprising. Yet, in many respects, the recent scrutiny accorded to psychological science by psychological scientists themselves exemplifies science working precisely as it should – subjecting claims to intense criticism in a concerted effort to winnow out errors in one’s web of beliefs (O’Donohue, Lilienfeld, & Fowler, 2007). Far from psychology being in shambles, our field is leading the way to improving not only the conduct of psychological science, but of science itself.

We suspect that some of the recent criticisms directed at psychological science will turn out to possess substantial merit, whereas others may not. Even so, many of these criticisms have posed important challenges to the status quo in our field, and raised thoughtful questions concerning long‐held assumptions about how best to design studies, perform statistical analyses, and interpret findings (Ferguson, 2015). They have also underscored the field’s insufficient emphasis on systematic safeguards against spurious findings and independent replication, and its frequent overemphasis on results that are surprising, splashy, or “sexy.”

Indeed, we are unreserved optimists regarding the future of psychological science. We view these recent criticisms not as threats, but rather as opportunities to identify and minimize heretofore underappreciated sources of error in psychology's findings and conclusions. Moreover, to the extent that these criticisms can point psychologists, psychiatrists, and others toward novel methodologies to root out and eliminate these sources of error, they may ultimately prove to be psychological science's best allies.

This Book

In the spirit of subjecting claims to rigorous scrutiny, in this edited book we “turn the tables” and place psychological science itself under the microscope. In this book, we explore a variety of recent challenges to the standard methodologies and assumptions of psychological science. Just as important, we examine the advantages and disadvantages of proposed remedies for these challenges. In this way, we intend to stimulate constructive debate regarding how to enhance the trustworthiness of psychology’s findings and conclusions, and ultimately to make psychology more firmly grounded in solid science.

The chapters are authored by internationally recognized experts in their fields, and written with a minimum of specialized terminology. In addition, each chapter lays out not only the pertinent challenges posed to psychological science, but proposed solutions and, when relevant, constructive suggestions for future research.

This book should be of considerable interest to researchers, teachers, advanced undergraduates, and graduate students across all domains of psychology, as well as to those in allied fields, such as psychiatry, psychiatric nursing, counseling, and social work. In addition, this book should be relevant to investigators in sociology, anthropology, neuroscience, medicine, public health, and epidemiology, all of which rely on at least some of the same methodologies as psychological researchers. Finally, this book should appeal to instructors who teach graduate and advanced undergraduate courses in psychological research methods and statistics, some of whom may elect to adopt the book as a supplemental text.

Format and Table of Contents

This book is divided into three major sections: (I) Cross‐Cutting Challenges to Psychological Science; (II) Domain‐Specific Challenges to Psychological Science; and (III) Psychological and Institutional Obstacles to High‐Quality Psychological Science. To set the stage for what is to come, we summarize these three sections, as well as the chapters within each section, in the text that follows.

Section I, Cross‐Cutting Challenges to Psychological Science, focuses on sweeping challenges to psychological science that cut across most or all subdisciplines, such as the problems posed by false positive findings, insufficient emphasis on replication, the decline effect, and the neglect of Bayesian considerations in data evaluation.

In Chapter 1, Brian Nosek and his colleagues at the Open Science Collaboration survey the landscape of the replicability challenges confronting psychology, and present a number of potential remedies for enhancing the reproducibility of psychological research. Among other suggestions, they underscore the value of pre-registering studies and study designs, performing confirmatory analyses prior to moving on to exploratory analyses, and sharing all study materials with one's collaborators and the broader scientific community.

In Chapter 2, Katherine S. Button and Marcus R. Munafò discuss the problem of low statistical power in psychological research and offer a user-friendly tutorial on its impact on the detection of genuine effects. They discuss several potential reasons for the persistence of underpowered psychological research and examine several solutions to this lingering problem, including a heightened emphasis on large-scale collaborative research and online data collection.

In Chapter 3, Christopher J. Ferguson and Moritz Heene argue that contemporary psychological research is marked by "an aversion to the null hypothesis," that is, a reluctance to publish negative findings. They contend that this bias has resulted in a distorted picture of the magnitudes of effects across many psychological domains. The solutions, they maintain, include a greater willingness to publish replications, including unsuccessful replications, and a sea change in the academic culture, in which negative findings and conclusions are accorded greater value.

Most of our discussion thus far has focused on false positives. Yet, as Klaus Fiedler and Malte Schott observe in Chapter 4, false negatives also pose a serious – and often insufficiently appreciated – challenge to psychological research. The authors discuss the statistical sources of false negative results and consider the relative costs of false positive and false negative findings in psychological research. They point out that although false positive results can often be ferreted out by means of subsequent unsuccessful replication efforts, false negative results may lead to fruitful lines of research being prematurely abandoned. Hence, in some domains of basic psychological science, they may be even more detrimental than false positives.

In Chapter 5, Etienne P. LeBel and Leslie K. John address the problems posed by the lack of transparency in the reporting of psychological research. As they note, the veil of secrecy of which authors often avail themselves can allow them to engage with impunity in a host of questionable practices that boost the odds of false positive results. The authors offer a number of suggestions for increasing transparency in data reporting, including public platforms for disclosing data and data analyses, and changes in journal editorial policies.

Chapter 6, by John Protzko and Jonathan W. Schooler, explores the controversial topic of decline effects – the apparent decrease over time in the magnitude of effect sizes across numerous psychological domains. The authors present a novel taxonomy of decline effects and evaluate several potential reasons for the emergence of such effects, including regression to the mean and publication bias. As the authors point out, decline effects further underscore the importance of replication and meta‐analyses as tools for winnowing genuine from artifactual findings.
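
As a purely illustrative sketch of one such mechanism (our own toy simulation, not drawn from Protzko and Schooler’s chapter; all numbers are assumptions), the code below shows how selecting studies for statistical significance produces an apparent decline when those studies are later replicated, even though the true effect never changes.

```python
# Our own toy simulation, not from the chapter; all numbers are assumptions.
# Only "original" studies that reach p < .05 enter the literature; their exact
# replications are reported regardless of outcome. The replications look
# smaller on average even though the true effect is constant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n = 0.3, 20

original_effects, replication_effects = [], []
while len(original_effects) < 2_000:
    c1, t1 = rng.normal(0, 1, n), rng.normal(true_d, 1, n)
    if stats.ttest_ind(t1, c1).pvalue < 0.05:          # the "published" original
        original_effects.append(t1.mean() - c1.mean())
        c2, t2 = rng.normal(0, 1, n), rng.normal(true_d, 1, n)
        replication_effects.append(t2.mean() - c2.mean())  # unselected replication

print(f"Mean original effect:    {np.mean(original_effects):.2f}")     # inflated
print(f"Mean replication effect: {np.mean(replication_effects):.2f}")  # near the true 0.3
```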

Chapter 7, authored by Joachim I. Krueger, examines the pervasive problem of reverse inference and its impact on the appraisal of psychological hypotheses and theories. Reverse inference occurs whenever we reason backward from a behavior, thought, or emotion to an underlying psychological state or trait. Psychologists engage in reverse inference whenever they posit a psychological state (e.g., fear) on the basis of activation of a specific brain region (e.g., the amygdala); they also do so whenever they attempt to infer a psychological trait (e.g., extraversion) on the basis of endorsements of self‐report items (e.g., “I enjoy going to parties”). As Krueger notes, reverse inferences are more ubiquitous in psychology than most people assume, and they come with underappreciated interpretative challenges.
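
A toy calculation, with entirely hypothetical numbers of our own choosing, illustrates why reverse inference can mislead: even if a brain region responds reliably when a given state is present, the probability of the state given activation can be modest when the region also responds to many other states.

```python
# Toy calculation with entirely hypothetical numbers (our own illustration).
p_amygdala_given_fear = 0.90    # assumed: the region usually activates during fear
p_amygdala_given_other = 0.30   # assumed: it also activates in many other states
p_fear = 0.10                   # assumed base rate of fear in the task context

p_amygdala = (p_amygdala_given_fear * p_fear
              + p_amygdala_given_other * (1 - p_fear))
p_fear_given_amygdala = p_amygdala_given_fear * p_fear / p_amygdala
print(f"P(fear | amygdala active) = {p_fear_given_amygdala:.2f}")  # 0.25
```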

Chapter 8, co‐authored by Eric‐Jan Wagenmakers, Josine Verhagen, Alexander Ly, Dora Matzke, Helen Steingroever, Jeff N. Rouder, and Richard Morey, raises questions regarding one of the sacred cows of psychological research: statistical significance testing. The authors contend that this commonly accepted approach is far too lenient, as it does not account for the a priori likelihood of hypotheses. They maintain that a Bayesian approach, although introducing an inherent level of subjectivity that many psychologists resist, provides a much better alternative to the standard appraisal of theories.
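
The following back‐of‐the‐envelope calculation is our own illustration with assumed numbers rather than an example from the chapter; it conveys why ignoring the a priori likelihood of hypotheses matters. A statistically significant result is far less diagnostic when only a small fraction of the hypotheses being tested are true.

```python
# Our own back-of-the-envelope illustration; the prior and power are assumed.
prior_true = 0.10   # a priori plausibility of the hypotheses being tested
power = 0.50        # probability of detecting a true effect
alpha = 0.05        # conventional false-positive rate

p_significant = power * prior_true + alpha * (1 - prior_true)
p_true_given_significant = power * prior_true / p_significant
print(f"P(hypothesis true | p < .05) = {p_true_given_significant:.2f}")  # about 0.53
```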

Section II, Domain‐Specific Challenges to Psychological Science, focuses on challenges to psychological science that are specific to certain subdisciplines, such as functional neuroimaging, candidate gene studies of psychopathology, the efficacy of antidepressant medication, and parapsychological research.

In Chapter 9, Anthony R. Pratkanis delineates what he terms the “partial, but real crisis in social psychology,” arguing that such distorting influences as intrinsic human biases, careerism, and the mounting pressures on researchers to generate “sexy” findings have damaged the scientific credibility of some subfields of contemporary social psychology. Pratkanis proposes a number of remedies to combat the recent ills afflicting social psychology, such as a focus on condition‐seeking and a need to reform modal practices at psychological journals and granting agencies. He concludes by reminding us of Nobel Prize–winning physicist Richard Feynman’s maxim that the essence of science is bending over backward to prove ourselves wrong.

In Chapter 10, Gregory Mitchell and Philip E. Tetlock take on one of the sacred cows of modern social psychology: implicit prejudice. They contend that much of the recent fascination with the conceptualization and measurement of implicit prejudice exemplifies the tendency of psychologically plausible claims to acquire a powerful foothold even in the absence of extensive supportive data. We suspect that not all readers will agree with Mitchell and Tetlock’s conclusions, but we also suspect that all readers will find their discussion provocative and enlightening.

In Chapter 11, Edward Vul and Harold Pashler expand on a now famous – some might say infamous – 2009 article in Perspectives on Psychological Science (by Vul, Harris, Winkielman, & Pashler; cited over 970 times as of this writing), which described the subtle methodological errors that can lead investigators to obtain remarkably high correlations (often above r = 0.90) between psychological states and traits, on the one hand, and brain activations, on the other. They provide readers with helpful suggestions for avoiding the “non‐independence problem” they identified, and highlight the importance of increasing statistical power and focusing more squarely on replication. As Vul and Pashler wisely observe, these methodological pitfalls are not unique to brain imaging research, and appear in slightly different guises elsewhere in the psychological literature, including personality assessment research.
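
For readers who want a hands‐on feel for the non‐independence problem, the minimal simulation below (our own sketch, with assumed dimensions, not code from the chapter) generates pure noise, selects “voxels” by their correlation with a behavioral score, and then summarizes the correlation within those same voxels; the resulting values look impressive despite the complete absence of any true signal.

```python
# Our own minimal sketch with assumed dimensions, not code from the chapter:
# 10,000 pure-noise "voxels" and one behavioral score for 16 participants.
# Selecting voxels by their correlation with behavior and then summarizing
# the correlation in those same voxels yields large values with no signal.
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_voxels, threshold = 16, 10_000, 0.6

behavior = rng.normal(size=n_subjects)
voxels = rng.normal(size=(n_subjects, n_voxels))            # noise only

# Pearson correlation of each voxel's "activation" with the behavioral score
z_beh = (behavior - behavior.mean()) / behavior.std()
z_vox = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
r = z_vox.T @ z_beh / n_subjects

selected = np.abs(r) > threshold                            # circular selection
print(f"Voxels passing |r| > {threshold}: {selected.sum()}")
print(f"Mean |r| within the selected voxels: {np.abs(r[selected]).mean():.2f}")
```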

Chapter 12, authored by Elizabeth Prom‐Wormley, Amy Adkins, Irwin D. Waldman, and Danielle Dick, examines some of the reasons why genetic association studies, such as those in the domain of psychopathology, have often proven difficult to replicate. The authors distinguish the hyperbole that often characterized these studies in the past from a more realistic contemporary appraisal, and offer a host of methodological desiderata for researchers and consumers of the literature. They conclude by discussing the promises and perils of widely hyped studies of gene‐by‐environment interaction.

In Chapter 13, Brett J. Deacon and Glen I. Spielmans address the contentious question of whether the efficacy of antidepressants and other psychotropic medications has been exaggerated. They contend that publication and outcome reporting biases, fueled by drug industry interests, have conspired to produce substantial overestimates of the efficacy of these medications, antidepressants in particular. Their chapter is a useful cautionary tale about the perils of seeking confirmation rather than refutation in applied science.

Chapter 14, by Ray Hyman, examines the numerous pitfalls that have afflicted the field of parapsychology, and discusses the quixotic search for “psi,” a broad spectrum of paranormal phenomena that encompasses extrasensory perception and psychokinesis. As Hyman observes, Bem’s (2011) widely ballyhooed article on precognition is only the most recent high‐profile attempt to provide laboratory evidence for psi. Yet, as Hyman notes, all of these efforts have failed, despite more than 150 years of dedicated research. In addition, they have yielded a flurry of unreplicated positive findings, largely owing to an overreliance on statistical significance testing and the repeated insinuation of methodological flaws. Hyman’s chapter should be enlightening even for readers without an interest in parapsychology per se, as it points to a host of subtle methodological flaws that can afflict most or all psychological research.

Section III, Psychological and Institutional Obstacles to High‐Quality Psychological Science, focuses on psychological, sociological, and institutional obstacles that impede the progress of psychological science, including confirmation bias and preferences for “faddish” psychological questions.

One of the foremost psychological impediments standing in the way of scientific progress, including progress in psychology, is confirmation bias, also termed “confirmatory bias.” This bias refers to a pervasive propensity to seek out and interpret evidence consistent with one’s hypotheses and to neglect or selectively reinterpret evidence that is not (Nickerson, 1998). In Chapter 15, Robert J. MacCoun and Nobel Laureate Saul Perlmutter address the problem of confirmatory bias in psychology and allied fields. They introduce a novel technique, blind analysis, which has already been used to good effect in some domains of physics (see also MacCoun & Perlmutter, 2015), for combatting the insidious impact of confirmatory bias. Psychologists would be well advised to heed their methodological suggestions.
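
To make the idea of blind analysis concrete, the toy sketch below illustrates one simple blinding scheme, a hidden additive offset; this is merely our own hypothetical example and not the specific procedure MacCoun and Perlmutter describe. Analysts finalize their pipeline on the blinded data and unblind only once the analysis is locked in.

```python
# Our own hypothetical sketch of one blinding scheme (a hidden additive
# offset); it is not the specific procedure described in the chapter.
import numpy as np

rng = np.random.default_rng(3)

def blind(outcome, seed=42):
    """Shift the outcome by a secret offset; in practice the offset would be
    generated and held by a third party until the analysis is frozen."""
    offset = np.random.default_rng(seed).uniform(-2.0, 2.0)
    return outcome + offset, offset

true_effect = 0.4                                  # assumed for illustration
treated = rng.normal(true_effect, 1.0, 50)
control = rng.normal(0.0, 1.0, 50)

blinded_treated, secret_offset = blind(treated)
# ... analysts finalize exclusions, models, and tests on the blinded data ...
blinded_estimate = blinded_treated.mean() - control.mean()

# Unblinding happens exactly once, after the pipeline is locked in.
print(f"Blinded estimate:   {blinded_estimate:.2f}")
print(f"Unblinded estimate: {blinded_estimate - secret_offset:.2f}")  # near 0.4
```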

In

Chapter 16