Evidence in Medicine - Iain K. Crombie - E-Book

Description

High-quality evidence is the foundation for effective treatment in medicine. As the vast amount of published medical evidence continues to grow, concerns about the quality of many studies are increasing. Evidence in Medicine is a much-needed resource that addresses the ‘medical misinformation mess’ by assessing the flaws in the research environment. This authoritative text identifies and summarises the many factors that have produced the current problems in medical research, including bias in randomised controlled trials, questionable research practices, falsified data, manipulated findings, and more. 

This volume brings together the findings from meta-research studies and systematic reviews to explore the quality of clinical trials and other medical research, explaining the character and consequences of poor-quality medical evidence using clear language and a wealth of supporting references. The text suggests planning strategies to transform the research process and provides an extensive list of the actions that could be taken by researchers, regulators, and other key stakeholders to address defects in medical evidence. This timely volume: 

  • Enables readers to select reliable studies and recognise misleading research 
  • Highlights the main types of biased and wasted studies 
  • Discusses how incentives in the research environment influence the quality of evidence 
  • Identifies the problems researchers need to guard against in their work 
  • Describes the scale of poor-quality research and explores why the problems are widespread 
  • Includes a summary of key findings on poor-quality research and a listing of proposed initiatives to improve research evidence 
  • Contains extensive citations to references, reviews, commentaries, and landmark studies 

Evidence in Medicine is required reading for all researchers who create evidence, funders and publishers of medical research, students who conduct their own research studies, and healthcare practitioners wanting to deliver high-quality, evidence-based care. 

Pages: 402

Year of publication: 2021




Table of Contents

Cover

Title Page

Copyright Page

Preface

Aims of this Book

REFERENCES

CHAPTER 1: The Rationale for Treatment

THEORY AS JUSTIFICATION FOR TREATMENT

TESTING ON A SERIES OF PATIENTS

COMPARING GROUPS

COMPARING SIMILAR GROUPS

CASTING LOTS AND TREATMENT ALLOCATION

RANDOM NUMBERS FOR TREATMENT ALLOCATION

THE NEED FOR BLINDING

CONCLUSION

REFERENCES

CHAPTER 2: Sources of Bias in Randomised Controlled Trials

METHOD OF TREATMENT ALLOCATION

PROBLEMS IN MEASURING THE OUTCOME

FOLLOW‐UP AND MISSING OUTCOMES

MISSING OUTCOME DATA AND INTENTION TO TREAT

OTHER METHODOLOGICAL CONCERNS

CONCLUSIONS

REFERENCES

CHAPTER 3: Wasted and Unhelpful Trials

WASTED STUDIES

NEGLECTED AREAS OF RESEARCH

UNHELPFUL OUTCOME MEASURES

LACK OF GENERALISABILITY

WEAK AND MISLEADING EVIDENCE

CONCLUSION

REFERENCES

CHAPTER 4: Can the Analysis Bias the Findings?

THE P‐VALUE PROBLEM

QUESTIONABLE RESEARCH PRACTICES

ENSURING HIGH QUALITY ANALYSIS: THE STATISTICAL ANALYSIS PLAN

CONCLUSIONS

REFERENCES

CHAPTER 5: Systematic Reviews and Meta‐Analysis

INTRODUCTION

IDENTIFYING RELEVANT TRIALS

EXTRACTING TRIAL DATA

THE QUALITY OF PRIMARY TRIALS

POOLING EFFECT SIZES ACROSS TRIALS

OTHER METHODOLOGICAL ISSUES

CONCLUSIONS

REFERENCES

CHAPTER 6: Fabrication, Falsification and Spin

FABRICATION

FALSIFICATION

QUESTIONABLE RESEARCH PRACTICES

SPIN

Drawing Misleading Conclusions

RETRACTIONS

DISCUSSION

REFERENCES

CHAPTER 7: Why Do Researchers Falsify Data or Manipulate Study Findings?

THE RESEARCH ENVIRONMENT

RESEARCH OVERSIGHT

CONFLICT OF INTEREST

INDIVIDUAL LEVEL EXPLANATIONS FOR RESEARCH MISCONDUCT

HOW HONEST PEOPLE RATIONALISE MISCONDUCT

DISCUSSION

REFERENCES

CHAPTER 8: Developing a Strategy to Prevent Poor Quality and Misleading Research

RESEARCH ENVIRONMENT

RESEARCH TRANSPARENCY

RESEARCH OVERSIGHT

RESEARCH INTEGRITY

ESSENTIAL ELEMENTS OF A TRANSFORMATIONAL STRATEGY

IMPLEMENTING A PROGRAMME FOR ACTION

REFERENCES

Appendix 1: Summary of the Key Findings on Poor Quality Research

PROBLEMS IN THE DESIGN, CONDUCT, ANALYSIS AND REPORTING OF STUDIES

Statistical Analysis

FREQUENCY OF DATA FABRICATION AND FALSIFICATION

THE CAUSES OF POOR QUALITY AND MISLEADING RESEARCH

THE FINDINGS IN PERSPECTIVE

REFERENCES

Appendix 2: Initiatives to Improve the Quality of Research

CHANGE THE RESEARCH ENVIRONMENT

INCREASE RESEARCH TRANSPARENCY

QUALITY OF TRIAL METHODOLOGY

TRIAL REGISTRATION

REPORTING OF THE METHODS OF SYSTEMATIC REVIEWS

INCREASING ACCESS TO AND USE OF REPORTING GUIDELINES

IMPLEMENT VIGOROUS RESEARCH OVERSIGHT

PROMOTE RESEARCH INTEGRITY

EXAMPLES OF COORDINATED INITIATIVES

REFERENCES

Index

End User License Agreement

List of Tables

Chapter 6

TABLE 6.1 Frequency of questionable research practices.


Evidence in Medicine

The Common Flaws, Why They Occur and How to Prevent Them

Iain K Crombie

University of Dundee
Dundee, Scotland, UK

This edition first published 2021
© 2021 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Iain K Crombie to be identified as the author of this work has been asserted in accordance with law.

Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations and the constant flow of information relating to the use of medicines, equipment and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organisation, website or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organisation, website or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: Crombie, I. K., author.
Title: Evidence in medicine : the common flaws, why they occur and how to prevent them / Iain K. Crombie.
Description: First edition. | Hoboken, NJ : John Wiley & Sons, Inc., 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020051750 (print) | LCCN 2020051751 (ebook) | ISBN 9781119794141 (paperback) | ISBN 9781119794189 (adobe pdf) | ISBN 9781119794196 (epub)
Subjects: MESH: Research Design–standards | Randomized Controlled Trials as Topic | Systematic Reviews as Topic | Evidence‐Based Practice–standards | Treatment Outcome
Classification: LCC R852 (print) | LCC R852 (ebook) | NLM W 20.5 | DDC 610.72–dc23
LC record available at https://lccn.loc.gov/2020051750
LC ebook record available at https://lccn.loc.gov/2020051751

Cover Design: Wiley
Cover Image: © Govindanmarudhai/DigitalVision Vectors/Getty Images

Preface

Evidence is central to the practice of medicine. The amount of published medical evidence is immense, but there are widespread concerns about the quality of many studies. Deficiencies in the conduct of research studies often result in misleading estimates of the benefits of treatments, so that ineffective treatments may be used in clinical practice. In addition, many studies are wasted because they have been so poorly designed and conducted.

This book explores the nature of the deficiencies and flaws in the evidence about the effectiveness of treatments. It is the result of a career‐long interest in medical evidence. A recent career change provided an opportunity to read the extensive literature on research quality that has emerged in recent years. As befits a book on evidence, references are cited to support the statements made. Reviews and commentaries are used where available, although many landmark studies are also referenced. The approach taken is to cite sufficient papers to support a point, rather than to give a comprehensive review of it. The number of references cited reflects the wealth of evidence on the deficiencies in medical research.

I am grateful to the University of Dundee for providing the facilities to research and write this book, and to my colleagues for their support and encouragement. Special thanks are due to my long‐time colleague and friend Linda Irvine: she made several gently delivered, trenchant criticisms of early drafts and picked out factual, logical and grammatical failings in later drafts. The irony of writing a book about evidence is that it may contain errors of fact or interpretation. Equally there may be errors of omission. I take full responsibility for all flaws.

Aims of this Book

The advent of the randomised controlled trial (RCT) provided a method to generate reliable evidence on the effectiveness of treatments. It enables a fair comparison of groups [1], and should identify which treatments are beneficial and which are of little or no value. Since the early RCTs of the mid‐twentieth century, the design of trials has been progressively refined, making them the bedrock on which modern medicine is built. The findings of RCTs are used by regulatory authorities across the world to license effective treatments. Other methods of testing treatments have been proposed, but none matches the ability of high quality randomised trials to provide good estimates of treatment effectiveness [2]. The RCT deserves its status as the gold standard for assessing the effectiveness of treatment.

The RCT now occupies a leading place in medical research, with tens of thousands of trials being published annually [3, 4]. These studies should provide high quality evidence across all fields of medicine, but that promise has not been fulfilled. At issue is the quality of the evidence. Concerns are growing ‘about the reliability and validity of the underlying research that supports regulatory and clinical decision‐making’ [5], with authors describing ‘the pervasiveness of poor quality clinical evidence’ [6] and concluding that ‘much of the published medical research is apparently flawed, cannot be replicated and/or has limited or no utility’ [7]. In a particularly trenchant comment, Ian Roberts and colleagues concluded that ‘the knowledge system underpinning healthcare is not fit for purpose and must change’ [8]. This book evaluates that proposition.

To investigate the knowledge system for healthcare, this book poses five questions: 1) what are the problems; 2) how common are they; 3) to what extent do they bias evidence; 4) why do they occur; and 5) how can they be prevented? Medical research is a vast enterprise, so the book focuses on the two most important research methods: the randomised controlled trial and the systematic review. Answers to the first three questions will provide a broad assessment of the quality of evidence (Chapters 2–5). The answer to the fourth question, on the causes of poor quality, highlights issues around misconduct and how the structures and incentives in the research environment influence the quality of evidence (Chapters 6 and 7). The final chapter presents an approach for developing a comprehensive strategy to tackle the quality problem. This is supported by Appendix 2, which lists the initiatives that have been proposed to improve research evidence. First, to introduce the nature of evidence in medicine, Chapter 1 provides a brief review of the rationale for treatments from ancient times to the present day.

This book focuses on the flaws in medical evidence. It does not review the important findings from the many high quality studies that have provided convincing evidence on the effectiveness of a wide range of treatments. Instead, the book describes the variety of deficiencies that afflict much research. This focus could lead to an overly negative view of the state of medical evidence. The reality is that clinical research is spread across a spectrum from high to very low quality studies. The aim of the book is to use a detailed review of the flaws and their causes to develop a strategy to prevent poor quality and misleading research. If implemented, the strategy could prevent some of the more egregious studies from being published, and rectify some with less reprehensible weaknesses. In this way, it could help move the spectrum upwards in quality. The book should be read from a constructive perspective, asking what is going wrong and how we can change it; the focus on flaws also provides the motivation for action.

REFERENCES

1. Chalmers, I. (2011). Why the 1948 MRC trial of streptomycin used treatment allocation based on random numbers. J. R. Soc. Med. 104: 383–386.

2. Byar, D.P., Simon, R.M., Friedewald, W.T. et al. (1976). Randomized clinical trials. Perspectives on some recent ideas. N. Engl. J. Med. 295: 74–80.

3. Bastian, H., Glasziou, P., and Chalmers, I. (2010). Seventy‐five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. https://doi.org/10.1371/journal.pmed.1000326.

4. Viergever, R.F. and Li, K. (2015). Trends in global clinical trial registration: an analysis of numbers of registered clinical trials in different parts of the world from 2004 to 2013. BMJ Open https://doi.org/10.1136/bmjopen‐2015‐008932.

5. Wallach, J.D., Gonsalves, G.S., and Ross, J.S. (2018). Research, regulatory, and clinical decision‐making: the importance of scientific integrity. J. Clin. Epidemiol. 93: 88–93.

6. Ioannidis, J.P.A., Stuart, M.E., Brownlee, S. et al. (2017). How to survive the medical misinformation mess. Eur. J. Clin. Investig. 47: 795–802.

7. Eshre, C.W.G. (2018). Protect us from poor‐quality medical research. Hum. Reprod. 33: 770–776.

8. Roberts, I., Ker, K., Edwards, P. et al. (2015). The knowledge system underpinning healthcare is not fit for purpose and must change. BMJ https://doi.org/10.1136/bmj.h2463.

CHAPTER 1

The Rationale for Treatment: A Brief History

The development of modern medicine has rightly been described as ‘the greatest benefit to mankind’ [1]. Vaccination, anaesthetics, aseptic surgery, antibiotics, insulin for diabetes and drugs to prevent and treat heart disease form part of a very long list of treatments that have transformed healthcare. These are the fruits of many years of careful clinical investigation supported by extensive research. However, medicine has a chequered history in which ineffective treatments were widely used: as Benjamin Franklin pithily remarked in the eighteenth century, ‘God heals, and the doctor takes the fees’ [2]. Studies of the history of medicine show that the basis on which treatments have been used has varied greatly across the centuries [3]. This chapter explores the rationale behind the use of treatments, and the way this has changed over time. It concludes by describing the progress towards present‐day methods for testing the effectiveness of treatments.

THEORY AS JUSTIFICATION FOR TREATMENT

In ancient times diseases were attributed to supernatural causes, spirits and demons. Treatments involved spells and prayers, or the wearing of amulets, which were intended to drive the malign forces from the patient [4]. Theories gradually evolved towards biological and physical causes of disease, with treatments involving minor surgery and drugs (usually based on plant extracts, minerals and metals). In Western medicine, one of the most influential of these theories was the doctrine of the four humours. It held that good health was enjoyed when the four humours (the fluids blood, phlegm, black bile and yellow bile) were in balance, and that an excess of one humour caused disease [5]. Treatment for illness focused on restoring the balance by removing some of the excess humour from the body. This could be achieved by bloodletting (by cutting open a vein or applying a leech), by inducing fluid loss with a purgative, or by blistering the skin. This treatment was almost always harmful, although it often appeared to give short‐term relief of the symptoms of acute inflammations [6]. The most notable casualty of bloodletting was George Washington, first president of the United States. He was suffering from a serious upper respiratory tract infection, for which his doctors extracted approximately 2.4 L of blood over about 12 hours. He died 33 hours later, probably from the combination of the infection and the treatment given [5]. When the practice of bloodletting was challenged in the nineteenth century, a leading physician, William Stokes, commented that it was hard to believe ‘that the fathers of British medicine were always in error, and that they were bad observers and mistaken practitioners’ [7]. This cautionary tale of bloodletting suggests that theory and clinical experience may be unreliable guides to the effectiveness of a treatment.

A more recent but widely (mis)used theory was that bed rest was beneficial for a variety of ailments. Its popularity has been traced to a series of lectures in the mid‐nineteenth century by John Hilton, president of the Royal College of Surgeons [8, 9]. Initially recommended for recovery following orthopaedic procedures [10], it was soon used for conditions including myocardial infarction, pulmonary tuberculosis, rheumatic fever and psychiatric illnesses [9]. Bed rest was particularly popular in pregnancy, where it was recommended for complications such as threatened abortion, hypertension or preterm labour [11]. The theory was that if rest helped to mend broken bones, then it would also heal other organs [9]. The benefits of bed rest were thought to include reduced demands on the heart, conservation of metabolic resources for healing and avoidance of stress [12]. Its use began to be challenged in the middle of the twentieth century, as evidence grew on the adverse effects of bed rest; it is now known to cause impairment of cardiovascular, haematological, musculoskeletal, immune and psychological functions [9, 12]. Bed rest is an example of a treatment based on beliefs about benefit that endured in the face of substantial evidence of harm [8, 11].

TESTING ON A SERIES OF PATIENTS

The transition from treatments based on theory to the use of evidence derived from empirical studies was a gradual process. A simple, and common, method was to give a treatment to a series of patients and then observe its impact on the disease. A good example is the use of the leaves of the willow tree for inflamed joints, a treatment dating back to the ancient Egyptians [13]. Clinical observation confirmed the benefits: application of a decoction of willow leaves to inflamed skin reduced the swelling. Extracts of willow leaves and bark were also used for fever and pain by the Greeks from the fifth century BCE [14]. An important step in the use of the willow was taken by the Reverend Edward Stone in 1763. He administered a solution of powdered willow bark to 50 patients with fever, judging the treatment a great success [14, 15]. The active ingredient of the willow, salicin, was isolated in the 1820s [13, 15]. This drug was tested by a Dundee physician, T.J. MacLagan, who administered it to a series of patients with acute rheumatism. Not only was the treatment successful, but it also demonstrated antipyretic, analgesic and anti‐inflammatory effects [15]. Salicin was recognised to be an important drug, but its long‐term use was limited because gastric irritation, nausea and vomiting were common side effects. The pharmaceutical arm of the Bayer company searched for a safer alternative, and successfully modified salicin to produce a new chemical with fewer side effects [13, 15]. That drug, aspirin, is now the most widely used medicine in the world [14].

Another example of evidence from a series of patients is the discovery of insulin for the treatment of diabetes. This was undoubtedly ‘one of the most dramatic events in the history of the treatment of disease’ [16]. Research in the late nineteenth century had shown that removal of an animal's pancreas ‘produced severe and fatal diabetes’ [17]. Over the following 30 years many researchers tried to isolate a pancreatic extract that could control blood sugar levels. They had little success, as the extracts had only a transitory effect on blood sugar and caused unacceptable side effects (vomiting, fever and convulsions) [18, 19]. In October 1920 Frederick Banting, a young Canadian doctor, was preparing a lecture on the pancreas [16]. The research he was reading led him to think that the active ingredient was being destroyed by the digestive enzymes in the pancreas, and that this could be prevented by ligating the pancreatic ducts. Banting began the experiments with extracts of the ligated pancreas in May 1921 [17]. By January 1922 a purified extract had been obtained. This proved successful in treating a 14‐year‐old boy, and in February a further six patients were treated with equally favourable results [16]. The discovery was announced in April to international acclaim; the Nobel Prize was awarded in 1923 to Banting and one of his colleagues, Dr Macleod [16].

COMPARING GROUPS

Case series can provide support for a treatment if, as with insulin, the benefits are immediate and substantial. But observations on a set of patients are often not sufficient to identify whether a treatment is truly effective. Consider the management of gunshot wounds in the sixteenth century. At that time it was believed that the bullet introduced poison into the body, and that cauterising the wound with boiling oil mixed with treacle would detoxify it [20, 21]. The treatment was very unpleasant, but was thought to save lives. Force of circumstances led the French barber‐surgeon, Ambroise Paré, to use a different treatment. During the Italian war of 1536–1538, Paré ran out of oil and instead used a balm of egg yolk, rose oil and turpentine [20]. He observed that the outcomes differed substantially between the two groups: those treated with the hot oil were feverish and in ‘great pain and swelling about the edges of their wounds’, whereas those given the balm were resting comfortably [21]. Further trials of the balm convinced Paré that gunshot wounds were not poisoned and should not be cauterised [20].

The comparison of groups also helped promote a technique for the prevention of smallpox. In the 1700s smallpox was a leading cause of death, with many of those who survived suffering disfigurement and blindness [22]. The available preventive measure was to infect children with pus or scab material from smallpox victims, a process known as variolation. Despite reports that it was beneficial [23], there was widespread concern that variolation might carry a greater risk of dying than allowing people to contract the disease naturally. James Jurin evaluated this in the 1720s by collecting data on death rates in three groups: those who were diagnosed with smallpox, those at risk of contracting smallpox and those who had been variolated [22, 23]. The results appeared convincing, with death rates of 16.5% (diagnosed cases), 8.3% (at risk) and 2.0% (variolated) [23]. Variolation appeared to be a much safer practice than letting nature take its course.
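Jurin's figures lend themselves to a simple worked comparison. The Python sketch below divides each group's reported death rate by the rate among those variolated, giving a crude rate ratio. The function name `rate_ratio` and the dictionary layout are illustrative choices, not anything from the original sources, and the ratios are unadjusted: they take no account of how the groups might have differed in other respects.

```python
# Death rates (%) from Jurin's 1720s data, as quoted in the text.
death_rate_pct = {
    "diagnosed with smallpox": 16.5,
    "at risk of smallpox": 8.3,
    "variolated": 2.0,
}


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Crude ratio of two death rates: how many times higher rate_a is than rate_b."""
    return rate_a / rate_b


baseline = death_rate_pct["variolated"]
for group, rate in death_rate_pct.items():
    ratio = rate_ratio(rate, baseline)
    print(f"{group}: {rate}% death rate, {ratio:.2f} times the variolated rate")
```

On these numbers, the death rate among diagnosed cases was roughly eight times that among the variolated, and the rate in the at‐risk group roughly four times as high.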

Death following childbirth was a serious concern in the seventeenth to nineteenth centuries, causing epidemics ‘of unimaginable proportions’ [24]. A major cause of this mortality, puerperal fever (fever following childbirth), was investigated by Ignaz Semmelweis, a Hungarian doctor. In 1844 he compared the death rates among patients in two wards of a hospital in Vienna. He found that the death rate in a ward staffed by doctors was much higher (16%) than in the one run by midwives (2%) [25]. This, and other observations, led Semmelweis to conclude that the illness was transmitted by doctors coming directly from a post‐mortem to help deliver a baby. He initiated a preventive measure, compulsory hand washing in a chloride of lime solution, which reduced the mortality in the doctors’ ward to 3% [25]. His approach was not popular, because it implied that doctors transmitted disease, and Semmelweis's contract was not renewed. He was finally vindicated some 30 years later when Pasteur identified the bacterium, Streptococcus pyogenes, that caused puerperal fever [25].
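The Semmelweis figures can also be expressed as standard effect measures. Writing $p_D$, $p_M$ and $p_H$ (notation introduced here for illustration only) for the death rates in the doctors' ward, the midwives' ward, and the doctors' ward after hand washing was introduced, and using the rounded percentages quoted above, the relative risk (RR), absolute risk reduction (ARR) and relative risk reduction (RRR) are:

```latex
\begin{aligned}
\text{RR}  &= \frac{p_D}{p_M} = \frac{0.16}{0.02} = 8 \\
\text{ARR} &= p_D - p_H = 0.16 - 0.03 = 0.13 \\
\text{RRR} &= \frac{p_D - p_H}{p_D} = \frac{0.13}{0.16} \approx 0.81
\end{aligned}
```

On these figures, deaths in the doctors' ward were about eight times as frequent as in the midwives' ward, and hand washing cut the death rate in the doctors' ward by about four‐fifths. Like Jurin's figures, these are crude comparisons of groups that may have differed in other ways.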

These treatment evaluations used two different types of comparison: contemporary controls and historical controls. Contemporary controls are patients who were seen at the same time as those getting the new treatment, but who received the conventional care. Historical controls are patients who had been treated previously in the same location (e.g. the same hospital). Jurin's comparisons of groups at risk of smallpox, and Semmelweis's comparison of puerperal fever in two wards, used contemporary controls. In contrast, the comparison of puerperal fever before and after the introduction of hand washing, and Paré's comparison of treatments for gunshot wounds, used historical control groups.

The problem with both types of control group is that there could be systematic differences between the patients in the different groups. Isaac Massey, a contemporary of Jurin, made this criticism of the work on smallpox, pointing out that those who could afford variolation may have been in better health than those in the comparison groups [22]. He concluded that what was needed were groups that were similar: they ‘must and ought to be as near as may be on a Par’ [22].

COMPARING SIMILAR GROUPS

When groups are similar at baseline, it is more likely that any differences in subsequent outcomes will be due to the differences in the effects of the treatments. The idea of comparing like with like was proposed in the fourteenth century by the poet Petrarch (Francesco Petrarca), who suggested using similar groups of patients to compare the then‐current treatments with simply letting nature take its course [26].

One way to create similar groups is to recruit a number of patients who are all alike, then give them different treatments. The testing of potential treatments for scurvy is a widely cited example of the benefit of using similar groups. Scurvy is a debilitating and sometimes fatal disease, which afflicted sailors on long‐distance sea voyages from the fifteenth to the nineteenth centuries [27, 28]. By the late 1500s, the benefits of consuming oranges and lemons were well known to Dutch sailors [27], but many English expeditions continued to suffer serious loss of life through scurvy [28]. The issue was still unresolved in 1747 when James Lind, a Royal Navy surgeon, carried out a classic study to assess the effects of six common treatments. He identified 12 sailors with scurvy who were ‘as similar as I could have them’, and tested each treatment on a pair of men (each pair received one of: oil of vitriol, vinegar, sea water, cider, oranges and lemons, or a herbal paste) [29]. After 14 days Lind observed ‘the most sudden and visible good effects were perceived from the use of oranges and lemons’. These findings were not widely accepted, and even Lind had doubts about them [29, 30], but the method used reflects an advance in thinking about ways to test treatments. Lind is rightly celebrated for his comparison of like with like in the evaluation of treatments. (In his ‘Treatise of the Scurvy’ Lind does not make any clear recommendations for the treatment of the disease, possibly because he believed that scurvy was not due to poor diet, but was a result of faulty digestion exacerbated by wet weather [29, 30].)

Another study in the eighteenth century used similar groups to assess whether the adverse effects of variolation (to prevent smallpox) could be ameliorated by pretreatment with a compound of mercury. At that time about 1 in 50 patients variolated against smallpox died following the procedure [31]. In 1767 William Watson recruited 31 children who were similar in age, gender and diet [32]. These were divided into three groups, which received the mercury mixture, a mild senna laxative or no treatment. No clear difference was found between the groups, using an objective measure of assessment (the number of pock marks caused by the variolation). Watson concluded that variolation against smallpox was effective with or without pretreatment with mercury or a mild laxative [32].

CASTING LOTS AND TREATMENT ALLOCATION

Comparing similar groups of patients was an important step forward in the evaluation of treatments, but it leaves open the possibility that the groups differed on important factors that were not measured. Further, a subconscious bias in the doctor allocating patients could influence the way individuals were assigned to groups (e.g. slightly sicker patients might be preferentially assigned to one group). An alternative approach, which prevents this bias, is to allocate individuals to treatments in a truly random way, so that the groups will, on average, be balanced on all factors, whether measured or not, with chance imbalances becoming less likely as the groups get larger.
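The claim that random allocation balances even unmeasured factors is a statement about averages, and a short simulation can make it concrete. The Python sketch below is illustrative only: the 30% prevalence of a hidden risk factor, the group sizes and the function name `simulate_imbalance` are hypothetical choices, and each patient is allocated by a simulated fair coin toss (simple randomisation).

```python
import random
import statistics


def simulate_imbalance(n_patients, n_trials, prevalence=0.3, seed=1):
    """Return, for each simulated trial, the difference between two randomly
    allocated groups in the share of patients carrying a hidden risk factor.

    Hypothetical set-up: each patient carries the factor with probability
    `prevalence`, and is then assigned to group A or B by a fair coin toss.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        group_a, group_b = [], []
        for _ in range(n_patients):
            has_factor = rng.random() < prevalence  # unmeasured: never used below
            # The coin toss ignores has_factor entirely, as random allocation should.
            if rng.random() < 0.5:
                group_a.append(has_factor)
            else:
                group_b.append(has_factor)
        if group_a and group_b:  # skip the (very rare) trial with an empty group
            diffs.append(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    return diffs


for n in (20, 200, 2000):
    d = simulate_imbalance(n, n_trials=500)
    print(f"n={n:5d}  mean difference={statistics.mean(d):+.3f}  "
          f"typical imbalance (SD)={statistics.stdev(d):.3f}")
```

Run as written, the mean difference should sit close to zero for every trial size, while the typical imbalance shrinks as the trial grows; a small randomised trial can still, by chance, end up with noticeably unequal groups.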

The idea that some form of randomisation should be used to allocate patients to treatment groups was proposed in the 1640s. Joan Baptista van Helmont, a Flemish chemist, alchemist and physician, recommended this method to evaluate the effectiveness of bloodletting [33]. He suggested dividing up to 500 patients into two groups, then casting lots (equivalent to tossing a coin) to decide which group would be given the conventional therapy (bloodletting) and which would receive van Helmont's own treatment. A notable feature of the design is that the result would be judged by the number of funerals in each group. The experiment was never carried out. (The proposal to use an objective outcome measure such as this was unusual for its time.)

One method of randomised allocation was used in 1848 by Thomas Graham Balfour to investigate whether homeopathic belladonna could prevent scarlet fever. Balfour identified 151 boys who had not had the disease, and ‘divided them into two sections, taking them alternately from the list, to prevent the imputation of selection’ [34]. Balfour recognised that if he himself decided which boys went into each group, his choices might be biased. (Alternate selection from a list is essentially a method of randomisation, because the factors related to developing scarlet fever will be scattered at random throughout the list.) Exactly two boys in each group developed scarlet fever, leading Balfour to conclude that ‘the numbers are too small to justify deductions as to the prophylactic power of belladonna’ [34], a commendably careful interpretation of the findings.

Instead of alternate selection from a list, patients can be allocated to treatments by their date of admission to hospital. The Danish physician Johannes Fibiger used this method in 1896–1897 to evaluate a serum treatment for diphtheria [35]: patients admitted on one day received serum, and those admitted on the next day did not. The outcome was persuasive: only 8 of 239 patients in the serum group died, compared with 30 of the 245 controls.
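Fibiger's figures can be restated as the risk of death in each group. The arithmetic below uses only the counts reported above; the relative risk and risk difference are standard summary measures added here for illustration, not figures quoted from Fibiger's report.

$$
\begin{aligned}
\text{risk (serum)} &= \frac{8}{239} \approx 0.033 \quad (3.3\%)\\
\text{risk (control)} &= \frac{30}{245} \approx 0.122 \quad (12.2\%)\\
\text{relative risk} &= \frac{8/239}{30/245} \approx 0.27\\
\text{risk difference} &\approx 0.122 - 0.033 = 0.089 \quad (\text{about 9 percentage points})
\end{aligned}
$$

On these counts, the death rate among patients given serum was roughly a quarter of that among the controls.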

Alternate allocation began to gain popularity in the first few decades of the twentieth century because it reduced the scope for bias in assigning patients to treatments. Such studies were conducted in both the United States and the UK, with patients allocated alternately according to the order in which they attended a healthcare facility [36–39]. These trials signalled a growing recognition of the importance of achieving comparable groups.

RANDOM NUMBERS FOR TREATMENT ALLOCATION

A landmark series of three trials, conducted under the auspices of the UK Medical Research Council, used random numbers to allocate patients to treatments. This methodological advance was proposed by the medical statistician Professor (later Sir) Austin Bradford Hill [40]. It was first used in a large field trial that assessed the effectiveness of a vaccine for whooping cough [41]. Although this study began in 1944, it was not published until 1951. The second trial, of streptomycin for pulmonary tuberculosis, became the most highly acclaimed study in the history of treatment evaluation. It began in 1946, but was the first to be published, in 1948 [42]. The third trial involved a large‐scale field trial of an antihistaminic drug (thonzylamine) for the prevention of the common cold [43].

As well as being published first, the streptomycin trial provided a major advance in the treatment of a feared disease, tuberculosis: it reduced the fatality rate at six months from 27% to 7% and also reduced the severity of disease among survivors. An editorial accompanying the paper identified the advantage of individual randomisation over alternate allocation: because the next allocation could not be predicted, a patient could not be included or rejected according to whether the next treatment was to be antibiotic or control [44]. For example, a doctor who thought that the drug would not work in seriously ill patients might leave such patients out of the study whenever the next allocation was the active treatment. This would only need to happen a few times to bias the results of the study.
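The editorial's point turns on predictability. The Python sketch below is a hypothetical illustration, not the MRC procedure: it assumes a clinician who knows the previous allocation and always guesses that the next one will be the opposite. The function names and the labels 'S' (streptomycin) and 'C' (control) are invented for the example.

```python
import random


def alternate_allocation(n):
    """Alternation: the 1st, 3rd, 5th... patient gets 'S', the others get 'C'."""
    return ["S" if i % 2 == 0 else "C" for i in range(n)]


def random_allocation(n, seed=None):
    """Simple randomisation: each patient independently gets 'S' or 'C' with probability 1/2."""
    rng = random.Random(seed)
    return [rng.choice("SC") for _ in range(n)]


def guess_accuracy(sequence):
    """Share of allocations (from the 2nd patient on) guessed correctly by a
    clinician who always predicts the opposite of the previous allocation
    (a hypothetical guessing rule)."""
    hits = sum(1 for prev, cur in zip(sequence, sequence[1:]) if cur != prev)
    return hits / (len(sequence) - 1)


print("alternation:", guess_accuracy(alternate_allocation(1000)))
print("random numbers:", round(guess_accuracy(random_allocation(1000, seed=42)), 2))
```

Under alternation the guess is always right, so the clinician knows in advance which treatment the next patient will receive; with random allocation any guessing rule is right only about half the time. The protection holds only if the upcoming allocation is kept from the clinician who is deciding whether to enrol a patient.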

In addition to the use of random numbers to allocate patients to treatments, these three trials stand out for two other reasons. Patients were recruited from multiple centres to provide enough participants to draw firm conclusions. The researchers also made considerable efforts to ensure that the participants, and the clinicians measuring the outcomes, were unaware of which treatment each patient received. This prevented bias in the reporting of symptoms by participants and in the recording of outcomes: in modern terminology, the trials were double blind.

The landmark streptomycin trial in tuberculosis was followed by another study on pulmonary tuberculosis, published two years later [45]. It compared three treatments: streptomycin, another drug, para‐aminosalicylic acid (PAS), and a combination of the two. The methodology was the same as in the first streptomycin trial. The combination therapy had the best outcome, with streptomycin alone coming second. More importantly, the combined treatment led to a much lower frequency of bacterial resistance to streptomycin. This study has been credited with giving rise to the maxim ‘never treat active tuberculosis with a single agent’, which is now the standard for managing the disease [46]. Clinical benefits apart, this set of four rigorous studies supported by the Medical Research Council inaugurated the era of high‐quality clinical trials.

THE NEED FOR BLINDING

A major concern of several trials in the middle of the twentieth century was to ensure that patients and clinical observers were unaware of how treatments were allocated [39, 41, 43], so that knowledge of who received which treatment could not influence the outcome of the study. The idea was not new: it featured in studies to evaluate a treatment called animal magnetism. This treatment was championed in the late 1700s by Franz Anton Mesmer, who believed he could impart magnetic energy and thereby cure a wide range of illnesses [47]. Mesmer achieved great fame and a lucrative medical practice in Paris. His popularity so concerned other doctors, and the government, that they persuaded Louis XVI to establish a Royal Commission to evaluate the claims of cures and dramatic effects [47, 48]. The Commission conducted a series of studies in which participants either thought they were being magnetised when they were not, or thought they were not being magnetised when they were. The findings were convincing. Participants reported effects only when they believed they were being magnetised, whether or not they actually were: ‘the imagination is the real cause of the effects attributed to magnetism’ [47]. Following publication of the Commission's report, Mesmer was ridiculed, and animal magnetism was abandoned in France.

A more recent example of the importance of blinding is the evaluation of a surgical technique, internal mammary artery ligation, for the relief of angina. Several reports in the 1950s had claimed that the operation provided considerable relief of symptoms [49, 50]. This prompted two groups of researchers to carry out controlled trials of the surgery. Patients were randomly allocated either to artery ligation or to a sham operation involving only a skin incision. The patients, and the cardiologists who evaluated the outcomes, were blind to treatment group. Most patients in both groups reported significant improvement in symptoms, so the ligation provided no benefit over the sham operation [49, 50]. The authors concluded that the benefits claimed for the operation were most likely a psychological response to undergoing surgery.

The response to a sham treatment is known as a placebo effect. Understanding of the psychological and physiological factors underlying the placebo response has advanced greatly in recent years [51, 52]. A consistent finding is that patients who have high expectations of their treatment usually experience improvements in symptoms. If patients knew their treatment allocation, only those in the active group would have high expectations. Keeping patients unaware of their allocation prevents this bias from creating a difference between the groups.

CONCLUSION

Obtaining evidence on treatment effectiveness is a challenging business. As Passamani remarked in 1991, ‘The history of medicine is richly endowed with therapies that were widely used and then shown to be ineffective or frankly toxic’ [53]. A similar view was expressed in 1860 by the celebrated American physician Oliver Wendell Holmes: ‘if the whole materia medica, as now used, could be sunk to the bottom of the sea, it would be all the better for mankind – and all the worse for the fishes’ [54]. These may seem somewhat jaundiced views, but they reflect the large proportion of ineffective and possibly harmful treatments that were once used. Even in the early years of the twentieth century many ineffective treatments were widely used [55], and some treatments of little value continue to be used today [56]. Concern about this has led to a recent international campaign, ‘Choosing Wisely’, to reduce the use of ineffective or harmful treatments [57].

This chapter has presented examples of different approaches used to identify potentially effective treatments. Choosing treatments on the basis of theories of disease processes is often unreliable and can lead to harmful treatments being used. Careful observation of treatment outcomes in a series of patients can identify effective treatments, provided the benefits are immediate and substantial. Comparisons of groups of patients given different treatments are often more insightful, but are vulnerable to the criticism that the groups might not have been similar at baseline. As the eminent French physician P C A Louis pointed out in 1834, ‘it is necessary to account for differences of age, sex, temperament, physical condition, natural history of the disease’ [58]. Constructing groups to be similar on some factors at baseline is a definite improvement, but leaves open the possibility that they differ on other (unmeasured) factors. Allocating individual patients to treatments using random numbers overcomes two problems: clinician bias in assigning patients to groups, and differences in unmeasured factors.

The sequence of methods presented in this chapter could be taken to imply a steady progression towards more robust study designs. However, as the dates of the individual studies show, there is little evidence of continuous improvement in methods: rather, there was substantial overlap in their use. The major advances in trial methodology came in studies conducted in the middle of the twentieth century. They used three techniques that are now hallmarks of high‐quality trials: randomisation, blinding of investigators and patients to treatment allocation, and objective outcome measures.

In summary, this chapter has reviewed the development of methods to evaluate treatments up to the middle of the twentieth century. It has highlighted pitfalls of many of the earlier methods and concluded with an outline of the advantages of the double blind randomised controlled trial. This method is now used around the world to identify the benefits of treatments. Medicine now has the tools to ensure that only effective treatments are used. The next chapter explores whether the benefits of the randomised controlled trial have been realised.

REFERENCES

1. Porter, R. (1999). The Greatest Benefit to Mankind. London: HarperCollins.

2. Cantu, J.Q. (1965). Benjamin Franklin's medical imprints. Bull. Med. Libr. Assoc. 53: 71–79.

3. Bhatt, A. (2010). Evolution of clinical research: a history before and beyond James Lind. Perspect. Clin. Res. 1: 6–10.

4. Ackerknecht, E.H. (1982). A Short History of Medicine. Baltimore: Johns Hopkins University Press.

5. DePalma, R.G., Hayes, V.W., and Zacharski, L.R. (2007). Bloodletting: past and present. J. Am. Coll. Surg. 205: 132–144.

6. Risse, G.B. (1979). Renaissance of bloodletting – chapter in modern therapeutics. J. Hist. Med. Allied Sci. 34: 3–22.

7. Stokes, W. (1865). The address in medicine. BMJ ii: 133–142.

8. Biggio, J.R. Jr. (2013). Bed rest in pregnancy: time to put the issue to rest. Obstet. Gynecol. 121: 1158–1160.

9. Sprague, A.E. (2004). The evolution of bed rest as a clinical intervention. J. Obstet. Gynecol. Neonatal. Nurs. 33: 542–549.

10. Bigelow, C. and Stone, J. (2011). Bed rest in pregnancy. Mt Sinai J. Med. 78: 291–302.

11. McCall, C.A., Grimes, D.A., and Lyerly, A.D. (2013). ‘Therapeutic’ bed rest in pregnancy: unethical and unsupported by data. Obstet. Gynecol. 121: 1305–1308.

12. Brower, R.G. (2009). Consequences of bed rest. Crit. Care Med. 37: S422–S428.

13. Jack, D.B. (1997). One hundred years of aspirin. Lancet 350: 437–439.

14. Montinari, M.R., Minelli, S., and De Caterina, R. (2019). The first 3500 years of aspirin history from its roots – a concise summary. Vasc. Pharmacol. 113: 1–8.

15. Vane, J.R., Flower, R.J., and Botting, R.M. (1990). History of aspirin and its mechanism of action. Stroke 21: IV12–IV23.

16. Rosenfeld, L. (2002). Insulin: discovery and controversy. Clin. Chem. 48: 2270–2288.

17. Banting, F.G. (1926). An address on diabetes and insulin: being the Nobel lecture delivered at Stockholm on September 15th, 1925. CMAJ 16: 221–232.

18. Banting, F.G., Best, C.H., Collip, J.B. et al. (1922). Pancreatic extracts in the treatment of diabetes mellitus. CMAJ 12: 141–146.

19. Karamitsos, D.T. (2011). The story of insulin discovery. Diabetes Res. Clin. Pract. 93 (Suppl 1): S2–S8.

20. Wangensteen, O.H., Wangensteen, S.D., and Klinger, C.F. (1972). Wound management of Ambroise Paré and Dominique Larrey, great French military surgeons of the 16th and 19th centuries. Bull. Hist. Med. 46: 207–234.

21. Drucker, C.B. (2008). Ambroise Paré and the birth of the gentle art of surgery. Yale J. Biol. Med. 81: 199–202.

22. Huth, E. (2006). Quantitative evidence for judgments on the efficacy of inoculation for the prevention of smallpox: England and New England in the 1700s. J. R. Soc. Med. 99: 262–266.

23. Bird, A. (2019). James Jurin and the avoidance of bias in collecting and assessing evidence on the effects of variolation. J. R. Soc. Med. 112: 119–123.

24. Charles, D. and Larsen, B. (1986). Streptococcal puerperal sepsis and obstetric infections: a historical perspective. Rev. Infect. Dis. 8: 411–422.

25. Adriaanse, A.H., Pel, M., and Bleker, O.P. (2000). Semmelweis: the combat against puerperal fever. Eur. J. Obstet. Gynecol. Reprod. Biol. 90: 153–158.

26. Chalmers, I., Dukan, E., Podolsky, S. et al. (2012). The advent of fair treatment allocation schedules in clinical trials during the 19th and early 20th centuries. J. R. Soc. Med. 105: 221–227.

27. Burnby, J. and Bierman, A. (1996). The incidence of scurvy at sea and its treatment. Rev. Hist. Pharm. 44: 339–346.

28. Baron, J.H. (2009). Sailors' scurvy before and after James Lind – a reassessment. Nutr. Rev. 67: 315–332.

29. Milne, I. (2012). Who was James Lind, and what exactly did he achieve? J. R. Soc. Med. 105: 503–508.

30. Bartholomew, M. (2002). James Lind's treatise of the scurvy (1753). Postgrad. Med. J. 78: 695–696.

31. Boylston, A.W. (2002). Clinical investigation of smallpox in 1767. N. Engl. J. Med. 346: 1326–1328.

32. Boylston, A. (2014). William Watson's use of controlled clinical experiments in 1767. J. R. Soc. Med. 107: 246–248.

33. Donaldson, I.M. (2016). Van Helmont's proposal for a randomised comparison of treating fevers with or without bloodletting and purging. J. R. Coll. Physicians Edinb. 46: 206–213.

34. Chalmers, I. and Toth, B. (2009). Nineteenth‐century controlled trials to test whether belladonna prevents scarlet fever. J. R. Soc. Med. 102: 549–550.

35. Hrobjartsson, A., Gotzsche, P.C., and Gluud, C. (1998). The controlled clinical trial turns 100 years: Fibiger's trial of serum treatment of diphtheria. BMJ 317: 1243–1245.

36. Podolsky, S.H. (2009). Jesse Bullowa, specific treatment for pneumonia, and the development of the controlled clinical trial. J. R. Soc. Med. 102: 203–207.

37. MRC (1934). The serum treatment of lobar pneumonia: a report of the Therapeutic Trials Committee of the Medical Research Council. BMJ 1: 241–245.

38. Lorriman, G. and Martin, W.J. (1950). Trial of antistin in the common cold. BMJ 2: 430–431.

39. MRC (2004). Clinical trial of patulin in the common cold. 1944. Int. J. Epidemiol. 33: 243–246.

40. Hill, A.B. (1990). Suspended judgment. Memories of the British streptomycin trial in tuberculosis. The first randomized clinical trial. Control. Clin. Trials 11: 77–79.

41. MRC (1951). Prevention of whooping‐cough by vaccination; a Medical Research Council investigation. BMJ 1: 1463–1471.

42. MRC (1948). Streptomycin treatment of pulmonary tuberculosis. BMJ 2: 769–782.

43. MRC (1950). Clinical trials of antihistaminic drugs in the prevention and treatment of the common cold; report by a special committee of the Medical Research Council. BMJ 2: 425–429.

44. Editorial (1948). The controlled therapeutic trial. BMJ 2: 791–792.

45. MRC (1950). Treatment of pulmonary tuberculosis with streptomycin and para‐aminosalicylic acid; a Medical Research Council investigation. BMJ 2: 1073–1085.

46. Murray, J.F., Schraufnagel, D.E., and Hopewell, P.C. (2015). Treatment of tuberculosis. A historical perspective. Ann. Am. Thorac. Soc. 12: 1749–1759.

47. Lanska, D.J. and Lanska, J.T. (2007). Franz Anton Mesmer and the rise and fall of animal magnetism: dramatic cures, controversy, and ultimately a triumph for the scientific method. In: Brain, Mind and Medicine: Essays in Eighteenth‐Century Neuroscience (eds. H. Whitaker, C.U.M. Smith and S. Finger), 301–320. New York: Springer. https://doi.org/10.1007/978-0-387-70967-3

48. Donaldson, I.M. (2005). Mesmer's 1780 proposal for a controlled trial to test his method of treatment using “animal magnetism”. J. R. Soc. Med. 98: 572–575.

49. Dimond, E.G., Kittle, C.F., and Crockett, J.E. (1960). Comparison of internal mammary artery ligation and sham operation for angina pectoris. Am. J. Cardiol. 5: 483–486.

50. Cobb, L.A., Thomas, G.I., Dillard, D.H. et al. (1959). An evaluation of internal‐mammary‐artery ligation by a double‐blind technic. N. Engl. J. Med. 260: 1115–1118.

51. Enck, P., Bingel, U., Schedlowski, M. et al. (2013). The placebo response in medicine: minimize, maximize or personalize? Nat. Rev. Drug Discov. 12: 191–204.

52. Horing, B., Weimer, K., Muth, E.R. et al. (2014). Prediction of placebo responses: a systematic review of the literature. Front. Psychol. https://doi.org/10.3389/fpsyg.2014.01079

53. Passamani, E. (1991). Clinical trials – are they ethical? N. Engl. J. Med. 324: 1589–1592.

54. Schimmel, E.M. (1963). The physician as pathogen. J. Chronic Dis. 16: 1–4.