Sample Sizes for Clinical, Laboratory and Epidemiology Studies

David Machin
Description

An authoritative resource that offers the statistical tools and software needed to design and plan valid clinical studies

Now in its fourth and extended edition, Sample Sizes for Clinical, Laboratory and Epidemiology Studies includes the sample size software (SSS), together with the formulae and numerical tables, needed to design valid clinical studies. The text covers clinical as well as laboratory and epidemiology studies and contains the information needed to ensure a study will form a valid contribution to medical research.

The authors, noted experts in the field, explain step by step the wide range of considerations needed to assist investigational teams in deriving an appropriate sample size for their planned study. The book contains sets of sample size tables with companion explanations and clear worked-out examples based on real data. In addition, the text offers bibliography and reference sections designed to provide helpful guidance on the principles discussed.

This revised fourth edition:

  • Is the only available text to include sample size software for use in designing and planning clinical studies
  • Presents new and extended chapters with many additional and refreshed examples
  • Includes clear explanations of the principles and methodologies involved with relevant practical examples
  • Makes clear a complex but vital topic, helping to ensure valid methodology and publishable results
  • Contains guidance from an internationally recognised team of medical statistics experts

Written for medical researchers from all specialities and medical statisticians, Sample Sizes for Clinical, Laboratory and Epidemiology Studies offers an updated fourth edition of the important guide for designing and planning reliable and evidence-based clinical studies.

Page count: 834

Year of publication: 2018



Sample Sizes for Clinical, Laboratory and Epidemiology Studies

Fourth Edition

David Machin

Leicester Cancer Research Centre, University of Leicester, Leicester, UK; Medical Statistics Group, School of Health and Related Research, University of Sheffield, Sheffield, UK

Michael J. Campbell

Medical Statistics Group, School of Health and Related Research, University of Sheffield, Sheffield, UK

Say Beng Tan

SingHealth Duke‐NUS Academic Medical Centre, Singapore

Sze Huey Tan

Division of Clinical Trials and Epidemiological Sciences, National Cancer Centre, Singapore

This fourth edition first published 2018
© 2018 by John Wiley & Sons Ltd

Edition History: John Wiley & Sons Ltd (1e, 1987; 2e, 1997; 3e, 2008).

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of David Machin, Michael J. Campbell, Say Beng Tan and Sze Huey Tan to be identified as the authors in this work has been asserted in accordance with law.

Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: Machin, David, 1939– author. | Campbell, Michael J., 1950– author. | Tan, Say Beng, author. | Tan, Sze Huey, author.
Title: Sample Sizes for Clinical, Laboratory and Epidemiology Studies / David Machin, Michael J. Campbell, Say Beng Tan, Sze Huey Tan.
Description: Fourth edition. | Hoboken, NJ : Wiley, [2018]. | Preceded by Sample size tables for clinical studies / David Machin ... [et al.]. 3rd ed. 2009. | Includes bibliographical references and index.
Identifiers: LCCN 2018001008 (print) | LCCN 2018002299 (ebook) | ISBN 9781118874929 (pdf) | ISBN 9781118874936 (epub) | ISBN 9781118874943 (hardback)
Subjects: MESH: Research Design | Clinical Studies as Topic | Statistics as Topic | Sample Size | Tables
Classification: LCC R853.C55 (ebook) | LCC R853.C55 (print) | NLM W 20.55.C5 | DDC 615.5072/4–dc23
LC record available at https://lccn.loc.gov/2018001008

Cover design: Wiley
Cover images: © Ralf Hiemisch / Getty Images (Background Abstract Image), © SergeyIT / Getty Images (Crowd Image on Top), © SUWIT NGAOKAEW / Shutterstock (Scientist Image on bottom)

Preface

It has been more than thirty years since the original edition of ‘Statistical Tables for the Design of Clinical Trials’ was published. During this time, there have been considerable advances in the field of medical research, including the completion of the Human Genome Project, the growth of personalised (or precision) medicine using targeted therapies, and increasingly complex clinical trial designs.

However, the principles of good research planning and practice remain as relevant today as they were thirty years ago. Indeed, all these advances in research would not have been possible without investigators holding firm to these principles, including the need for a rigorous study design and the appropriate choice of sample size for the study.

This fourth edition of the book features a third change in title. The original title had suggested (although not intentionally) a focus on ‘clinical trials’, the second saw an extension to ‘clinical studies’ and now ‘clinical, laboratory and epidemiology studies’. Currently, sample size considerations are deeply embedded in the planning of clinical trials and epidemiological studies but less so in other aspects of medical research. The change to the title is intended to draw more attention to areas where sample size issues are often overlooked.

This text cannot claim to be totally comprehensive and so choices had to be made as to what to include. In general terms, there has been a major reorganisation and extension of many of the chapters of the third edition, as well as new chapters, and many illustrative examples refreshed and others added. In particular, basic design considerations have been extended to two chapters; repeated measures, more than two groups and cluster designs each have their own chapter with the latter extended to include stepped wedge designs. Also there is a chapter concerning genomic targets and one concerned with pilot and feasibility studies.

In parallel to the increase in the extent of medical research, there has also been a rapid and extensive improvement in the capability of, and access to, information technology. Thus while the first edition of this book simply provided extensive tabulations on paper, the second edition provided some basic software on a floppy disc to allow readers to extend the applicability to situations outside the scope of the printed tables. This ability was further enhanced in the third edition with more user‐friendly and powerful software on a CD‐ROM provided with the book. This fourth edition is supported by user‐friendly software through the associated Wiley website, and R statistical software code is also provided.

Despite these improved software developments, we have still included some printed tables within the text itself as we wish to emphasise that determining the appropriate sample size for a study is not simply a task of plugging some numerical values into a formula with the parameters concerned, but an extensive investigation of what is suitable for the study intended. This would include face‐to‐face discussions between the investigators and statistical team members, for which having printed tables available can be helpful. The tabulations give a very quick ‘feel’ as to how sensitive sample sizes can often be to even small perturbations in the assumed planning values of some of the parameters concerned. This brings an immediate sense of realism to the processes involved.

For the general reader Chapters 1 and 2 give an overview of design considerations appropriate to sample size calculations. Thereafter the subsequent chapters are designed to be as self‐contained as possible. However, some later chapters, such as those describing cluster and stepped wedge designs, will require sample size formulae from the earlier chapters to complete the sample size calculations.

We continue to be grateful to many colleagues and collaborators who have contributed directly or indirectly to this book over the years. We specifically thank Tai Bee Choo for help with the section on competing risks, Gao Fei on cluster trials and Karla Hemming and Gianluca Baio on aspects of stepped wedge designs.

David Machin, Michael J. Campbell, Say Beng Tan and Sze Huey Tan
July 2017

Dedication

The authors would like to dedicate this book to Oliver, Joshua, Sophie and Caitlin; Matthew, Annabel, Robyn, Flora and Chloe; Lisa, Sophie, Samantha and Emma; Kim San, Geok Yan and Janet.

1 Basic Design Considerations

SUMMARY

This chapter reviews the reasons why sample size considerations are important when planning a clinical study of any type. The basic elements underlying this process include the null and alternative study hypotheses, effect size, statistical significance level and power, each of which is described. We introduce the notation to distinguish the population parameters we are trying to estimate with the study, from their anticipated value at the planning stages and also from their estimated value once the study has been completed. We emphasise for comparative studies that, whenever feasible, it is important to randomise the allocation of subjects to respective groups.

The basic properties of the standardised Normal distribution are described. Also discussed is how, once the effect size, statistical significance level and power for a comparative study using a continuous outcome are specified, the Fundamental Equation (which essentially plays a role in most sample size calculations for comparative studies) is derived.

The Student’s t‐distribution and the Non‐central t‐distribution are also described. In addition the Binomial, Poisson, Negative‐Binomial, Beta and Exponential statistical distributions are defined. In particular, the circumstances (essentially large study sizes) in which the Binomial and Poisson distributions have an approximately Normal shape are described. Methods for calculating confidence intervals for a population mean are indicated together with (suitably modified) how they can be used for a proportion or a rate in larger studies. For the Binomial situation, formulae are also provided where the sample size is not large. Finally, a note concerning numerical accuracy of the calculations in the illustrative examples of later chapters is included.

1.1 Why Sample Size Calculations?

To motivate the statistical issues relevant to sample size calculations, we will assume that we are planning a two‐group clinical trial in which subjects are allocated at random to one of two alternative treatments for a particular medical condition and that a single endpoint measure has been specified in advance. However, it should be emphasised that the basic principles described, the formulae, sample size tables and associated software included in this book are equally relevant to a wide range of design types covering all areas of medical research ranging from the epidemiological to clinical and laboratory‐based studies.

Whatever the field of inquiry, the investigators associated with a well‐designed study will have considered the research questions posed carefully, formally estimated the required sample size (the particular focus for us in this book), and recorded the supporting reasons for their choice. Awareness of the importance of these issues has led the major medical and related journals to demand that a detailed justification of the study size be included in any submitted article, as it is a key component for peer reviewers to consider when assessing the scientific credibility of the work undertaken. For example, the General Statistical Checklist of the British Medical Journal asks statistical reviewers of submitted papers ‘Was a pre‐study calculation of study size reported?’ Similarly, many research grant funding agencies, such as the Singapore National Medical Research Council, now also have such requirements in place.

In any event, at a more mundane level, investigators, grant‐awarding bodies and medical product development companies will all wish to know how much a study is likely to ‘cost’ both in terms of time and resources consumed as well as monetary terms. The projected study size will be a key component in this ‘cost’. They would also like to be reassured that the allocated resource will be well spent by assessing the likelihood that the study will give unequivocal results. In particular for clinical trials, the regulatory authorities, including the Committee for Proprietary Medicinal Products (CPMP, 1995) in the European Union and the Food and Drug Administration (FDA, 1988 and 1996) in the USA, require information on planned study size. These are encapsulated in the guidelines of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (1998) ICH Topic E9.

If too few subjects are involved, the study is potentially a misuse of time because realistic differences of scientific or clinical importance are unlikely to be distinguished from chance variation. Too large a study can be a waste of important resources. Further, it may be argued that ethical considerations also enter into sample size calculations. Thus a small clinical trial with no chance of detecting a clinically useful difference between treatments is unfair to all the patients put to the (possible) risk and discomfort of the trial processes. A trial that is too large may be unfair if one treatment could have been ‘proven’ to be more effective with fewer patients as a larger than necessary number of them has received the (now known) inferior treatment.

Providing a sample size for a study is not simply a matter of providing a single number from a set of statistical tables. It is, and should be, a several‐stage process. At the preliminary stages, what is required are ‘ball‐park’ figures that enable the investigators to judge whether or not to start the detailed planning of the study. If a decision is made to proceed, then the later stages are used to refine the supporting evidence for the preliminary calculations until they make a persuasive case for the final patient numbers chosen. Once decided this is then included (and justified) in the final study protocol.

After the final sample size is determined and the protocol is prepared and approved by the relevant bodies, it is incumbent on the research team to expedite the recruitment process as much as possible, conduct the study to the highest possible standards, and ensure that it is eventually reported comprehensively.

1.2 Statistical Significance

Notation

In very brief terms the (statistical) objective of any study is to estimate from a sample the value of a population parameter. For example, if we were interested in the mean birth weight of babies born in a certain locality, then we may record the weight of a selected sample of N babies and their mean weight is taken as our estimate of the population mean birth weight, denoted ωPop. The Greek ω distinguishes the population value from its estimate, the Roman w. When planning a study, we are clearly ignorant of ωPop and neither do we have the data to calculate w. As we shall see later, when planning a study the investigators will usually need to provide some value for what ωPop may turn out to be. This anticipated value is denoted ωPlan. This value then forms (part of) the basis for subsequent sample size calculations.

Outcomes

In any study, it is necessary to define an outcome (endpoint) which may be, for example, the birth weight of the babies concerned, as determined by the objectives of the investigation. In other situations this outcome may be a measure of blood pressure, wound healing time, degree of palliation, a patient reported outcome (PRO) that indicates the level of some aspect of their Quality of Life (QoL) or any other relevant and measurable outcome of interest.

The Effect Size

Consider, as an example, a proposed randomised trial of a placebo (control, C) against acupuncture (A) for the relief of pain in patients with a particular diagnosis. The patients are randomised to receive either A or C (how placebo acupuncture can be administered is clearly an important consideration). In addition, we assume that pain relief is assessed at a fixed time after randomisation and is defined in such a way as to be unambiguously evaluable for each patient as either ‘success’ or ‘failure’. We assume the aim of the trial is to estimate the true difference δPop between the true success rate πPopA of A and the true success rate πPopC of C. Thus the key (population) parameter of interest is δPop which is a composite of the two (population) parameters πPopA and πPopC.

At the completion of the trial the A patients yield a treatment success rate pA which is an estimate of πPopA and for C the corresponding items are pC and πPopC. Thus, the observed difference, d = pA − pC, provides an estimate of the true difference (the effect size) δPop = πPopA − πPopC.
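In code, this estimation step is simple arithmetic on the observed counts. A minimal Python sketch, using invented counts purely for illustration:

```python
# Hypothetical results (illustrative only): 28 successes among 50 patients
# receiving acupuncture (A) and 19 among 50 receiving placebo (C).
successes_A, n_A = 28, 50
successes_C, n_C = 19, 50

p_A = successes_A / n_A  # estimate of the true success rate piPopA
p_C = successes_C / n_C  # estimate of the true success rate piPopC
d = p_A - p_C            # estimate of the effect size deltaPop

print(f"pA = {p_A:.2f}, pC = {p_C:.2f}, d = {d:.2f}")
```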

Significance Tests

In a clinical trial, two or more forms of therapy or intervention may be compared. However, patients themselves vary both in their baseline characteristics at diagnosis and in their response to subsequent therapy. Hence in a clinical trial, an apparent difference in treatments may be observed due to chance alone, that is, we may observe a difference but it may be explained by the intrinsic characteristics of the patients themselves rather than ‘caused’ by the different treatments given. As a consequence, it is customary to use a ‘significance test’ to assess the weight of evidence and to estimate the probability that the observed data could in fact have arisen purely by chance.

The Null Hypothesis and Test Size

In our example, the null hypothesis, termed HNull, implies that A and C are equally effective or that δPop = πPopA − πPopC = 0. Even when that null hypothesis is true, at the end of the study an observed difference, d = pA − pC other than zero, may occur. The probability of obtaining the observed difference d or a more extreme one, on the assumption that δPop = 0, can be calculated using a statistical test. If, under this null hypothesis, the resulting probability or p‐value is very small, then we reject this null hypothesis of no difference and conclude that the two treatments do indeed differ in efficacy.

The critical value taken for the p‐value is arbitrary and is denoted by α. If, once calculated following the statistical test, the p‐value ≤ α then the null hypothesis is rejected. Conversely, if the p‐value > α, one does not reject the null hypothesis. Even when the null hypothesis is in fact true there is a risk of rejecting it. To reject the null hypothesis when it is true is to make a Type I error and the associated probability of this is α. The quantity α can be referred to either as the test size, significance level, probability of a Type I error or, sometimes, the false‐positive error.
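As an illustration of how such a test might be computed for two proportions, the following Python sketch uses the large-sample Normal approximation; the function name and the counts are assumptions for illustration, not taken from the text:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of HNull: pi1 = pi2 (large-sample Normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                 # common rate assumed under HNull
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_value

# Hypothetical counts: 28/50 successes on one treatment, 19/50 on the other.
z, p = two_proportion_z_test(28, 50, 19, 50)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

Here z is about 1.80 and the p-value about 0.07, so with a test size of α = 0.05 the null hypothesis would not be rejected.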

The Alternative Hypothesis and Power

Usually in statistical significance testing, by rejecting the null hypothesis, we do not specifically accept any alternative hypothesis, and it is usual to report the range of plausible population values with a confidence interval (CI) as we describe in Section 1.6. However, sample size calculations are usually posed in a hypothesis test framework, and this requires us to specify an alternative hypothesis, termed HAlt, that the true effect size is δPop = πPopA − πPopC ≠ 0.

The clinical trial could yield an observed difference d that would lead to a p‐value > α even though the null hypothesis is really not true, that is, πPopA truly differs from πPopC and so δPop ≠ 0. In such a situation, we then fail to reject the null hypothesis although it is indeed false. This is called a Type II or false‐negative error and the probability of this is denoted by β.

As the probability of a Type II error is based on the assumption that the null hypothesis is not true, that is, δPop ≠ 0, there are many possible values for δPop in this instance. Since there are countless potential values, each would give a different value for β.

The power is defined as one minus the probability of a Type II error, 1 − β. Thus ‘power’ is the probability of what ‘you want’, which is obtaining a ‘significant’ p‐value when the null hypothesis is truly false and so a difference between two interventions may be claimed.
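The dependence of the power 1 − β on the sample size can be sketched numerically. The function below uses the simple large-sample Normal approximation for comparing two proportions; the planning values 0.20 and 0.30, the group sizes and the function name are illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist

def approx_power(pC, pA, n_per_group, alpha=0.05):
    """Approximate power 1 - beta of a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt((pA * (1 - pA) + pC * (1 - pC)) / n_per_group)
    return NormalDist().cdf(abs(pA - pC) / se - z_alpha)

print(round(approx_power(0.20, 0.30, 100), 2))  # small trial: power well below 0.8
print(round(approx_power(0.20, 0.30, 400), 2))  # larger trial: power about 0.9
```

For the same anticipated effect, quadrupling the group size here raises the power from under a half to around 90%.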

1.3 Planning Issues

The Effect Size

Of the parameters that have to be pre‐specified before the sample size can be determined, the true effect size is the most critical. Thus, in order to estimate sample size, one must first identify the magnitude of the difference between the interventions A and C that one wishes to detect (strictly the minimum size of scientific or clinical interest) and quantify this as the (anticipated) effect size, denoted δPlan. Although what follows is couched in terms of planning a randomised controlled trial, analogous considerations apply to all comparative study types.

Sometimes there is prior knowledge that enables an investigator to anticipate what size of benefit the test intervention is likely to bring, and the role of the trial is to confirm that expectation. In other circumstances, it may be possible to say that, for example, only the prospect of doubling of their median survival would be worthwhile for patients with a fatal disease who are rapidly deteriorating. This is because the test treatment is known to be toxic and likely to be a severe burden for the patient as compared to the standard approach.

One additional problem is that investigators are often optimistic about the effect of test interventions; it can take considerable effort to initiate a trial and so, in many cases, the trial would only be launched if the investigating team is enthusiastic about the new treatment A and is sufficiently convinced about its potential efficacy over C. Experience suggests that as trials progress there is often a growing realism that, even at best, the initial expectations were optimistic. There is also ample historical evidence to suggest that trials which set out to detect large effects nearly always result in ‘no significant difference was detected’. In such cases there may have been a true and clinically worthwhile, but smaller, benefit that has been missed, since the level of detectable difference set by the design was unrealistically high and hence the sample size too small to detect this important difference.

It is usual for most clinical trials that there is considerable uncertainty about the relative merits of the alternative interventions so that even when the new treatment or intervention under test is thought for scientific reasons to be an improvement over the current standard, the possibility that this is not the case is allowed for. For example, in the clinical trial conducted by Chow, Tai, Tan, et al (2002) it was thought, at the planning stage, that high dose tamoxifen would not compromise survival in patients with inoperable hepatocellular carcinoma. This turned out not to be the case and, if anything, tamoxifen was detrimental to their ultimate survival time. This is not an isolated example.

In practice, when determining an appropriate effect size, a form of iteration is often used. The clinical team might offer a variety of opinions as to what clinically useful difference will transpire — ranging perhaps from an unduly pessimistic small effect to the optimistic (and unlikely in many situations) large effect. Sample sizes may then be calculated under this range of scenarios with corresponding patient numbers ranging perhaps from extremely large to relatively small. The importance of the clinical question and/or the impossibility of recruiting large patient numbers may rule out a very large trial but conducting a small trial may leave important clinical effects not firmly established. As a consequence, the team may next define a revised aim maybe using a summary derived from their individual opinions, and the calculations are repeated. Perhaps the sample size now becomes attainable and forms the basis for the definitive protocol.

There are a number of ways of eliciting useful effect sizes using clinical opinion: a Bayesian perspective has been advocated by Spiegelhalter, Freedman and Parmar (1994), an economic approach by Drummond and O’Brien (1993) and one based on patients’ perceptions rather than clinicians’ perceptions of benefit by Naylor and Llewellyn‐Thomas (1994). Gandhi, Tan, Chung and Machin (2015) give a specific case study describing the synthesis of prior clinical beliefs, with information from non‐randomised and randomised trials concerning the treatment of patients following curative resection for hepatocellular carcinoma. Cook, Hislop, Altman et al (2015) also give useful guidelines for selection of an appropriate effect size.

One‐ or Two‐Sided Significance Tests

It is plausible to assume in the acupuncture trial referred to earlier that the placebo is in some sense ‘inactive’ and that any ‘active’ treatment will have to perform better than the ‘inactive’ treatment if it is to be adopted into clinical practice. Thus rather than set the alternative hypothesis as HAlt: πPopA ≠ πPopC, it may be replaced by HAlt: πPopA > πPopC. This formulation leads to a 1‐sided statistical significance test.

On the other hand, if we cannot make this type of assumption about the new treatment at the design stage, then the alternative hypothesis is HAlt: πPopA ≠ πPopC. This leads to a 2‐sided statistical significance test.

For a given sample size, a 1‐sided test is more powerful than the corresponding 2‐sided test. However, a decision to use a 1‐sided test should never be made after looking at the data and observing the direction of the departure. Such decisions should be made at the design stage, and a 1‐sided test should only be used if it is certain that departures in the particular direction not anticipated will always be ascribed to chance and therefore regarded as non‐significant, however large they turn out to be.

It is more usual to carry out 2‐sided tests of significance but, if a 1‐sided test is to be used, this should be indicated and justified clearly for the problem in hand. Chapter 6, which refers to post‐marketing studies, and Chapter 11, which discusses non‐inferiority trials, give some examples of studies where the use of a 1‐sided test size can be clearly justified.
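The difference between the two formulations can be seen directly from the p-values they produce for the same test statistic; a Python sketch with an arbitrary illustrative value of z:

```python
from statistics import NormalDist

z = 1.80  # an illustrative test statistic, in the direction anticipated at the design stage

p_one_sided = 1 - NormalDist().cdf(z)        # HAlt: piPopA > piPopC
p_two_sided = 2 * (1 - NormalDist().cdf(z))  # HAlt: piPopA != piPopC

print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

The same statistic is ‘significant’ at α = 0.05 under the 1-sided formulation but not under the 2-sided one, which is one reason the choice must be fixed, and justified, at the design stage.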

Choosing α and β

It is customary to start by specifying the effect size required to be detected and then to estimate the number of patients necessary to enable the trial to detect this difference if it truly exists. Thus, for example, it might be anticipated that acupuncture could improve the response rate from 20% with C to 30% with A and, since this is deemed a plausible and medically important improvement, it is desired to be reasonably certain of detecting such a difference if it really exists. ‘Detecting a difference’ is usually taken to mean ‘obtaining a statistically significant difference with the p‐value < 0.05’; and similarly the phrase ‘to be reasonably certain’ is usually interpreted to mean something like ‘to have a chance of at least 90% of obtaining such a p‐value’ if there really is an improvement from 20 to 30%. This latter statement corresponds, in statistical terms, to saying that the power of the trial should be 0.9 or 90%.

The choice for α is essentially arbitrary and is made by the study investigating team. However, practice accumulated over a long period of time has established α = 0.05 as something of a convention. Thus in the majority of cases, investigators, editors of journals and their readers have become accustomed to anticipate this value. If a different value is chosen, then investigators would be advised to explain why.

Convention is not so well established with respect to the size of β, although in the context of a randomised controlled trial, to set β > 0.2, implying a power of less than 80%, would be regarded with some scepticism. Indeed, the use of 90% has become more of the norm (however, see Chapter 16, concerned with feasibility studies, where the same considerations will not apply). In some circumstances, it may be the type of study to be conducted that determines this choice. Nevertheless, it is the investigating team which has to consider the possibilities and make the final choice.
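For the acupuncture example above (a response rate of 20% with C anticipated to improve to 30% with A, α = 0.05 and power 90%), the simplest Normal-approximation formula can be sketched in Python as follows. The tables and software accompanying the book use somewhat more refined formulae, so their numbers may differ slightly from this sketch:

```python
from math import sqrt, ceil
from statistics import NormalDist

def n_per_group(pC, pA, alpha=0.05, power=0.90):
    """Subjects per group to detect the difference pA - pC with a
    two-sided test size alpha and the stated power (simple
    Normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = pA * (1 - pA) + pC * (1 - pC)
    return ceil((z_alpha + z_beta) ** 2 * variance / (pA - pC) ** 2)

# Anticipated improvement from 20% with C to 30% with A:
print(n_per_group(0.20, 0.30))  # 389 patients per group
```

This gives roughly 389 patients per group, close to 780 in total; repeating the calculation for a range of planning values quickly shows how sharply the required numbers fall as the anticipated effect size grows.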

Sample Size and Interpretation of Significance

The results of the significance test, calculated on the assumption that the null hypothesis is true, will be expressed as a ‘p‐value’. For example, at the end of the trial if the difference between treatments is tested, then a p‐value < 0.05 would indicate that so extreme or greater an observed difference could be expected to have arisen by chance alone less than 5% of the time, and so it is quite likely that a treatment difference really is present.

However, if only a few patients were entered into the trial then, even if there really was a true treatment difference, the results are likely to be less convincing than if a much larger number of patients had been assessed. Thus, the weight of evidence in favour of concluding that there is a treatment effect will be much less in a small trial than in a large one. In statistical terms, we would say that the ‘sample size’ is too small and that the ‘power of the test’ is very low.

Suppose the results of an observed treatment difference in a clinical trial are declared ‘not statistically significant’. Such a statement only indicates that there was insufficient weight of evidence to be able to declare that ‘the observed difference is unlikely to have arisen by chance’. It does not imply that there is ‘no clinically important difference between the treatments’ as, for example, if the sample size was too small the trial might be very unlikely to obtain a significant p‐value even when a clinically relevant difference is truly present. Hence, it is of crucial importance to consider sample size and power when interpreting statements about ‘non‐significant’ results. In particular, if the power of the statistical test was very low, all one can conclude from a non‐significant result is that the question of treatment differences remains unresolved.

1.4 The Normal Distribution

The Normal distribution plays a central role in statistical theory and frequency distributions resembling the Normal distribution form are often observed in practice. Of particular importance is the standardised Normal distribution, which is the Normal distribution that has a mean equal to 0 and a standard deviation (SD) equal to 1. The probability density function of such a Normally distributed random variable z is given by

(1.1)  φ(z) = (1/√(2π)) exp(−z²/2),  −∞ < z < ∞

where π represents the irrational number 3.14159…. The curve described by equation (1.1) is shown in Figure 1.1.

Figure 1.1 The probability density function of a standardised Normal distribution.

For sample size purposes, we shall need to calculate the area under some part of this Normal curve. To do this, use is made of the symmetrical nature of the distribution about the mean of 0 and the fact that the total area under a probability density function is unity.

Any shaded area similar to that in Figure 1.1 which has area γ (here γ ≥ 0.5) has a corresponding value of zγ along the horizontal axis that can be calculated. This may be described in mathematical terms by the following integral:

(1.2)  γ = ∫_{−∞}^{zγ} (1/√(2π)) exp(−z²/2) dz
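In practice such areas are obtained from statistical software rather than by direct integration. For example, Python's standard library provides the standardised Normal distribution function and its inverse (a general-purpose sketch, not the SSS software that accompanies the book):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standardised Normal: mean 0, SD 1

# gamma: the area under the curve to the left of a chosen point (equation (1.2))
gamma = std_normal.cdf(1.96)
print(round(gamma, 3))     # 0.975

# the inverse problem: find z_gamma given the area gamma to its left
z_gamma = std_normal.inv_cdf(0.975)
print(round(z_gamma, 2))   # 1.96
```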