Medical Statistics at a Glance Workbook - Aviva Petrie - E-Book

Medical Statistics at a Glance Workbook E-Book

Aviva Petrie

Description

This comprehensive workbook contains a variety of self-assessment methods that allow readers to test their statistical knowledge, put it into practice, and apply it in a medical context, while also providing guidance when critically appraising published literature. It is designed to support the best-selling third edition of Medical Statistics at a Glance, to which it is fully cross-referenced, but may be used independently of it.

Ideal for medical students, junior doctors, researchers and anyone working in the biomedical and pharmaceutical disciplines who wants to feel more confident in basic medical statistics, the title includes:

  • Over 80 MCQs, each testing knowledge of a single statistical concept or aspect of study interpretation
  • 29 structured questions to explore in greater depth several statistical techniques or principles, including the choice of appropriate statistical analyses and the interpretation of study findings
  • Templates for the appraisal of clinical trials and observational studies, plus full appraisals of two published papers to demonstrate the use of these templates in practice
  • Detailed step-by-step analyses of two substantial data sets (also available at www.medstatsaag.com) to demonstrate the application of statistical procedures to real-life research

Medical Statistics at a Glance Workbook is the ideal resource to test statistical knowledge and improve analytical and interpretational skills.

Additional resources are available at www.medstatsaag.com, including:

  • Excel datasets to accompany the data analysis section
  • Downloadable PDFs of two templates for critical appraisal
  • Links to online further reading
  • Supplementary MCQs

Page count: 410

Year of publication: 2012

Table of Contents

Cover

Companion website

Title page

Copyright page

Introduction

Part 1: Multiple-choice questions

Part 2: Structured Questions

Part 3: Critical appraisal

Randomised controlled trial: template

Randomised controlled trial: Paper 1

Observational study: template

Observational study: Paper 2

Part 4: Data analysis

Dataset 1 analysed by Stata v11 (StataCorp LP, Texas, USA)

D1.1 Introduction

D1.2 Aims

D1.3 Repeatability

D1.4 Getting a feel for the data

D1.5 Baseline data: comparability of bracket groups

D1.6 Maximum pain intensity analysis

D1.7 AUC of pain intensity analysis

D1.8 Random effects analysis of longitudinal data

D1.9 Summary

D1.10 Note

Dataset 2 analysed using IBM SPSS Statistics v20

D2.1 Introduction

D2.2 Aims

D2.3 Relationship between UPSIT-40 and SS-16

D2.4 Univariable analyses

D2.5 Logistic regression analyses

D2.6 Using the receiver operator characteristic (ROC) curves

D2.7 Conclusion

Part 5: Solutions

Solutions to multiple-choice questions

Model answers for structured questions

Randomised controlled trial: critical appraisal of Paper 1

Observational study: critical appraisal of Paper 2

Appendices

Appendix I: list of multiple-choice questions with relevant chapter numbers from Medical Statistics at a Glance (3rd edn) and associated topics

Handling data

Sampling and estimation

Study design

Hypothesis testing

Basic techniques for analysing data

Additional techniques

Appendix II: list of structured questions with relevant chapter numbers from Medical Statistics at a Glance (3rd edn) and associated topics

Appendix III: chapter numbers from Medical Statistics at a Glance (3rd edn) with relevant multiple-choice questions and structured questions

This title is also available as an e-book.

For more details, please see www.wiley.com/buy/9780470658482

Companion website

Additional resources are available at:

www.medstatsaag.com

featuring:

Excel datasets to accompany the data analysis sections

Downloadable PDFs of two analysis templates

Links to online further reading

Supplementary MCQs

This edition first published 2013 © 2013 by Aviva Petrie and Caroline Sabin

Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley’s global Scientific, Technical and Medical business with Blackwell Publishing.

Registered office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK

The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

111 River Street, Hoboken, NJ 07030-5774, USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.

The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Petrie, Aviva.

 Medical statistics at a glance workbook / Aviva Petrie, Caroline Sabin.

p. cm.

Includes bibliographical references and index.

 ISBN 978-0-470-65848-2 (pbk. : alk. paper) 1. Medical statistics. I. Sabin, Caroline. II. Title.

 R853.S7P4762 2013

 610.72'7–dc23

2012025027

A catalogue record for this book is available from the British Library.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Cover design by Nathan Harris

Introduction

This workbook is a companion volume to the third edition of Medical Statistics at a Glance. Although primarily directed at undergraduate medical students preparing for statistics examinations, we believe that the workbook will also be of use to others working in the biomedical disciplines who simply want to brush up on their analytical and interpretation skills (e.g. other medical researchers, postgraduates in the biomedical disciplines and pharmaceutical industry personnel). Our aim for this workbook is therefore for it to act as a revision aid, equip readers with the skills necessary to read and interpret the published literature and give them the confidence to tackle their own statistical analyses. Although designed as an accompanying text to Medical Statistics at a Glance, it is not indelibly linked to it and can be used as a stand-alone text or in conjunction with any reputable text on statistics.

We believe that the optimal way to learn statistics is to put the theory into practice by undertaking an analysis of a data set, but recognise that this may not always be practical. Instead, the use of carefully constructed exercises in a variety of formats can help to test and fully evaluate the reader’s understanding of the material (and identify any gaps that remain). As the At a Glance textbook presents information in a concise manner, there is limited space in it for worked examples and no room for exercises. Our workbook amends this insufficiency by providing an extensive set of questions, as well as templates for critical appraisal and descriptions of the statistical analyses of two data sets. Where possible, we have based questions on published studies in the medical and dental fields, and references are provided so that the reader may consult the original source material if interested.

The Structure of the Workbook

This workbook is divided into six parts:

Part 1

This section of the workbook contains multiple-choice questions (MCQs) that are generally brief, each testing the reader’s knowledge of a single theoretical concept or aspect of study interpretation. Only one of the five possible answers provided is correct: an explanation is given in Part 5 for each correct and incorrect answer. The ordering of the MCQs generally follows that of the chapters in the third edition of Medical Statistics at a Glance. To aid readers who may wish to focus on specific topics in the At a Glance textbook, we provide a list of MCQs and the related chapters in Appendix I.

Part 2

This section of the workbook contains structured questions that are longer than the MCQs and provide a more in-depth exploration of the reader’s knowledge of several statistical concepts. The questions may include elements that test a reader’s understanding of the theory, as well as his or her ability to interpret study findings and, in some instances, to perform basic statistical calculations. The questions are similar to those that we have set in the past for exams: detailed model answers are provided in Part 5. As the longer structured questions may relate to information contained in several diverse chapters of the textbook, these do not follow any particular order, but in Appendix II, to aid readers who may wish to focus on specific topics in the At a Glance textbook, we provide a list of the structured questions and their related chapters.

Part 3

The ability to critically appraise the published literature is an essential skill that is required by anyone in the medical and dental professions (or, indeed, anyone involved in research more generally) and, consequently, is an important objective of a statistics course. Many aspects of statistics must be considered when evaluating the evidence provided in a research article, for example biases that might arise from inappropriate designs, sample size, outcome measures, the choice of statistical analysis, the presentation of the data and the conclusions drawn. Whilst Medical Statistics at a Glance presents a brief introduction to critical appraisal in Chapter 40 (Evidence-based medicine), Part 3 of our workbook supplements this by providing structured templates that can be used when reviewing and/or assessing the published literature. We suggest that the reader use these templates to critically appraise two published articles: a randomised controlled trial and an observational study. Our own evaluation of these articles is to be found in Part 5. Whilst we cannot hope to cover all possible topics within these two appraisals, we hope that they will at least provide a basic structure for appraisal that readers may find helpful.

Part 4

In our experience, one of the most common complaints from our students and junior research colleagues is that they just do not know where to start when analysing a substantial data set. To address this need, we have included in Part 4 a detailed description of the analyses of two data sets, the latter being available on the accompanying website (www.medstatsaag.com) as Excel files. Each analysis starts with a description of the clinical problem, and then takes the reader through the various steps that would be undertaken when performing the analysis, from the initial exploratory and descriptive analyses to the final sensitivity analyses that assess the robustness of study findings. We believe that this is an innovative approach and hope that readers will find it useful.

Part 5

This section of the workbook contains solutions to the MCQs in Part 1, model answers for the structured questions in Part 2, and our own critical appraisals of the randomised controlled trial (Paper 1) and the observational study (Paper 2) in Part 3. The pages in Part 5 are shaded so that the reader is easily able to navigate to the solutions and model answers.

Appendices

In Appendix I, we provide an ordered list of the MCQs and show which chapters they relate to, with an indication of the material included in each question. Appendix II is similar but identifies the associated chapters for the structured questions. For those readers who require exercises that relate to specific chapters of the At a Glance textbook, we provide a list of the chapters and indicate which multiple-choice and structured questions are relevant to them in Appendix III.

Further Information

In addition to the workbook, we remind readers that the companion website to Medical Statistics at a Glance (www.medstatsaag.com) also contains an extensive set of interactive exercises, with references to many published papers that may be of interest.

Acknowledgements

Special thanks are due to Drs Laura Silveira-Moriyama and Angus Pringle who very kindly lent us their data sets for the analyses in Part 4 of the workbook. We are most appreciative of the extremely helpful comments and suggestions that they made during the development of the analyses, but we take full responsibility for any errors or misconceptions in the final presentations. We are also indebted to the authors and publishers of the two papers that we used for critical appraisal for allowing us to reproduce the articles, thereby providing useful exercises for our readers, and apologise if any of our criticisms cause offence. We acknowledge the generosity of the many authors and publishers who have kindly assented to our adapting or reproducing material for the multiple-choice and structured questions, and are grateful to the publishing team at Wiley-Blackwell both for suggesting that we write this workbook and for their ideas and support along its route to publication. Our acknowledgements would not be complete without thanking our students over the years from whom we have learnt the art of teaching, and Mike, Gerald, Nina, Andrew and Karen for their forbearance, encouragement and good humour during our absorption with this manuscript.

Part 1: Multiple-Choice Questions

Handling Data

M1

To collect information on an individual’s ability to function physically, investigators identified six daily tasks, each relating to a different aspect of physical functioning. For every task, respondents were asked to say whether they generally experienced ‘no problems’ (allocated a score of 0), ‘some problems’ (score of 1) or ‘many problems’ (score of 2) when performing the task; by summing the six individual scores, the investigators generated a total physical functioning score variable, which ranged from 0 to 12. Which one of the following statements is true?

a) The variable is best described as a continuous variable.
b) When capturing data on this score, only the final total score should be recorded on the data capture form.
c) Although this is strictly an ordinal categorical variable, for the purposes of analysis, it may be possible to treat this variable as a numerical variable.
d) The most suitable summary measure of the ‘average’ value for this variable would be the mode.
e) For the purposes of analysis, it would be preferable to re-categorise this final score into three categories: good functioning (scores of 0 to 4), average functioning (scores of 5 to 8) and poor functioning (scores of 9 to 12).

M2

Which one of the following statements is true?

a) A qualitative variable comprises two categories which may be ordinal or numerical.
b) An ordinal variable comprises categories which cannot be ordered.
c) The age groups ‘young’, ‘middle aged’ and ‘old’ relate to a nominal categorical variable.
d) Blood group is classified as a nominal categorical variable.
e) It may be difficult to distinguish a continuous numerical variable from an ordinal variable when the ordinal variable has many categories.

M3

As part of an epidemiological study investigating the association between consumption of dairy products in adolescence and the onset of cardiovascular disease later in life, study investigators plan to collect information on weekly egg consumption from a sample of children aged 14–17 years using self-administered questionnaires. Which one of the following would be the best approach for collecting this information?

a) Respondents are asked to indicate the number of eggs they consumed in the previous week and are asked to leave the entry blank if they do not know the answer.
b) Respondents are asked to tick the box that best describes the number of eggs they have consumed in the previous week: 0, 1–3, 4–7, >7 or ‘unknown’.
c) Respondents are asked to indicate the number of eggs they consumed in the previous week, and to record a value of 9 if they do not know the answer.
d) Respondents are asked to tick the box that best describes the number of eggs they consumed in the previous week: 0, 1–3, 4–7 or >7; if they do not know the answer, they are asked to leave the response blank.
e) Respondents are asked to indicate the number of eggs they consumed in the previous week, and to record a value of 999 if they do not know the answer.

M4

Which one of the following statements which relate to the information provided in a questionnaire is true?

a) Having data available as an ASCII file is inflexible because many people have not heard of ASCII.
b) A multi-coded question has more than two possible responses, but the respondent can provide only one answer to it.
c) Dates must be entered into a computer spreadsheet as day/month/year.
d) Missing data for a particular respondent must always be entered on the computer spreadsheet as 9, 99 or 999.
e) It is often necessary to assign numerical codes to a categorical variable before entering the data into the computer.

M5

The number of eggs consumed by an adolescent in a week was collected from a sample of 40 adolescents aged 14–17 years with a view to estimating average weekly egg consumption in such adolescents. Information on egg consumption was missing for two adolescents; the data from the remaining 38 subjects are as follows: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 7, 7, 7, 8, 11, 14, 15, 21, 25, 27 and 71. Which one of the statements below is true?

a) The entry of 71 is an outlier but is likely to be a correct value of weekly egg consumption.
b) As it is unlikely that any individual will consume more than three eggs per day, the investigators should exclude any values greater than 21 before analysing the data.
c) As it is unlikely that any individual will consume more than three eggs per day, the investigators should replace any values >21 with the value 21 before analysing the data.
d) The authors suspect that the value of 71 is a typing error in the data set and so plan to replace this value with the value 9 before conducting any analyses.
e) The authors believe that the value of 71 is an error in the data set and so could consider running their analyses both including and excluding this outlying value.
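The idea of running an analysis both with and without a suspect value can be sketched in a few lines of Python. This is only an illustration using the 38 values listed in the question; the variable names are our own and nothing here is taken from the workbook's solutions.

```python
from statistics import mean, median

# The 38 weekly egg counts listed in Question M5.
eggs = [0] * 10 + [1] * 8 + [2] * 5 + [3, 3, 4, 5, 7, 7, 7, 8,
                                       11, 14, 15, 21, 25, 27, 71]

# Same data with the suspect value of 71 left out.
eggs_without_71 = [x for x in eggs if x != 71]

# Compare simple summaries for the two versions of the data set.
for label, data in [("all 38 values", eggs), ("excluding 71", eggs_without_71)]:
    print(f"{label}: n={len(data)}, mean={mean(data):.2f}, median={median(data)}")
```

With these data the arithmetic mean falls from about 6.5 to about 4.7 when the value 71 is left out, whereas the median is 2 in both cases.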

M6

Which one of the following statements is true?

a) One approach to handling outliers in a data set is to analyse the data both with and without the outliers and see whether the results are similar.
b) It is never sensible to transform data to overcome the problem of a skewed distribution as the parameter estimates obtained from the transformed data cannot be interpreted.
c) Outliers in data should be omitted from the analysis as they may skew the results.
d) An outlier is an extreme value which is incompatible with the main body of the data and is always greater than all the other values in the data set.
e) The only ways of dealing with outliers in a data set are to analyse the data both with and without the outliers and determine the effect of the omission, to omit the outlier(s) from the analysis or to transform the data.

M7

Consider the data relating to the number of eggs consumed in a week described in Questions M3 and M5. Which one of the following diagrams would be best for displaying the information?

a) A bar chart
b) A histogram
c) A pie chart
d) A scatter diagram
e) A segmented bar chart

M8

Consider the data on the number of eggs consumed in a week described in Questions M3 and M5. Which one of the following best describes the distribution of this variable?

a) Skewed to the right
b) Normally distributed
c) Skewed to the left
d) Uniformly distributed
e) Negatively skewed

M9

Which one of the following statements is true?

a) A pie chart is one in which a circular ‘pie’ is split into sectors, one for each category of a categorical variable, so that the area of each sector is equal.
b) A sensible way of displaying continuous numerical data is to draw a bar chart.
c) A histogram is a chart in which separate vertical (or horizontal) bars are drawn with gaps between the bars; the width (height) of each bar relates to a specific range of values of the variable, and its height (width) is proportional to the associated frequency of observations.
d) The distribution of a variable is right skewed if a histogram of observed values has a long tail to the right with one or a few high values.
e) A box-and-whisker plot comprises a vertical or horizontal rectangle indicating the interquartile range, within which is the median; the ends of the ‘whiskers’ represent the upper and lower limits of the 95% confidence interval for the median.

M10

The authors of the egg consumption study (Questions M3 and M5) now wish to summarise the data on the number of eggs consumed in a week. Which one of the following approaches would be the best way to summarise these data?

a) The arithmetic mean and range
b) The median and interquartile range
c) The median and range
d) The arithmetic mean and standard deviation
e) The mode

M11

Which one of the following statements is true?

a) The median is greater than the arithmetic mean if the data are skewed to the right.
b) The median value of n observations is equal to the (n + 1)/2th value in the ordered set if n is odd.
c) The median and the weighted mean are always identical if the weights used in the calculation of the weighted mean are equal.
d) The logarithmic transformation of left-skewed data will often produce a symmetrical distribution when the transformed data are plotted on an arithmetic scale.
e) The geometric mean of a data set is equal to the arithmetic mean of the log-transformed data.
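For readers who want to check the definitions used in these options, the short Python sketch below computes the median, the arithmetic mean and the geometric mean for a small made-up data set. The five values are hypothetical and chosen only so that the arithmetic is easy to follow; the geometric mean is obtained by back-transforming (taking the antilog of) the mean of the logged values.

```python
import math
from statistics import geometric_mean, mean, median

values = [1, 2, 4, 8, 16]  # hypothetical right-skewed data, already ordered

n = len(values)
middle_position = (n + 1) // 2                 # (n + 1)/2 for odd n
middle_value = values[middle_position - 1]     # convert to 0-based index

arithmetic = mean(values)
mean_of_logs = mean([math.log(v) for v in values])
geometric = math.exp(mean_of_logs)             # antilog of the mean log

print("median:", median(values), "value at position (n + 1)/2:", middle_value)
print("arithmetic mean:", arithmetic)
print("mean of logged values:", round(mean_of_logs, 4))
print("geometric mean:", round(geometric, 4))
```

The standard library function `statistics.geometric_mean(values)` gives the same geometric mean as the back-transformation shown here.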

M12

Study investigators collected information on haemoglobin levels in a sample of 212 healthy women of mixed ethnicity. The investigators calculated the median value, and used the 2.5th and 97.5th percentile values to generate a reference range. Which one of the following statements is true?

a) The authors generated the reference range using the percentile approach as the number of subjects in their study was small.
b) Healthy individuals in the population will not have a value of haemoglobin that falls below the lower limit of the reference range.
c) Use of the mean and standard deviation to generate the reference range would have provided a more suitable reference range.
d) Individuals in the population with an underlying health condition that has an impact on haemoglobin levels will always have values that fall outside the reference range.
e) An individual in the population with an underlying health condition that has an impact on haemoglobin levels is likely to have a value that falls outside the reference range.

M13

When numerical data are arranged in order of magnitude, which one of the following statements is true?

a) The interquartile range is the difference between the first and fourth percentiles.
b) The interdecile range contains the central 80% of the ordered observations.
c) The middle observation is always equal to the arithmetic mean.
d) The 50th percentile is equal to the fifth quartile.
e) The first percentile is always equal to the minimum value.
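The percentile-based ranges named in these options can be explored with the Python standard library. The sketch below uses the hypothetical ordered observations 1 to 100 and the default ('exclusive') interpolation rule of `statistics.quantiles`; other software may use slightly different rules for locating percentiles.

```python
from statistics import median, quantiles

data = list(range(1, 101))  # hypothetical ordered observations 1..100

deciles = quantiles(data, n=10)    # 9 cut points: 10th, 20th, ..., 90th percentiles
quartiles = quantiles(data, n=4)   # 3 cut points: 25th, 50th, 75th percentiles

# Interdecile range: from the 10th to the 90th percentile.
p10, p90 = deciles[0], deciles[-1]
share_inside = sum(p10 <= x <= p90 for x in data) / len(data)

# Interquartile range: from the 25th to the 75th percentile.
iqr = quartiles[2] - quartiles[0]

print("10th and 90th percentiles:", p10, p90)
print("proportion of observations between them:", share_inside)
print("quartiles:", quartiles, "interquartile range:", iqr)
print("50th percentile equals the median:", quartiles[1] == median(data))
```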

M14

If a set of observations follow the Normal or Gaussian distribution, which one of the following statements is true?

a) Its mean and variance are equal.
b) Its observations are derived from healthy individuals.
c) Its mean and variance are always equal to zero and one, respectively.
d) 95% of the observations lie between the mean ± 1.96 times the variance.
e) Approximately 68% of the observations lie between the mean ± the standard deviation.
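The proportions of a Normal distribution that lie within given distances of the mean can be checked numerically. The sketch below uses `statistics.NormalDist` with a hypothetical mean of 50 and standard deviation of 10 (so the variance is 100); the helper function `coverage` is our own.

```python
from statistics import NormalDist

mu, sigma = 50.0, 10.0          # hypothetical mean and standard deviation
dist = NormalDist(mu, sigma)

def coverage(half_width):
    """Proportion of the distribution lying within mu ± half_width."""
    return dist.cdf(mu + half_width) - dist.cdf(mu - half_width)

within_1_sd = coverage(sigma)                   # mean ± 1 standard deviation
within_196_sd = coverage(1.96 * sigma)          # mean ± 1.96 standard deviations
within_196_var = coverage(1.96 * sigma ** 2)    # mean ± 1.96 times the variance

print(f"mean ± SD:              {within_1_sd:.4f}")
print(f"mean ± 1.96 × SD:       {within_196_sd:.4f}")
print(f"mean ± 1.96 × variance: {within_196_var:.4f}")
```

Because the variance is measured in squared units, an interval built from it does not in general have the same coverage as one built from the standard deviation.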

M15

Which one of the following statements is true?

a) A Binomial random variable is the count of the number of events that occur randomly and independently in time or space at some fixed average rate.
b) The two parameters that characterise a Poisson distribution are the number of individuals in the sample (or repetitions of a trial) and the true probability of success for each individual (or in each trial).
c) The Chi-squared distribution is based on a categorical random variable.
d) When logarithms of observations which follow the Lognormal distribution are taken, the transformed observations follow the Normal distribution.
e) The Lognormal distribution is highly skewed to the left.

M16

The distribution of age at menopause tends to be skewed to the left. Study investigators wish to identify demographic and socioeconomic factors that are independently associated with age at menopause. Which one of the following statements relating to the analysis of age at menopause is true?

a) The optimal analytical approach is always to use a nonparametric method due to the skewness of the distribution.
b) Use of the logarithmic transformation would permit a parametric analysis based on the Normal distribution.
c) Use of the square transformation may help to achieve Normality.
d) By using a square transformation, we can ensure that the assumptions underlying a parametric analysis based on the Normal distribution are met.
e) The study investigators would be best advised to categorise age at menopause before performing the analysis.

M17

Which one of the following statements is true?

a) The logistic transformation linearises a sigmoid curve.
b) The logistic transformation is generally applied to counts which follow the Poisson distribution.
c) If a numerical variable, y, is skewed to the right, the distribution of z = y² is often approximately Normal.
d) If a numerical variable, y, is skewed to the left, z = log y is often approximately Normally distributed.
e) The square transformation has properties which are similar to those of the logarithmic transformation.
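The logistic (logit) transformation referred to above is z = log(p / (1 − p)) for a proportion p. As a small numerical illustration, the Python sketch below applies it to points on a sigmoid curve; the intercept and slope (a = −2, b = 0.5) and the x values are hypothetical.

```python
import math

def sigmoid(x, a=-2.0, b=0.5):
    """S-shaped curve p = 1 / (1 + exp(-(a + b*x))); a and b are hypothetical."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def logit(p):
    """Logistic transformation of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

xs = [0, 2, 4, 6, 8]                    # equally spaced x values
ps = [sigmoid(x) for x in xs]           # points on the sigmoid curve
zs = [logit(p) for p in ps]             # transformed values

# Differences between successive transformed values.
diffs = [z2 - z1 for z1, z2 in zip(zs, zs[1:])]

for x, p, z in zip(xs, ps, zs):
    print(f"x={x}  p={p:.4f}  logit(p)={z:.4f}")
print("successive differences:", [round(d, 6) for d in diffs])
```

Equal steps in x give equal steps in logit(p), which is what a straight-line relationship between x and the transformed values looks like.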

Sampling and Estimation

M18

Which one of the following statements is true? The sampling distribution of the mean:

a) represents the mean of the distribution obtained by taking many repeated samples of a fixed size from the population of interest and plotting the observations so obtained;
b) has a mean which is an unbiased estimate of the true mean in the population;
c) will follow a Normal distribution only if the distribution of the original data is Normal;
d) has a standard deviation which is larger than the standard error of the mean; or
e) cannot be drawn if the sample size of the repeated samples is small.
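A sampling distribution of the mean can be approximated by simulation. The Python sketch below is an illustration under stated assumptions: the 'population' is an exponential distribution with mean 1 and standard deviation 1 (deliberately skewed), each sample has 30 observations, and 5000 samples are drawn. All of these settings, and the random seed, are arbitrary choices of ours.

```python
import math
import random
from statistics import mean, stdev

random.seed(1)            # fixed seed so that the simulation is repeatable

n = 30                    # size of each sample
n_samples = 5000          # number of repeated samples

# Mean of each repeated sample drawn from the skewed population.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sem = 1.0 / math.sqrt(n)   # population SD divided by sqrt(n)

print(f"mean of the sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of the sample means:   {sd_of_means:.3f}")
print(f"population SD / sqrt(n):  {theoretical_sem:.3f}")
```

A histogram of `sample_means` could be drawn with any plotting package to inspect the shape of the simulated sampling distribution.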

M19

Study investigators have collected data on the heights of a sample of 137 women in Thailand. Which one of the following statements is true?

a) The true mean height in the Thai female population will be equal to the mean height of the women in the sample.
b) If the investigators were to calculate the range of values determined by the mean height ± 1.96 × standard deviation, they would be able to assess from this range of values the precision of the estimated mean height in their sample.
c) To enable other research groups to compare the distribution of the height values in their own studies to those from the investigators’ study, the investigators should calculate and present the median height and its associated confidence interval.
d) If the heights are approximately Normally distributed, the authors may calculate and present the mean height and its standard deviation. This will allow them to describe the distribution of height values in their sample.
e) By calculating the confidence interval for the mean, the investigators will be able to determine whether the height values in their sample are Normally distributed.

M20

Jensen et al. (2011) conducted a retrospective cohort study to assess the incidence of wound complications among patients undergoing lower-limb arthroplasty, before and after a change in clinical practice from the use of low-molecular-weight heparin to rivaroxaban. Prior to the switch to rivaroxaban, 9 of 489 patients (1.8%, 95% confidence interval 0.9 to 3.5%) returned to theatre with wound complications within 30 days compared to 22 of the 559 patients (3.9%, 95% confidence interval 2.6 to 5.9%) who received rivaroxaban. Which one of the following statements is true?

a) The confidence interval for the wound complication rate prior to the switch to rivaroxaban is asymmetrical, indicating that the outcome is not Normally distributed.
b) The true percentage of wound complications prior to the switch to rivaroxaban lies between 0.9% and 3.5%.
c) The 95% confidence intervals for the two periods overlap, indicating that there was no significant change in the wound complication rate after the switch to rivaroxaban.
d) Had the number of wound complications been greater in each period, the confidence intervals would have been wider.
e) Had the number of patients in each period been greater, the confidence intervals would have been narrower.

Jensen CD, Steval A, Partington PF, Reed MR, Muller SD. Return to theatre following total hip and knee replacement, before and after the introduction of rivaroxaban: a retrospective cohort study. J Bone Joint Surg Br 2011; 93: 91–5.
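Confidence intervals for the two proportions quoted in this question can be computed from the counts alone. The Python sketch below uses the Wilson score method, which is one common choice; the question does not state which method the authors used, so the results need not agree exactly with the published intervals. The function name `wilson_interval` is our own.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion events/n."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Counts quoted in Question M20.
before_events, before_n = 9, 489     # before the switch to rivaroxaban
after_events, after_n = 22, 559      # after the switch to rivaroxaban

before_ci = wilson_interval(before_events, before_n)
after_ci = wilson_interval(after_events, after_n)

for label, events, n, ci in [("before", before_events, before_n, before_ci),
                             ("after", after_events, after_n, after_ci)]:
    print(f"{label}: {100 * events / n:.1f}% "
          f"(95% CI {100 * ci[0]:.1f}% to {100 * ci[1]:.1f}%)")
```

Calling `wilson_interval` with the same proportion but a larger `n` shows how the width of the interval depends on the number of patients.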

M21

Which of the following statements is true for a sample of size n > 1?

a) The 99% confidence interval for the mean is narrower than the 95% confidence interval for the mean.
b) The 95% confidence interval for the mean of a particular variable is narrower than the reference interval for that variable.
c) If the true standard deviation is known, the 95% confidence interval for the mean is calculated as the mean ± 1.96 times the standard deviation.
d) The 95% confidence interval for the mean represents the interval within which the sample mean falls with 95% certainty.
e) The 95% confidence interval for the mean represents the interval which contains the central 95% of the observations in the population.
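The intervals mentioned in these options can be compared numerically. The Python sketch below assumes a known standard deviation and uses Normal multipliers (1.96 and 2.576); the mean, standard deviation and sample size are hypothetical values chosen for illustration.

```python
import math

sample_mean = 160.0   # hypothetical sample mean
sd = 6.0              # hypothetical (known) standard deviation
n = 100               # hypothetical sample size

sem = sd / math.sqrt(n)   # standard error of the mean

# Confidence intervals for the mean use the standard error.
ci_95 = (sample_mean - 1.96 * sem, sample_mean + 1.96 * sem)
ci_99 = (sample_mean - 2.576 * sem, sample_mean + 2.576 * sem)

# A 95% reference interval for individual values uses the standard deviation.
reference_interval = (sample_mean - 1.96 * sd, sample_mean + 1.96 * sd)

def width(interval):
    """Width of an interval given as (lower, upper)."""
    return interval[1] - interval[0]

print(f"standard error of the mean: {sem:.2f}")
print(f"95% CI for the mean: {ci_95[0]:.2f} to {ci_95[1]:.2f} (width {width(ci_95):.2f})")
print(f"99% CI for the mean: {ci_99[0]:.2f} to {ci_99[1]:.2f} (width {width(ci_99):.2f})")
print(f"95% reference interval: {reference_interval[0]:.2f} to {reference_interval[1]:.2f} "
      f"(width {width(reference_interval):.2f})")
```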

Study Design

M22

Which one of the following studies would be best described as a cohort study?

a) A study in which cells are stimulated with three different types of growth inducing protein.
b) A study of medical students who are followed from entering medical school to the end of their first year to describe the associations between lifestyle factors and end-of-first-year exam results.
c) A study of medical students who are split by the study investigators into two groups: those with surnames beginning with the letters A to M received regular counselling support over the first year, and those with surnames beginning with the letters O to Z did not. The outcome of the study was the proportion of students who passed their end-of-first-year exams, and this proportion was to be compared between the two groups.
d) Medical students who fail their end-of-first-year exams are interviewed about their lifestyles over the first year; a random sample of students who passed their end-of-first-year exams are also interviewed, and the results compared to assess the effects of lifestyle factors on failing the end-of-first-year exams.
e) Study investigators compared end-of-first-year exam pass rates at 10 different medical schools in the United Kingdom, and correlated these pass rates with the number of bars and nightclubs in the vicinity of each medical school.

M23

In medical research we are often interested in determining whether exposure to a factor causes an effect (e.g. a disease). Which one of the following criteria is a necessary component for assessing the cause of disease?

a) The cause and effect must take place simultaneously.
b) The association between cause and effect can be assessed on the basis of statistical results alone, independently of biological reasoning.
c) If feasible, removing the potential causative factor of interest should reduce the risk of disease.
d) The effect cannot be causal if the association between the cause and effect is small.
e) It is usually sufficient to imply causation on the basis of the results from a single study, provided the association between the cause and effect in that study is strong.

M24

Investigators conducted a randomised cross-over study to compare two appliances for the prevention of snoring. Every trial participant used each appliance for a period of one month, with a 2-week washout period between the two study periods. Which one of the following statements is true?

a) The investigators chose a cross-over design as the appliances are likely to have a long-term impact on snoring symptoms.
b) By using a cross-over design, the investigators were able to shorten the length of treatment time that was required.
c) Because they had used a cross-over design rather than a parallel group study design, the investigators had to increase the size of their sample.
d) A 2-week washout period was incorporated to allow the trial participants sufficient time to clean and return their appliances.
e) By choosing a cross-over design, the investigators were able to use each participant as his or her own control, thus reducing variability.

M25

In randomised trials of new human immunodeficiency virus therapies, investigators may use a composite endpoint known as the time to loss of virological response. Patients are deemed to meet the endpoint after the first of a series of events occurs: a new acquired immunodeficiency syndrome event, death, the patient is lost to follow-up or the patient experiences virological failure on treatment. At that point, the patient exits the trial and follow-up ceases on the patient. Which one of the following statements is true?

a) Investigators use a composite endpoint as they cannot make a decision in advance about which is the most important outcome.
b) Composite endpoints simplify the analysis of randomised trials.
c) If one or more components of the composite endpoint are deemed to have greater clinical relevance than others, then appropriate analytical methods which take this into consideration must be used when analysing a trial that utilises a composite endpoint.
d) A study that uses such a composite endpoint can provide reliable information about the frequency of occurrence of each component of the composite; thus, this type of trial provides good value for money.
e) If a composite endpoint is used instead of basing the analysis on each component of the composite, the length of the trial must be increased.

M26

Which one of the following statements is true?

a) A factorial design is one in which there is a single factor of interest.
b) A statistical interaction exists in a clinical trial when one or more of the treatments produce side effects.
c) The cross-over trial in a clinical setting is an example of a between-individual comparison.
d) A parallel trial comparing two treatments is one in which each individual receives both treatments in parallel.
e) A completely randomised design is one in which the experimental units are assigned randomly to the treatments and there are no other refinements to the design.

M27

Study investigators wish to perform a cluster randomised trial to evaluate the effectiveness of an education programme aimed at the parents of primary school children to increase the appropriate use, for this age group, of child restraints in cars. In the context of this study, ‘parents’ refers to the mother and/or father of a child, as appropriate. Rather than recruiting individual parents to the trial, the trial plans to recruit 32 primary schools with the intervention being applied at the school level (via meetings of groups of parents of children attending that school) – each school will be randomly assigned to receive the intervention or not. Which one of the following statements is true?

a) A cluster randomised design was chosen to reduce the size of the trial.
b) A cluster randomised design was chosen as it was possible to individually randomise the parents of each child to the intervention.
c) A cluster randomised design was chosen as it was felt that it would be impossible to treat the parents of each child in a school as an independent unit of investigation in the trial.
d) The unit of investigation for the trial is the individual adult driving each car.
e) The sample size for this trial is the same as it would have been had the investigators made the decision to use a nonclustered design.

M28

Which one of the following statements is true about clinical trials?

a) If a trial has secondary endpoints, there are two endpoints that are of primary interest.
b) Randomisation of individuals to treatments is a process devised to avoid assessment bias.
c) A sequential trial is an extension of a cross-over trial when there are more than two treatments to be compared and each patient receives the treatments sequentially.
d) Blocked randomisation is used so as to achieve approximately equally sized groups at the end of patient recruitment.
e) Systematic allocation is a method of allocating individuals to treatments using a list of random numbers that has been created in a systematic way.

M29

Study investigators initiated a cohort study to determine the association between retirement and the incidence of depression within 5 years of retirement. The investigators recruited 1000 participants who were at the point of retirement; participants were then followed over a 5-year period with annual questionnaires sent to them to obtain information on self-reported depressive symptoms. Which one of the following statements is true?

a) By setting up a cohort in this way, the study investigators will be able to quantify the effect of retirement on the incidence of depression in the population within 5 years of retirement.
b) The investigators should restrict their analyses to the subgroup of participants who remain under follow-up for the full 5-year period.
c) The primary outcome measure of this study will be the prevalence of depression at 5 years.
d) To ensure that retirement precedes any symptoms of depression, the study investigators should exclude from their calculations of the incidence of depression any individual who already has depressive symptoms at recruitment.
e) This study is prone to recall bias as the information on depressive symptoms is obtained via self-report.

M30

Which one of the following statements is true about a cohort study?

a) The time sequence of events cannot be assessed.
b) It can provide information on a wide range of disease outcomes.
c) It is difficult to study exposure to factors that are rare.
d) The risk of disease cannot be measured directly.
e) It is generally cheap to perform.

M31

Kik et al. (2011) conducted an unmatched case–control study to investigate the extent to which travel to tuberculosis (TB)-endemic countries contributes to TB incidence among immigrants from Morocco living in the Netherlands. Cases were those of Moroccan background who had been diagnosed with TB in 2006–7 and had been seen at one of 17 municipal health services in the Netherlands; controls were a retrospective sample from the Survey on Integration of Minorities 2006 who did not have TB, had also been born in Morocco and were living in the Netherlands. Of the 32 cases with TB, 26 (81%) had travelled in the preceding year compared to 472 (58%) of the 816 controls. Which one of the following statements is true?

a) The risk of TB among Moroccan immigrants in the Netherlands is 32/816 (3.9%).
b) The risk of TB among Moroccan immigrants in the Netherlands is 32/848 (3.7%).
c) The odds ratio of TB is 1.40 (i.e. 81/58), suggesting that Moroccan immigrants in the Netherlands who travel to TB-endemic countries are 40% more likely to experience TB than those who do not travel to the TB-endemic countries.
d) The authors would be advised to report the relative risk from their study, as this would more accurately capture the increased risk of TB in Moroccan immigrants in the Netherlands associated with travel to TB-endemic countries than the odds ratio.
e) The odds ratio of TB is 3.16, suggesting that Moroccan immigrants in the Netherlands who travel to TB-endemic countries are over three times more likely to experience TB than those who do not travel to TB-endemic countries.

Kik SV, Mensen M, Beltman M, et al. Risk of travelling to the country of origin for tuberculosis among immigrants living in a low-incidence country. Int J Tuberc Lung Dis 2011; 15: 38–43.
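
The arithmetic behind the quantities quoted in M31 can be checked with a short sketch. The counts below are those given in the question; the function and variable names are hypothetical and serve only the illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


# Counts taken from the question text.
cases_total, cases_travelled = 32, 26
controls_total, controls_travelled = 816, 472

or_travel = odds_ratio(
    cases_travelled,
    cases_total - cases_travelled,        # 6 cases who did not travel
    controls_travelled,
    controls_total - controls_travelled,  # 344 controls who did not travel
)

# Ratio of the two quoted percentages (81% and 58%); this is not an odds ratio.
ratio_of_percentages = (cases_travelled / cases_total) / (
    controls_travelled / controls_total
)

print(round(or_travel, 2))             # 3.16
print(round(ratio_of_percentages, 2))  # 1.4
```

Because the numbers of cases and controls are fixed by the investigators in a case–control design, the proportion of cases in the sample (32 out of 848) does not estimate the risk of TB in the population.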

M32

Which one of the following statements is true about a case–control study?

a) It is particularly suitable for rare diseases.
b) Loss to follow-up is a common problem.
c) The relative risk is commonly used to estimate the effect of the exposure on the disease outcome.
d) It is an example of a prospective observational study.
e) It is an example of an experimental study because those with disease are compared to those without it.

M33

In a recent matched case–control study, 200 cases with hepatocellular carcinoma were individually matched to 200 controls without hepatocellular carcinoma by sex and age (±5 years). The investigators collected information, for each subject, on a number of potential risk factors and were interested in determining which of them was associated with hepatocellular carcinoma. Which one of the following statements is true?

a) The authors should use conditional logistic regression methods to analyse the outcomes from this study.
b) The study investigators decided to match cases and controls by age and sex as they were particularly interested in the associations between each of these variables and hepatocellular carcinoma.
c) When calculating the odds ratios, the study investigators should ignore the fact that the cases and controls are matched by age and sex.
d) Had the authors loosened their matching criteria to ensure that cases and controls were matched by age within 10 rather than 5 years, the results from the study would have been strengthened.
e) The authors should use multiple linear regression methods to analyse the outcomes from this study.

Hypothesis Testing

M34

Which one of the following statements is true?

a) If the hypothesised value for the effect of interest (e.g. the difference in means) in a hypothesis test lies within the 95% confidence interval for the effect, then we have evidence to reject the hypothesis, P < 0.05.
b) A hypothesis test of superiority which proceeds by calculating a test statistic and relating it to the appropriate probability distribution to obtain the P-value is so called because it is superior to testing the hypothesis using the relevant confidence interval.
c) The test statistic that is calculated in a hypothesis testing procedure reflects the amount of evidence in the data against the null hypothesis.
d) A bioequivalence trial is a particular type of randomised trial which is concerned with demonstrating that biological treatments have the same effect as nonbiological treatments on a disease outcome.
e) Nonparametric tests lead to an appreciation of the data, rather than focusing on decisions, because they do not concentrate on the parameters of the underlying distributions.

M35

Ogawa et al. (2010) conducted a study of 21 elderly women who participated in 12 weeks of resistance exercise training. The investigators measured muscle thickness and circulating levels of C-reactive protein, serum amyloid A, heat shock protein 70, tumour necrosis factor-α, interleukin-1, interleukin-6, monocyte chemotactic protein, insulin, insulin-like growth factor and vascular endothelial growth factor before and after the 12 weeks of training. Whilst training significantly reduced levels of five of the variables listed (P < 0.05), these reductions were not statistically significant after applying the Bonferroni correction. Which one of the following statements is true?

a) The authors used a Bonferroni correction to adjust their P-values as they did not believe that their initial findings were true.
b) The authors used a Bonferroni correction to adjust their P-values as they were performing subgroup analyses on the data set.
c) The authors used a Bonferroni correction to adjust their P-values as they were making multiple comparisons for a single outcome variable.
d) The authors used a Bonferroni correction to adjust their P-values and then applied a more stringent threshold than the conventional 0.05 for statistical significance.
e) The authors used a Bonferroni correction to adjust their P-values as they had multiple outcome variables.

Ogawa K, Sanada K, Machida S, Okutsu M, Suzuki K. Resistance exercise training-induced muscle hypertrophy was associated with reduction of inflammatory markers in elderly women. Mediators Inflamm 2010; 2010: 171023.
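
The mechanics of a Bonferroni correction, as referred to in M35, can be sketched as follows. The P-values are hypothetical (the study's individual P-values are not given in the question), and the choice of ten tests is an assumption made only for illustration.

```python
def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted P-values, one per outcome variable tested.
p_values = [0.012, 0.030, 0.041, 0.020, 0.048, 0.20, 0.35, 0.51, 0.62, 0.80]
alpha = 0.05

adjusted = bonferroni_adjust(p_values)

significant_before = sum(p < alpha for p in p_values)
significant_after = sum(p < alpha for p in adjusted)

print(significant_before, significant_after)
```

Comparing each adjusted P-value with 0.05 is equivalent to comparing each unadjusted P-value with 0.05 divided by the number of tests; only one of the two adjustments should be applied, not both.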

M36

Which one of the following statements is true about the probability of making a Type I error when performing a single hypothesis test?

a) It is equal to one minus the probability of a Type II error.
b) It is the probability of rejecting the null hypothesis when it is true.
c) It is the probability of not rejecting the null hypothesis when it is false.
d) It can never exceed 0.05.
e) It is equal to the significance level of the hypothesis test.

Basic Techniques for Analysing Data

Numerical Data

M37

McCorkle et al. (2010) measured spleen length in 66 tall athletes (defined as being at least 6 feet 2 inches tall for men and 5 feet 7 inches tall for women). Measurements of spleen size, obtained from an ultrasound examination, were compared to values from normal-sized individuals (not necessarily athletes) from the same population obtained from the published literature. The authors calculated the mean, standard deviation and variance of spleen length from their sample, and then conducted a one-sample t-test to determine whether the spleen length of tall athletes differed from that of the normal-sized individuals. Tall athletes had a mean spleen length of 12.19 cm (95% confidence interval 11.84, 12.55 cm), whereas the population mean spleen length was 8.94 cm. Which one of the following statements is true?

a) The investigators chose to perform a one-sample t-test as they believed that the distribution of spleen length in the population was skewed.
b) The investigators chose to perform a one-sample t-test as their study included both male and female athletes.
c) From the information provided, it is likely that measurements of spleen length are highly skewed in the population.
d) As the authors included both male and female athletes of different sizes, the conclusions of the study must be unreliable.
e) The result of the one-sample t-test is statistically significant (P < 0.05) indicating that the mean spleen length of tall athletes was significantly greater than that of normal-sized individuals in the population.

McCorkle R, Thomas B, Suffaletto H, Jehle D. Normative spleen size in tall healthy athletes: implications for safe return to contact sports after infectious mononucleosis. Clin J Sport Med 2010; 20: 413–5.
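
The link between the confidence interval quoted in M37 and the result of the test can be illustrated with a rough sketch. It uses the figures in the question and, as a simplifying assumption, the Normal multiplier (about 1.96) in place of the t-distribution multiplier for 65 degrees of freedom, so the standard error and test statistic are approximations only.

```python
from statistics import NormalDist

# Figures taken from the question text (cm).
sample_mean = 12.19
ci_lower, ci_upper = 11.84, 12.55
hypothesised_mean = 8.94

# If the hypothesised value lies outside the 95% CI, then P < 0.05.
outside_ci = not (ci_lower <= hypothesised_mean <= ci_upper)

# Approximate standard error recovered from the width of the CI
# (Normal approximation; an assumption, not the authors' calculation).
z95 = NormalDist().inv_cdf(0.975)
approx_se = (ci_upper - ci_lower) / (2 * z95)

# Approximate test statistic and two-sided P-value.
approx_statistic = (sample_mean - hypothesised_mean) / approx_se
approx_p = 2 * (1 - NormalDist().cdf(abs(approx_statistic)))

print(outside_ci, round(approx_se, 3), round(approx_statistic, 1))
```

The hypothesised mean lies far below the lower confidence limit, which is why the approximate test statistic is so large.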

M38

Which one of the following statements is true?

a) The one-sample t-test is used to test the null hypothesis that the sample mean takes a particular value.
b) The assumption underlying the one-sample t-test for small samples is that the variable of interest follows the t-distribution with degrees of freedom equal to the sample size minus one.
c) The sign test used on numerical data tests the null hypothesis that the population median takes a particular value.
d) The sign test used on numerical data evaluates the number of values in the sample that are greater (or less) than the median value specified in the null hypothesis and assesses whether it differs significantly from n′/2, where n′ is the number of observations in the sample not equal to the specified median.
e) The sign test and one-sample t-test performed on the same set of numerical data will give exactly the same P-value if the data are Normally distributed.
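
The sign test procedure described in M38 can be sketched in a few lines. This is a minimal illustration that uses an exact binomial calculation, which is one common way of obtaining the P-value and not necessarily the approach taken in the book; the data and the hypothesised median are hypothetical.

```python
from math import comb


def sign_test(values, hypothesised_median):
    """Two-sided exact sign test; returns (n_prime, n_above, p_value)."""
    # Discard observations equal to the hypothesised median.
    diffs = [v - hypothesised_median for v in values if v != hypothesised_median]
    n_prime = len(diffs)
    n_above = sum(d > 0 for d in diffs)

    # Under the null hypothesis, n_above follows a Binomial(n_prime, 0.5)
    # distribution, so it is expected to be close to n_prime / 2.
    k = min(n_above, n_prime - n_above)
    tail = sum(comb(n_prime, i) for i in range(k + 1)) / 2 ** n_prime
    p_value = min(1.0, 2 * tail)
    return n_prime, n_above, p_value


# Hypothetical observations and hypothesised median.
values = [12, 15, 9, 14, 16, 10, 13, 17, 18, 10, 19, 15]
n_prime, n_above, p_value = sign_test(values, 10)

print(n_prime, n_above, round(p_value, 4))
```

The two observations equal to the hypothesised median are excluded, leaving n′ = 10 values, of which 9 lie above the median.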

M39

To investigate the value of information provided to patients at discharge following open-heart surgery, Ozcan et al. (2010) recruited 50 patients who underwent open-heart surgery from January to June 2007. At the time of discharge, all patients completed a pre-test questionnaire consisting of 34 questions that assessed their degree of knowledge relating to preventing re-hospitalisation, increasing self-care ability, gaining self-sufficiency and preventing complications post-surgery. Patients then underwent a training session and were provided with a booklet to take away with them. When patients attended for their routine medical follow-up after a one-month period, they completed the same questionnaire. The mean number of correct answers pre-intervention was 0.86 (standard deviation 1.28); this increased to 27.88 (3.84) after the intervention. Results of a Wilcoxon signed-ranks test suggested that the difference between the number of correct answers pre- and post-intervention was statistically significant (P < 0.05). Which one of the following statements is true?

a) It would have been preferable to use a two-sample t-test to analyse these data.
b) To gain a better understanding of the effect of the intervention, it would have been preferable to quote the median or mean change over time in the number of correctly answered questions (and the range or standard deviation of this, as appropriate), rather than the mean (standard deviation) number of correctly answered questions at each time point.
c) The Wilcoxon signed-ranks test is ideally suited to the situation where there are two independent data sets.
d) The result of the Wilcoxon signed-ranks test suggests that the apparent improvement in knowledge is likely to be a chance finding.
e) As there was a significant increase in knowledge level over the one-month period, the authors can conclude that their intervention was successful.

Ozcan H, Findik UY, Sut N. Information level of patients in discharge training given by nurses following open heart surgery. Int J Nurs Pract 2010; 16: 289–94.

M40

Which one of the following statements is true?

a) The sign test cannot be used on numerical data.
b) The Wilcoxon signed-ranks test is a more powerful test than the sign test when there are paired numerical observations.
c) The paired t-test is a nonparametric alternative to the Wilcoxon signed-ranks test.
d) The two-sample t-test produces the same P-value as the paired t-test when there are two groups of paired numerical observations.
e) The assumption underlying the paired t-test is that the variance of the observations is the same in each of the two groups.

M41

As part of a cross-sectional study to investigate risk factors for postpartum depression during the first postpartum year in women of low socioeconomic status, Yagmur and Ulukoca (2010) interviewed 785 women from Malatya in eastern Turkey. Data on depression were collected using the Edinburgh Postnatal Depression Scale (EPDS) which provides a value on a scale from 0 to 30 (higher scores indicate greater psychological distress), and through the Multidimensional Scale of Perceived Social Support (MSPSS) which provides a score that can range from 12 to 84 (higher scores indicate greater social support). Data were analysed using t-tests, one-way analysis of variance and logistic regression. Which one of the following statements is true?

a) Provided the assumptions of Normality and constant variance were satisfied, it would be appropriate to perform an unpaired t-test to investigate a possible association between the EPDS score and age group (categorised as ≤20, 21–30 and ≥31 years).
b) Provided the assumptions of Normality and constant variance were satisfied, it would be appropriate to perform a one-way analysis of variance to investigate a possible association between the EPDS score and age group (categorised as ≤20, 21–30 and ≥31 years).
c) As the range of values of the MSPSS scale is greater than that of the EPDS, it would be appropriate to use analysis of variance to assess the association between the two scores.
d) The investigators considered whether there was a difference in mean EPDS scores between those women who received health insurance and those who did not. The null hypothesis for this comparison was that those who received health insurance have a lower mean EPDS score than those who did not receive health insurance.
e) As each woman had a value of both the EPDS and the MSPSS scores, it would be appropriate to use a paired t-test to investigate whether there was an association between the two scores.

Yagmur Y, Ulukoca N. Social support and postpartum depression in low-socioeconomic level postpartum women in eastern Turkey. Int J Pub Health 2010; 55: 543–9.

M42

Which one of the following statements is true?

a) The Wilcoxon signed-ranks test is a nonparametric alternative to the unpaired t-test.
b) The Wilcoxon signed-ranks test produces the same P