How to Display Data - Jenny V. Freeman - E-Book

Jenny V. Freeman

Description

Effective data presentation is an essential skill for anybody wishing to display or publish research results, but when done badly, it can convey a misleading or confusing message. This new addition to the popular “How to” series explains how to present data in journal articles, grant applications or research presentations clearly, accurately and logically, increasing the chances of successful publication.

Number of pages: 141

Year of publication: 2011

Contents

Copyright

Preface

Chapter 1 Introduction to data display

1.1 Introduction

1.2 Types of data

1.3 Where to start?

1.4 Recommendations for the presentation of numbers

1.5 Recommendations for presenting data and results in tables

1.6 Recommendations for construction of graphs

1.7 Table or graph?

1.8 Software

Summary

References

Chapter 2 How to display data badly

2.1 Introduction

2.2 Amount of information

2.3 Suppress the origin or change the baseline

2.4 Don’t order the data by value

2.5 Use images to show linear contrasts

Summary

References

Chapter 3 Displaying univariate categorical data

3.1 Describing categorical data

3.2 Pie charts

3.3 Bar charts

3.4 Two- or three-dimensional charts?

3.5 Clustered bar charts

3.6 Stacked bar charts

Summary of the main points when displaying categorical data

References

Chapter 4 Displaying quantitative data

4.1 Count data

4.2 Graphs for continuous data

4.3 Dotplots

4.4 Stem and leaf plots

4.5 Histograms

4.6 Box–whisker plots

Summary

References

Chapter 5 Displaying the relationship between two continuous variables

5.1 Introduction

5.2 Correlation

5.3 Regression

5.4 Lowess smoothing plots

5.5 Assessing agreement between two continuous variables

5.6 Bland–Altman plots

5.7 ROC curves for diagnostic tests

5.8 Analysis of ROC curves

Summary

References

Chapter 6 Data in tables

6.1 Presenting data and results in tables

6.2 Tables for categorical outcome data

6.3 Tables for continuous outcomes

6.4 Tables for multiple outcome measures

Summary

References

Chapter 7 Reporting study results

7.1 Introduction

7.2 Tabulating categorical outcomes

7.3 Tabulating the results of logistic regression analysis

7.4 Tabulating quantitative outcomes

7.5 Plots for displaying outcome data

7.6 Tabulating the results of regression analyses

7.7 Reporting results for repeated measures data

7.8 Randomised controlled trials

7.9 Patient flow diagram

7.10 Comparison of entry characteristics

7.11 Forest plots

7.12 Funnel plots

7.13 Summary

References

Appendix

Chapter 8 Time series plots and survival curves

8.1 Introduction

8.2 Time series plots

8.3 Lowess smoothing plots

8.4 Survival

Summary

References

Chapter 9 Displaying results in presentations

9.1 Introduction

9.2 Graphic design of slides

9.3 Text

9.4 Pictures/graphics: including the use of graphics and clip art

9.5 Colour

9.6 Space

9.7 Summary slides

9.8 Conclusion

Summary

Index

© 2008 Jenny V. Freeman, Stephen J. Walters, Michael J. Campbell

Published by Blackwell Publishing

BMJ Books is an imprint of the BMJ Publishing Group Limited, used under licence

Blackwell Publishing, Inc., 350 Main Street, Malden, Massachusetts 02148-5020, USA

Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK

Blackwell Publishing Asia Pty Ltd, 550 Swanston Street, Carlton, Victoria 3053, Australia

The right of the Author to be identified as the Author of this Work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

First published 2008

1 2008

Library of Congress Cataloging-in-Publication Data

Freeman, Jenny.

How to display data / Jenny Freeman, Stephen J. Walters, Michael J. Campbell.

p. ; cm.

ISBN 978-1-4051-3974-8 (pbk. : alk. paper)

1. Medical writing. 2. Medical statistics. 3. Medicine–Research–Statistical methods. I. Walters, Stephen John. II. Campbell, Michael J., PhD. III. Title. [DNLM: 1. Research Design. 2. Data Display. 3. Data Interpretation, Statistical. 4. Statistics. W 20.5 F869h 2007]

R119.F76 2007

610.72′7–dc22

2007032641

ISBN: 978-1-4051-3974-8

A catalogue record for this title is available from the British Library

Set by Charon Tec Ltd (A Macmillan Company), Chennai, India

Printed and bound in Singapore by Utopia Press Pte Ltd

Commissioning Editor: Mary Banks

Editorial Assistant: Victoria Pittman

Development Editor: Simone Dudziak

Production Controller: Rachel Edwards

For further information on Blackwell Publishing, visit our website:

http://www.blackwellpublishing.com

The publisher’s policy is to use permanent paper from mills that operate a sustainable forestry policy, and which has been manufactured from pulp processed using acid-free and elementary chlorine-free practices. Furthermore, the publisher ensures that the text paper and cover board used have met acceptable environmental accreditation standards.

Blackwell Publishing makes no representation, express or implied, that the drug dosages in this book are correct. Readers must therefore always check that any product mentioned in this publication is used in accordance with the prescribing information prepared by the manufacturers. The author and the publishers do not accept responsibility or legal liability for any errors in the text or for the misuse or misapplication of material in this book.

Preface

The best method of conveying the message from a piece of health research is a figure. The best advice that a statistician can give a researcher is to plot the data first. Despite this, conventional statistics textbooks give only brief details on how to draw figures and display data. The purpose of this book is to give advice on the best methods for displaying data arising from a variety of different sources. We have tried to make the book concise and easy to read. By displaying data badly one can very easily give misleading messages (or hide inconvenient truths), and we try to highlight these problems so that consumers of data are aware of them. We have also included advice on displaying data for posters and talks.

Researchers who want to display the results of their studies in figures or tables, particularly for publication in a journal, will find this book useful. Readers of the research literature who wish to critically appraise a piece of work will find useful tips on interpreting the figures they encounter. People who have to deliver a talk or conference presentation should also find good advice on displaying their results.

We would like to thank Mary Banks and Simone Dudziak from Blackwell for their patience and advice.

Jenny V. Freeman

Stephen J. Walters

Michael J. Campbell

Medical Statistics Group, ScHARR, Sheffield

June 2007

Chapter 1

Introduction to data display

1.1 Introduction

This book has arisen from our extensive experience as researchers and teachers of medical statistics. We have frequently been appalled by the poor quality of data display, even in major medical journals. While there is already a wealth of information about how to display data, it is scattered across many sources. Our purpose in writing this book is to bring this information together into a single volume and to provide clear, accessible advice for researchers and students alike.

Well-displayed data can clearly illuminate and enhance the interpretation of a study, while badly laid out data and results can obscure the message or at worst seriously mislead. Although the appropriate display of data in tables and graphs is an essential part of any report, paper or presentation, little space is devoted to it in the majority of textbooks. The purpose of this book is to address this deficit and give clear guidelines on appropriate methods for displaying quantitative information, using both graphs and tables.

There are many different types of graph and table available for displaying data; their purposes will be outlined in subsequent chapters. This chapter will outline why it is important to get data display right, the good principles to adhere to when displaying data, and the types of data that will be covered in the rest of the book. The second chapter will cover some of the many ways in which information can be displayed badly, and the following chapters will then unpick these and give clear guidance on how to do it well.

1.2 Types of data

To display data appropriately, one must first understand what types of data there are, as this determines the best method of displaying them. Figure 1.1 shows a basic hierarchy of data types, although there are others. Data are either categorical or quantitative. Data are described as categorical when they can be categorised into distinct groups, such as ethnic group or disease severity. Although categorical data may be coded numerically (for example, gender may be coded 1 for male and 2 for female), these codes have no intrinsic numerical value; it would be nonsense to calculate an average gender. Categorical data can be either nominal or ordinal. Nominal data have no natural ordering; examples include eye colour, marital status and area of residence. Binary data are a special subcategory of nominal data in which there are only two possible values, for example male/female, yes/no, dead/alive. Ordinal data occur when there is a natural ordering of the data values, such as better/same/worse, grades of breast cancer and social class.

Figure 1.1 Types of data.

Quantitative data can be either count or continuous. Count data, also known as discrete data, occur, as the name implies, when the data can be counted, such as the number of children in a family or the number of visits to a GP in a year. Count data are similar to categorical data in that they can only take discrete whole-number values. Continuous data are data that can be measured, and they can take any value on the scale on which they are measured, limited only by the precision of that scale; examples include height, weight and blood pressure.
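The hierarchy described above can be written down as a small lookup table, which makes explicit that numeric codes for categorical types are labels rather than quantities. This is a minimal Python sketch, not something taken from the book: the names DataType, PATH, is_categorical and EXAMPLES are hypothetical, and only the branches mentioned in the text are included.

```python
from enum import Enum


class DataType(Enum):
    """Leaf types in the basic hierarchy of data types described in the text."""
    NOMINAL = "nominal"
    BINARY = "binary"          # nominal data with only two possible values
    ORDINAL = "ordinal"
    COUNT = "count"            # also called discrete data
    CONTINUOUS = "continuous"


# Path from the root of the hierarchy down to each leaf type.
PATH = {
    DataType.NOMINAL: ("categorical", "nominal"),
    DataType.BINARY: ("categorical", "nominal", "binary"),
    DataType.ORDINAL: ("categorical", "ordinal"),
    DataType.COUNT: ("quantitative", "count"),
    DataType.CONTINUOUS: ("quantitative", "continuous"),
}


def is_categorical(kind: DataType) -> bool:
    """True if any numeric codes for this type are labels, not quantities."""
    return PATH[kind][0] == "categorical"


# Example variables taken from the text.
EXAMPLES = {
    "eye colour": DataType.NOMINAL,
    "dead/alive": DataType.BINARY,
    "grade of breast cancer": DataType.ORDINAL,
    "number of visits to a GP in a year": DataType.COUNT,
    "blood pressure": DataType.CONTINUOUS,
}

if __name__ == "__main__":
    for variable, kind in EXAMPLES.items():
        print(f"{variable}: {' > '.join(PATH[kind])}")
```

Storing the full path, rather than just the top-level branch, keeps binary data visible as a special case of nominal data.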

1.3 Where to start?

When displaying information visually, there are three questions that are useful to ask as a starting point (Box 1.1). Firstly, and most importantly, it is vital to have a clear idea of what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, refine the display until satisfied; for example, if a chart has been used, would a table have been better, or vice versa? This book will help you answer these questions and provide you with the means to display your data well.

Box 1.1 Useful questions to ask when considering how to display information

What do you want to show?

What methods are available for this?

Is the method chosen the best? Would another have been better?

1.4 Recommendations for the presentation of numbers

When summarising categorical data, both frequencies and percentages can be used. However, if percentages are reported, it is important that the denominator (i.e. the total number of observations) is given. To summarise continuous numerical data, one should use the mean and standard deviation or, if the data have a skewed distribution, the median and range or interquartile range. Whichever of these summary measures is used, it is important to state the total number of observations on which it is based.
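The choice between a mean (SD) and a median (IQR) summary can be sketched as a small helper built on Python's standard statistics module. This is an illustrative assumption, not the authors' method: the function name summarise_continuous is hypothetical, the caller must decide whether the data are skewed (ideally by plotting them first), and the quartiles come from statistics.quantiles, whose default method is only one of several conventions for computing them.

```python
import statistics


def summarise_continuous(values, skewed=False, decimals=1):
    """Summarise a continuous variable, always reporting the number of observations.

    skewed=False -> mean and (sample) standard deviation
    skewed=True  -> median and interquartile range (lower to upper quartile)
    Whether the data are skewed is decided by the caller; no test is performed here.
    """
    n = len(values)
    if skewed:
        median = statistics.median(values)
        # With n=4, quantiles() returns the three quartile cut points.
        q1, _, q3 = statistics.quantiles(values, n=4)
        return (f"median {median:.{decimals}f} "
                f"(IQR {q1:.{decimals}f} to {q3:.{decimals}f}), n = {n}")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return f"mean {mean:.{decimals}f} (SD {sd:.{decimals}f}), n = {n}"


if __name__ == "__main__":
    print(summarise_continuous([1, 2, 3, 4, 5]))                 # roughly symmetric
    print(summarise_continuous([1, 2, 3, 4, 100], skewed=True))  # one large value
```

For the skewed example, the single large value would inflate the mean and SD, whereas the median of 3.0 is unaffected. The number of observations appears in both forms of output, as the text recommends.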

In the majority of cases it is reasonable to treat count data, such as number of children in a family or number of visits to the GP in a year, as if they were continuous, at least as far as the statistical analysis goes. Ideally there should be a large number of different possible values, but in practice this is not always necessary. However, where ordered categories are numbered, such as stage of disease or social class, the temptation to treat these numbers as statistically meaningful must be resisted. For example, it is not sensible to calculate the average social class of a sample or stage of cancer for a group of patients, and in such cases the data should be treated in statistical analyses as if they are ordered categories.1
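For ordered categories such as stage of disease, a frequency table in the natural order of the categories is a safer summary than an average of the stage codes. This is a minimal sketch; the stage labels in STAGE_ORDER, the helper name ordinal_frequencies and the example data are all hypothetical.

```python
from collections import Counter

# Hypothetical stage labels in their natural order; any numeric codes are labels only.
STAGE_ORDER = ["I", "II", "III", "IV"]


def ordinal_frequencies(values, order):
    """Frequency and percentage of each level of an ordinal variable, in natural order.

    Levels with no observations are kept (frequency 0) so the ordering stays visible.
    """
    counts = Counter(values)
    unknown = set(counts) - set(order)
    if unknown:
        raise ValueError(f"values not in the declared order: {sorted(unknown)}")
    n = len(values)
    return [(level, counts[level], 100 * counts[level] / n) for level in order]


if __name__ == "__main__":
    stages = ["II", "I", "II", "IV"]  # hypothetical data for four patients
    for level, freq, pct in ordinal_frequencies(stages, STAGE_ORDER):
        print(f"Stage {level}: {freq} ({pct:.1f}%)")
```

Declaring the order explicitly, instead of sorting the labels, avoids relying on alphabetical or numeric order happening to match the clinical ordering.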

Numerical precision should be consistent throughout, and summary statistics such as means and standard deviations should not have more than one extra decimal place (or significant digit) compared with the raw data. Spurious precision should be avoided, although greater precision may sometimes be appropriate when certain measures are to be used for further calculations or when presenting the results of analyses.2
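The "one extra decimal place" rule can be applied mechanically if the raw values are kept as text, so that their recorded precision is not lost. This sketch assumes the raw data are strings such as '72.5'. It handles decimal places only, not significant digits, and the names decimals_in and mean_sd_to_precision are hypothetical.

```python
import statistics
from decimal import Decimal


def decimals_in(raw: str) -> int:
    """Decimal places recorded in a raw value held as text, e.g. '72.5' -> 1, '72' -> 0."""
    exponent = Decimal(raw).as_tuple().exponent
    return max(0, -exponent)


def mean_sd_to_precision(raw_values):
    """Mean and SD formatted with one more decimal place than the raw data."""
    places = max(decimals_in(v) for v in raw_values) + 1
    numbers = [float(v) for v in raw_values]
    mean = statistics.mean(numbers)
    sd = statistics.stdev(numbers)  # sample standard deviation
    return f"{mean:.{places}f}", f"{sd:.{places}f}"


if __name__ == "__main__":
    # Hypothetical systolic blood pressures recorded to the nearest mmHg.
    print(mean_sd_to_precision(["128", "135", "141"]))
```

Because the readings are whole numbers, the mean and SD are reported to one decimal place rather than at full floating-point precision.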

1.5 Recommendations for presenting data and results in tables

There are a few basic rules of good presentation, both within the text of a document or presentation and within tables, as outlined in Box 1.2. In 1983 Tufte outlined a fundamental principle: always try to get as much information into a figure as is consistent with legibility. In other words, one should maximise the ratio of the amount of information given to the amount of ink used.3 Tables, including column and row headings, should be clearly labelled, and a brief summary of the contents of a table should always be given in words, either as part of the title or in the main body of the text.

Box 1.2 Recommendations when presenting data and results in tables

The amount of information should be maximised for the minimum amount of ink.

Numerical precision should be consistent throughout a paper or presentation, as far as possible.

Avoid spurious accuracy. Numbers should be rounded to two effective digits.

Quantitative data should be summarised using either the mean and standard deviation (for symmetrically distributed data) or the median and interquartile range or range (for skewed data). The number of observations on which these summary measures are based should be included.

Categorical data should be summarised as frequencies and percentages. As with quantitative data, the number of observations should be included.

Each table should have a title explaining what is being displayed, and columns and rows should be clearly labelled.

Solid lines in tables should be kept to a minimum.

Where variables have no natural ordering, rows and columns should be ordered by size.

Solid lines should not be used in a table except to separate labels and summary measures from the main body of the data. However, their use should be kept to a minimum, particularly vertical gridlines, as they can interrupt eye movements, and thus the flow of information. White space can be used to separate data, such as different variables, from each other.4

The information in tables is easier to comprehend if the columns (rather than the rows) contain similar information, such as means or standard deviations, as it is easier to scan down a column than across a row.4 However, this is not always easy to achieve, particularly when the information for several variables is contained in the same table and comparisons are to be made between different groups; this will be covered in more detail in Chapter 6. In addition, where there is no natural ordering of the rows (or indeed columns), they should be ordered by size (category with the highest frequency first, lowest frequency last), as this helps the reader to scan for patterns and exceptions in the data.4 Table 1.1a shows the frequency distribution of marital status for 226 patients with leg ulcers who were recruited to a study to assess the effectiveness of specialist leg ulcer clinics compared with usual care.5 The categories in Table 1.1a are ordered alphabetically, whereas in Table 1.1b they are ordered by frequency, which makes Table 1.1b much easier to interpret.

Table 1.1 Marital status of 226 patients with leg ulcers recruited to a study to assess the effectiveness of specialist leg ulcer clinics using 4-layer compression bandaging compared with usual care5

(a) Unordered rows
Marital status        Frequency   Percent
Divorced/separated           11       4.9
Married                     104      46.0
Single                       25      11.1
Widowed                      86      38.1
Total                       226     100.0

(b) Ordered rows
Marital status        Frequency   Percent
Married                     104      46.0
Widowed                      86      38.1
Single                       25      11.1
Divorced/separated           11       4.9
Total                       226     100.0
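Ordering rows by frequency and adding a total row is easy to automate. The sketch below uses the marital status counts from Table 1.1 and draws only two horizontal rules, following the advice on solid lines. The function name frequency_table and the plain-text layout are illustrative assumptions.

```python
# Marital status counts from Table 1.1 (226 patients in the leg ulcer study).
MARITAL_STATUS = {
    "Divorced/separated": 11,
    "Married": 104,
    "Single": 25,
    "Widowed": 86,
}


def frequency_table(counts, decimals=1):
    """Plain-text frequency table, rows ordered by frequency (highest first).

    Only two horizontal rules are drawn: one under the header and one above the total.
    """
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    width = max(len(label) for label in [*counts, "Total"])
    header = f"{'':<{width}}  Frequency  Percent"
    rule = "-" * len(header)
    lines = [header, rule]
    for label, count in rows:
        percent = 100 * count / total
        lines.append(f"{label:<{width}}  {count:>9}  {percent:>7.{decimals}f}")
    lines.append(rule)
    # The total is shown as exactly 100; rounded row percentages need not add up to it.
    lines.append(f"{'Total':<{width}}  {total:>9}  {100:>7.{decimals}f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(frequency_table(MARITAL_STATUS))
```

For these data the rounded row percentages add up to 100.1 (46.0 + 38.1 + 11.1 + 4.9). For that reason the total row reports 100.0 directly rather than summing the rounded values.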

1.6 Recommendations for construction of graphs

Box 1.3 outlines some basic recommendations for the construction and use of figures to display data. As with tables, a fundamental principle is that graphs should maximise the amount of information presented for the minimum amount of ink used.3 Good graphs have four features in common: clarity of message, simplicity of design, clarity of text, and integrity of intention and action.6 A graph should have a title explaining what is displayed, and axes should be clearly labelled; if it is not immediately obvious how many individuals the graph is based upon, this should also be stated. Gridlines should be kept to a minimum, as they act as a distraction and can interrupt the flow of information. When using graphs for presentation purposes, care must be taken to ensure that they are not misleading; an excellent exposition of the ways in which graphs can be used to mislead can be found in Huff.7 Figure 1.2 shows a bar chart of the marital status data from Table 1.1 displayed using these principles. It has a clear title (with the sample size) and labelled axes, contains no gridlines, and orders the marital status categories by frequency.

Box 1.3 Guidelines for constructing graphs

The amount of information should be maximised for the minimum amount of ink.

Each graph should have a title explaining what is being displayed.

Axes should be clearly labelled.

Gridlines should be kept to a minimum.

Avoid three-dimensional graphs, as these can be difficult to read.

The number of observations should be included.

Figure 1.2 Bar chart of marital status for 226 patients recruited to the leg ulcer study.5
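A chart like Figure 1.2 can be produced by first preparing the ordered categories and labels, then passing them to a plotting library. This sketch assumes that matplotlib is installed, although it is imported only inside draw_bar_chart. The function names, axis labels and grey fill are illustrative choices, not the authors' settings.

```python
def bar_chart_spec(counts, variable, study):
    """Categories ordered by frequency plus the labels recommended in Box 1.3."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    n = sum(counts.values())
    return {
        "categories": [label for label, _ in ordered],
        "heights": [count for _, count in ordered],
        "title": f"{variable} for {n} patients recruited to the {study}",  # includes n
        "xlabel": variable,
        "ylabel": "Number of patients",
    }


def draw_bar_chart(spec, path):
    """Render the spec with matplotlib (assumed to be installed) and save it to path."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(spec["categories"], spec["heights"], color="grey")
    ax.set_title(spec["title"])
    ax.set_xlabel(spec["xlabel"])
    ax.set_ylabel(spec["ylabel"])
    ax.grid(False)  # no gridlines
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # remove ink that carries no data
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    counts = {"Divorced/separated": 11, "Married": 104, "Single": 25, "Widowed": 86}
    spec = bar_chart_spec(counts, "Marital status", "leg ulcer study")
    print(spec["title"])
    print(spec["categories"])
```

Calling draw_bar_chart(spec, "marital_status.png") would then write the chart to a file (the file name is hypothetical). Keeping the ordering and labelling separate from the drawing means those choices can be checked without a plotting library.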

1.7 Table or graph?