In the era of Evidence Based Medicine, health professionals are required to fully understand design, analysis and interpretation of the results of research. Furthermore, they should be able to assess the needs of their communities and respond accordingly. To achieve these goals, clinicians need to be familiar with the basic concepts of epidemiology and biostatistics. But epidemiology is more than “the study of.” Its application and practice are essential to address public health issues. That is why this book provides not only the theory, but also the opportunity of applying it in practice. In fact, each chapter presents one or more specific examples on how to perform an epidemiological or statistical data analysis and includes download access to the software and databases, giving the reader the possibility of replicating the analyses described. The final purpose is, therefore, to introduce epidemiologic and biostatistical methods as applied to clinical research, and to develop proficiency with computer software for performing the analysis of clinical datasets.
Publication year: 2010
Applied Epidemiology and Biostatistics
Giuseppe La Torre
© SEEd srl
Piazza Carlo Emanuele II, 19 – 10123 Torino – Italy
Tel. +39.011.566.02.58 – Fax +39.011.518.68.92
www.edizioniseed.it – [email protected]
First edition
September 2010
ISBN 978-88-8968-856-4
Although the information about medication given in this book has been carefully checked, the author and publisher accept no liability for the accuracy of this information. In every individual case the user must check such information by consulting the relevant literature.
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the Italian Copyright Law in its current version, and permission for use must always be obtained from SEEd Medical Publishers srl. Violations are liable to prosecution under the Italian Copyright Law.
When this book was conceived, in a discussion among members of the Public Health Epidemiology section of the European Public Health Association (EUPHA), the main idea was to describe not only theory but, above all, how to use the available software for epidemiologic and statistical data analysis.
In the era of Evidence Based Medicine, health professionals are required to fully understand design, analysis and interpretation of the results of research. Furthermore, they should be able to assess the needs of their communities and respond accordingly. To achieve these goals, one needs to be familiar with the basic concepts of epidemiology and biostatistics.
But epidemiology is more than “the study of.” Its application and practice are essential to address public health issues.
So, the purpose of the book is to give the reader both the theory concerning specific aspects of technical disciplines such as epidemiology and biostatistics and, at the same time, the opportunity to replicate, under guidance, the analyses carried out by each chapter's author and already published in a research article. The idea is to use available software for epidemiologic and statistical data analysis that each reader can download freely from the Internet.
As for how this purpose is to be achieved, it is important to underline that each chapter presents one or more specific examples of how to perform an epidemiological or statistical data analysis. Each chapter gives the reader the possibility of conducting an epidemiological or statistical analysis using a step-by-step approach. In other words, the reader will be able to carry out the analysis by following the detailed description of the commands to use and the figures showing the software commands and/or output.
Why do we believe the book is needed?
The answer is mainly technical. Many books on epidemiology and biostatistics are already available, but none of them gives practical examples using different freely available software. This book will use software such as Epi Info, Episheet, Simcalc, StatCalc and RevMan, which can be downloaded from the web and together cover most topics in the two disciplines. In selected cases, we will give examples using commercial statistical software, such as Stata and SPSS.
The reader will be interested in this book because he/she will find solutions to epidemiological and biostatistical problems, with practical examples and a detailed, efficient guide to using the software.
Have you ever been interested in performing an epidemiological data analysis, but thought you were not able to?
Have you ever had trouble with a statistical analysis because you considered statistics a matter for statisticians only?
Applied Epidemiology and Biostatistics is the answer for you.
Questions such as the following will find an answer:
How to perform a multiple logistic regression using your own data?
How to calculate the 95% confidence interval of an odds ratio?
How to perform a meta-analysis of papers of interest to you?
How to make graphs for your report?
How to draw a ROC curve or control for possible confounding?
How to calculate the sample size needed for a clinical trial?
This is a manual designed for using software that is freely downloadable from the web and covers most topics in the two disciplines, epidemiology and biostatistics. In selected cases, examples will use commercial statistical packages.
Who is the best reader of this book?
Considering that epidemiology can be seen as the study of the factors affecting the health and illness of a given population, and that biostatistics is one of the main pillars of research, this manual is intended primarily for the following readers:
Public Health practitioners (professionals, researchers).
Clinicians (researchers).
Health Managers (professionals, researchers).
Teachers of Epidemiology.
Teachers of Biostatistics.
Finally, I would like to thank all the contributors to this manual. Without their support and suggestions, it would have been impossible to achieve this goal.
Now, do you want to start?
Let’s do Epidemiology and Biostatistics together!
Giuseppe La Torre
Instructions for Downloading
To download the software and databases described in this book, you need to:
access the website http://download.edizioniseed.it;
access the “Download area”; and
type the code SOB0H46Y.
Carlo Signorelli1, Edoardo Colzani2
1 University of Parma, Italy
2 University of Milano-Bicocca, Italy
Objectives of the chapter
To describe the measures of occurrence.
To give some practical examples showing how to obtain them from a dataset using the statistical package SPSS.
One of the objectives of epidemiology is to describe the frequency and distribution of diseases and other health-related events and to assess the association between possible risk factors and diseases. A first distinction must be made between measures of occurrence (which describe a health phenomenon) and measures of association (which describe the strength of a possible association between an exposure and a health outcome). This chapter focuses on the measures of occurrence; measures of association are discussed in chapter 2.
The measures of occurrence can be divided into the following groups [1]:
Description of number of events.
Ratios.
Proportions.
Rates.
The description of the number of events usually only satisfies an administrative need for quantifying a phenomenon, but doesn’t usually give any information about the denominator to which it is referring. Knowing, for instance, how many people are sick with a certain disease could help institutions organize their healthcare facilities accordingly, but it would not give any additional information on how that disease is spread in that particular group of people in the absence of a denominator or a time reference.
A specific kind of proportion largely used in epidemiology is the prevalence. The prevalence can be defined as the proportion of cases of disease (or of another health-related phenomenon) at a certain time or period in the overall population.
Some refer to point prevalence when it is measured in a specific moment, and to period prevalence when it is measured over a defined period of time (a month, a year, etc.) [3]. The point prevalence can have big variations, especially if it measures diseases with a short duration, like infectious diseases that can wax and wane over relatively short periods of time. For instance, we could have a certain prevalence of influenza on one day and a very different one on the following day.
The period prevalence partially solves this problem, since it covers a broader period of time and gives a good average picture of the phenomenon. The denominator is usually the population at the mid-point of the time period. So, if we consider the prevalence of a certain disease over a year, the denominator will be the overall population on June 30 (for a month, it would be the population on the 15th day of that month, and so on), and the numerator will be the number of people with the disease during that year (both new and existing cases). The period prevalence always considers the entire population, and this differentiates it from the incidence rate.
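As a minimal sketch of the period prevalence calculation just described, the Python function below adds existing and new cases and divides by the mid-period population. The function name and all the figures are hypothetical, chosen only for illustration.

```python
def period_prevalence(existing_cases: int, new_cases: int, midperiod_population: int) -> float:
    """Period prevalence = (cases existing at the start + new cases during
    the period) / population at the mid-point of the period."""
    if midperiod_population <= 0:
        raise ValueError("midperiod_population must be positive")
    return (existing_cases + new_cases) / midperiod_population


# Hypothetical one-year figures (not taken from the book):
# 120 cases already present on January 1, 30 new cases during the year,
# and a population of 50,000 on June 30.
prev = period_prevalence(existing_cases=120, new_cases=30, midperiod_population=50_000)
print(f"Period prevalence: {prev * 100_000:.0f} per 100,000")  # Period prevalence: 300 per 100,000
```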
The prevalence has to be considered a static measure since it does not take time into account and also considers the existing cases of disease, not just the new cases. It is like a cross-sectional picture of a certain health-related phenomenon and it is mainly used to describe the fraction of a certain population affected by a disease or a risk factor. Its use is more frequently related to healthcare planning and the cost analysis of certain interventions [4].
The incidence is an epidemiologic measure that indicates the risk of developing a disease over a period of time; in other words, how many new cases of disease have occurred during a specific period of time in an at-risk population. It is a more dynamic measure compared to the prevalence, and can be calculated as a proportion (cumulative incidence, incidence proportion, or incidence risk) or, more frequently, as a rate (incidence rate, incidence density, or person-time incidence) [5].
The incidence proportion (or cumulative incidence or incidence risk) considers the number of new cases of disease in the numerator. The denominator is the population at risk for that disease at the beginning of the period of observation [6], so any individual counted in the denominator has, in theory, some chance of being counted in the numerator as well. Therefore, the denominator does not include people who already have the disease or people who surely cannot develop the disease—for instance, individuals fully immunised against a certain communicable disease.
The time reference of the incidence always has to be specified. If we had four new cases of measles in the last week in a group of 10 subjects, one of whom had been immunised and another of whom had already had the disease, we would say that the incidence proportion of measles is four new cases divided by 8 (10 – 2) subjects at risk, per week, which is equal to 0.5 or 50%. This 50% can also be read as the risk that a member of that group will develop measles in a week: that is why it is also called incidence risk.
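The worked example above can be reproduced with a short Python sketch. The function name is hypothetical; it simply removes the people not at risk from the denominator before dividing.

```python
def incidence_proportion(new_cases: int, group_size: int, not_at_risk: int) -> float:
    """New cases divided by the people at risk at the start of the period.

    People who are already ill or immune are removed from the denominator,
    because they cannot become new cases."""
    at_risk = group_size - not_at_risk
    if at_risk <= 0:
        raise ValueError("nobody is at risk")
    if not 0 <= new_cases <= at_risk:
        raise ValueError("new_cases must be between 0 and the number at risk")
    return new_cases / at_risk


# Worked example from the text: 4 new measles cases in one week among
# 10 subjects, 2 of whom (one immunised, one already ill) are not at risk.
risk = incidence_proportion(new_cases=4, group_size=10, not_at_risk=2)
print(f"Incidence proportion: {risk:.0%} per week")  # Incidence proportion: 50% per week
```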
The incidence rate (or incidence density or person-time incidence) has the same numerator (number of new cases) as the incidence proportion, but, since it is a rate, its denominator is the total person-time at risk (the sum of the time each person actually spent at risk) rather than simply the number of persons at risk, so it can be more accurate. It is particularly helpful when the event can happen to the same person more than once during the study period and the investigators want to take this into account [7].
The advantage of considering the incidence rate instead of the incidence proportion is the higher accuracy of the former. In fact, if the group above were a dynamic cohort with people coming in and out over the period of time considered (one week), we could have easily taken into account each person’s respective contribution to the denominator by only considering the actual at-risk periods each person spent in the cohort without giving the same weight to people staying in the cohort at risk for just one day and to people who were at risk for the entire period.
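To make the person-time idea concrete, here is a minimal Python sketch that computes an incidence rate as new cases divided by the total person-time at risk in a small dynamic cohort. The cohort and function name are hypothetical, and for a person who becomes a case the time at risk is assumed to end at disease onset.

```python
# Hypothetical dynamic cohort followed for one week (not from the book).
# Each tuple: (days spent in the cohort while at risk, became a case?).
# For cases, the days at risk are counted up to disease onset.
cohort = [
    (7, False), (7, True), (3, True), (1, False), (7, False),
    (5, True), (2, False), (7, True), (4, False), (6, False),
]


def incidence_rate(records):
    """New cases divided by total person-time at risk (here, person-days)."""
    new_cases = sum(1 for _, is_case in records if is_case)
    person_days = sum(days for days, _ in records)
    if person_days <= 0:
        raise ValueError("no person-time at risk")
    return new_cases / person_days


rate = incidence_rate(cohort)
print(f"Incidence rate: {rate * 100:.1f} per 100 person-days")  # Incidence rate: 8.2 per 100 person-days
```

Here the person who stayed one day contributes one person-day to the denominator, while those followed for the whole week contribute seven each.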
Figure 1.1: Relationship between incidence and prevalence.
The incidence is a more dynamic measure because it takes time into account. The incidence proportion gives more of an estimate of the individual risk of getting a certain disease by not taking into account all the at-risk periods of time, and only using at-risk individuals in the denominator. The incidence rate instead can be seen as an estimate of the speed of a certain health-related phenomenon by taking time into account in the computational formula.
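In a steady-state situation, in which the inflow of subjects into the population equals the outflow and the incidence is constant over time, the following relationship applies (approximately, when the prevalence is low):

$$P \approx I \times D$$

where P is the prevalence, I is the incidence rate and D is the average duration of the disease.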
Therefore, prevalence is affected both by changes in incidence and disease duration. In fact, if we notice an increase in prevalence of a certain disease, we can expect it to be due either to an increase in incidence (more new cases) or to an increase in disease duration (increased survival), or both (Figure 1.1).
For instance, if we knew that the incidence of pancreatic cancer was 10/100,000 per year, and that its prevalence was 25/100,000 in a certain year, then we could estimate the average duration of the disease by dividing the prevalence by the incidence [1]:

$$D \approx \frac{P}{I} = \frac{25/100{,}000}{10/100{,}000 \text{ per year}} = 2.5 \text{ years}$$
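The same arithmetic can be sketched in Python as a quick check. The function name is hypothetical, and the calculation relies on the steady-state approximation P ≈ I × D, which is reasonable only when the prevalence is low and incidence and duration are stable.

```python
def average_duration(prevalence: float, incidence_per_year: float) -> float:
    """Estimate average disease duration (years) as D = P / I.

    Assumes a steady state and a low prevalence, so that P ≈ I × D."""
    if incidence_per_year <= 0:
        raise ValueError("incidence must be positive")
    return prevalence / incidence_per_year


# Pancreatic cancer example from the text.
P = 25 / 100_000   # prevalence
I = 10 / 100_000   # incidence per year
D = average_duration(P, I)
print(f"Average duration: about {D:.1f} years")  # Average duration: about 2.5 years
```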
When investigators are dealing with large populations, the issue of the incidence denominator, which should include only the people (or person-times) actually at risk, is usually a minor one, since only a small number of people can be considered not at risk.
Sometimes, however, this can actually be an issue. For instance, when investigators are dealing with the incidence of uterine cancer, women who have had hysterectomies have to be excluded from the denominator (together with males, of course) to prevent an underestimate of the actual incidence or mortality rate [8].
The most important issue is to define who has the disease. In other words, we must determine who actually is “a case.” To do this, an accurate written definition of a “case” must be followed. There are diseases—like certain psychiatric conditions—that can follow different and more subjective diagnostic criteria. By using different diagnostic criteria, we can come up with different numerators and therefore a different incidence or prevalence. Biases in data collection in general can also potentially affect the measure of frequency obtained.
Figure 1.2: The simple dataset.
Figure 1.3: The Analyze menu.
Figure 1.4: Getting the results: the output window.
After opening the complete dataset (a Microsoft Excel file) in SPSS (choose Open existing data source from the list shown and then Browse), use the Analyze menu to request the proportions for each variable to be included in the analysis, as shown in Figure 1.3 (click Analyze, then Descriptive Statistics, and then Frequencies).
The next step is to select Variables V3, V4 and V5 and move them into the variables box and then click OK.
Once OK has been clicked, the absolute and relative frequencies appear in the output window, with one table for each of the variables chosen (Figure 1.4). Since prevalence is a simple proportion (in this specific case, a point prevalence), the proportions obtained are the prevalence of people who have been tested for HIV, or of people at high risk of sexual transmission of HIV.
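Outside SPSS, the same absolute and relative frequencies can be obtained with a few lines of Python. This is a minimal sketch: the variable names and the 0/1 answers below are hypothetical and do not come from the survey dataset used in the chapter. For a binary variable, the percentage of 1s is the point prevalence.

```python
from collections import Counter

# Hypothetical survey answers, coded 1 = yes and 0 = no (not the book's data).
survey = {
    "ever_tested_hiv": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    "high_risk_sex": [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
}


def frequency_table(values):
    """Return {category: (count, percent)}, similar to an SPSS frequency table."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in sorted(counts.items())}


for name, values in survey.items():
    print(name)
    for category, (n, pct) in frequency_table(values).items():
        print(f"  {category}: n = {n} ({pct:.1f}%)")
```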
If the objective is instead to obtain incidence, a different dataset is needed, because incidence cannot be computed from cross-sectional data: these capture a single point in time and so cannot identify new cases. For instance, a dataset of influenza cases in the northern Italian city of Parma can be used. The data were collected during the influenza season, by week of surveillance, through an active surveillance system involving three general practitioners (GPs) with a total of 2,700 patients. Specimens were collected from each influenza-like illness (ILI) diagnosed by the GPs to search for influenza viruses and identify the different influenza strains [10].
Since the population is made up of patients (individuals) and not person-times, only the cumulative incidence can be calculated. The proportion of ILI cases (or virus isolations) by week of surveillance can be computed, again by opening the dataset in SPSS, as shown in Figure 1.5.
Figure 1.5: The dataset in SPSS.
To obtain the cumulative incidence of influenza-like illnesses (ILIs) by week of surveillance, the values in the ILI column must be divided by the values in the Denominator column. To do this in SPSS, open the Transform menu and choose the Compute option. A window will open (see Figure 1.6). The variables listed on the left are selected and moved into the Numeric Expression field in the upper right-hand side, and then divided using the arithmetic symbols or functions shown in the fields below. The name of the target variable (the new column in which the incidence results will be displayed) must also be typed into the Target Variable field in the upper left-hand corner of the window.
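The Transform > Compute step amounts to an element-wise division, sketched below in Python for readers who want to reproduce it outside SPSS. The weekly ILI counts are invented for illustration, and the denominator is assumed to stay at the 2,700 patients mentioned above for every week; the column names mirror the ILI and Denominator fields, and ILI_incidence is the name of the new variable.

```python
# Hypothetical weekly surveillance data (not the Parma dataset).
# Denominator assumed constant at 2,700 patients.
rows = [
    {"week": 1, "ILI": 5, "Denominator": 2700},
    {"week": 2, "ILI": 12, "Denominator": 2700},
    {"week": 3, "ILI": 27, "Denominator": 2700},
]

for row in rows:
    # Equivalent of the SPSS expression: ILI_incidence = ILI / Denominator
    row["ILI_incidence"] = row["ILI"] / row["Denominator"]
    print(f"Week {row['week']}: {row['ILI_incidence']:.5f} "
          f"({row['ILI_incidence'] * 1000:.1f} per 1,000 patients)")
```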
Figure 1.6: Computing the new variable.
The properties of the new column for the incidence, originally called Var, can be modified by clicking the Variable View tab of the data window. This can be done either before or after computing the incidence with the Transform command described above. Variable names can be changed by double-clicking on them. Other properties of the variables can also be changed, in this case the number of visible decimals, since the incidences are very small (see Figure 1.7).
Figure 1.7: Variables view.
Going back to the Data View sheet, the new column should now be visible, in this case renamed ILI_incidence (see Figure 1.8), with the incidences computed for each week of surveillance. The same calculation can also be carried out to obtain the cumulative incidence of virus isolation in the same population by week.
Figure 1.8: The final data view with the ILI incidences.
Statistical packages of this type also make it quite straightforward to obtain grouped or stratified measures of incidence and prevalence: the Transform and Analyze menus offer many different options for specific requests. As a bottom line, though, the importance of the quality of the collected data and of their source in obtaining accurate measures of occurrence cannot be stressed enough. That is something no statistical package can correct or account for.
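As an illustration of a stratified measure, the sketch below computes prevalence separately for each stratum of a small cross-sectional dataset. The records and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-sectional records (not from the book):
# (stratum, 1 if the condition is present, 0 otherwise).
records = [
    ("male", 1), ("male", 0), ("male", 0), ("male", 0),
    ("female", 1), ("female", 1), ("female", 0), ("female", 0), ("female", 0),
]


def prevalence_by_stratum(rows):
    """Return {stratum: cases / people examined} for each stratum."""
    cases = defaultdict(int)
    totals = defaultdict(int)
    for stratum, present in rows:
        totals[stratum] += 1
        cases[stratum] += present
    return {stratum: cases[stratum] / totals[stratum] for stratum in totals}


for stratum, p in prevalence_by_stratum(records).items():
    print(f"{stratum}: {p:.1%}")
```

With these invented records the output is 25.0% for males and 40.0% for females.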
1. Signorelli C. Elementi di metodologia epidemiologica. 6th ed. Roma: Società Editrice Universo, 2005.
2. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-Raven Publishers, 1998.
3. Abramson JH. Making Sense of Data. 2nd ed. Oxford: Oxford University Press, 1994.
4. Beaglehole R, Bonita R, Kjellstrom T. Basic Epidemiology. 2nd ed. Geneva: WHO, 2007.
5. Last JM. A Dictionary of Epidemiology. 3rd ed. Oxford: Oxford University Press, 1995.
6. Lopalco PL, Tozzi AE. Epidemiologia facile. 1st ed. Rome: Il Pensiero Unico Editore, 2003.
7. Jekel JF, Elmore JG, Katz DL. Epidemiology, Biostatistics and Preventive Medicine. 2nd ed. Philadelphia: Saunders Text and Review Series, 2004.
8. Gordis L. Epidemiology. 3rd ed. Philadelphia: Elsevier Saunders, 2004.
9. Signorelli C, Pasquarella C, Limina RM, Colzani E, Fanti M, Cielo A, et al. Third Italian national survey on knowledge, attitudes, and sexual behaviour in relation to HIV/AIDS risk and the role of health education campaigns. Eur J Public Health 2006; 16: 498-504.
10. Tanzi ML. Data from the Regional Center for the Surveillance of viral diseases. Parma, 2002.
Giuseppe La Torre1
1 Clinical Medicine and Public Health Unit, Sapienza University, Rome, Italy
Objectives of the chapter
To understand the concept of measures of association in different study designs.
To be able to calculate relative risk, risk difference, and odds ratio with statistical packages.
In epidemiology, the relative risk (RR, sometimes called risk ratio) is the ratio of the probability (risk) of an event occurring in the exposed group to the probability of the same event in the non-exposed group.
We can consider the following contingency table, corresponding to a cohort study, in which the exposure status (i.e., drinking alcohol) is shown in the rows and the disease status of the participants (present vs. absent) in the columns (Table 2.1).

Exposure | Disease status: Present | Disease status: Absent
Drinking | a | b
No drinking | c | d

Table 2.1: A 2 by 2 contingency table: cohort study.
From Table 2.1, the risk of developing the disease among drinkers is a/(a + b), and the risk among nondrinkers is c/(c + d). The RR is the ratio of these two risks, i.e., [a/(a + b)] / [c/(c + d)]. Here the baseline risk comes from the unexposed group, which serves as the reference.
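A minimal Python sketch of the formula, using the cell labels a, b, c and d of Table 2.1, follows. The counts are hypothetical and serve only to show the arithmetic; the function name is illustrative.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a/(a + b)] / [c/(c + d)] for a cohort 2 by 2 table.

    a, b: exposed with / without the disease
    c, d: unexposed with / without the disease
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each exposure group needs at least one subject")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    if risk_unexposed == 0:
        raise ZeroDivisionError("no cases among the unexposed: RR is undefined")
    return risk_exposed / risk_unexposed


# Hypothetical counts: 30 of 100 drinkers and 10 of 100 nondrinkers fall ill.
rr = relative_risk(a=30, b=70, c=10, d=90)
print(f"RR = {rr:.2f}")  # RR = 3.00
```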
In a randomized clinical trial (RCT), the risk a/(a + b) is called the experimental event rate (EER) and the risk c/(c + d) the control event rate (CER). If the exposure variable is not dichotomous but has, for example, three levels, the RRs can still be calculated by taking one level of exposure as the reference. Looking at Table 2.2, one can calculate the risk of getting the disease for each exposure category.
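When there are more than two exposure levels, the same logic applies level by level: each category's risk is divided by the risk in the chosen reference category. The sketch below uses three invented exposure levels; it does not reproduce Table 2.2, and the names and counts are hypothetical.

```python
# Hypothetical cohort with three exposure levels (not Table 2.2):
# level -> (cases, non-cases)
table = {
    "none": (10, 190),      # risk 10/200 = 0.05
    "moderate": (15, 135),  # risk 15/150 = 0.10
    "heavy": (20, 80),      # risk 20/100 = 0.20
}


def risks(tbl):
    """Risk of disease in each exposure level: cases / (cases + non-cases)."""
    return {level: cases / (cases + noncases) for level, (cases, noncases) in tbl.items()}


def relative_risks(tbl, reference):
    """RR of every level versus the reference level (whose RR is 1)."""
    r = risks(tbl)
    ref_risk = r[reference]
    if ref_risk == 0:
        raise ZeroDivisionError("reference risk is zero: RRs are undefined")
    return {level: risk / ref_risk for level, risk in r.items()}


for level, rr in relative_risks(table, reference="none").items():
    print(f"{level}: RR = {rr:.2f}")
```

With these invented counts the RRs are 1.00, 2.00 and 4.00; choosing a different reference level changes every RR accordingly.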