Statistical Meta-Analysis with Applications - Joachim Hartung - E-Book

Joachim Hartung

Description

An accessible introduction to performing meta-analysis across various areas of research

The practice of meta-analysis allows researchers to obtain findings from various studies and compile them to verify and form one overall conclusion. Statistical Meta-Analysis with Applications presents the necessary statistical methodologies that allow readers to tackle the four main stages of meta-analysis: problem formulation, data collection, data evaluation, and data analysis and interpretation. Combining the authors' expertise on the topic with a wealth of up-to-date information, this book successfully introduces the essential statistical practices for making thorough and accurate discoveries across a wide array of diverse fields, such as business, public health, biostatistics, and environmental studies.

Two main types of statistical analysis serve as the foundation of the methods and techniques: combining tests of effect size and combining estimates of effect size. Additional topics covered include:

* Meta-analysis regression procedures
* Multiple-endpoint and multiple-treatment studies
* The Bayesian approach to meta-analysis
* Publication bias
* Vote counting procedures
* Methods for combining individual tests and combining individual estimates
* Using meta-analysis to analyze binary and ordinal categorical data

Numerous worked-out examples in each chapter provide the reader with a step-by-step understanding of the presented methods. All exercises can be computed using the R and SAS software packages, which are both available via the book's related Web site. Extensive references are also included, outlining additional sources for further study.

Requiring only a working knowledge of statistics, Statistical Meta-Analysis with Applications is a valuable supplement for courses in biostatistics, business, public health, and social research at the upper-undergraduate and graduate levels. It is also an excellent reference for applied statisticians working in industry, academia, and government.


Page count: 409

Publication year: 2011

CONTENTS

Preface

1 Introduction

2 Various Measures of Effect Size

2.1 Effect Size Based on Means

2.2 Effect Size Based on Proportions

2.3 Effect Size Based on φ Coefficient and Odds Ratio

2.4 Effect Size Based on Correlation

3 Combining Independent Tests

3.1 Introduction

3.2 Description of Combined Tests

4 Methods of Combining Effect Sizes

5 Inference about a Common Mean of Several Univariate Normal Populations

5.1 Results on Common Mean Estimation

5.2 Asymptotic Comparison of Some Estimates of Common Mean for k = 2 Populations

5.3 Confidence Intervals for the Common Mean

5.4 Applications

6 Tests of Homogeneity in Meta-Analysis

6.1 Model and Test Statistics

6.2 An Exact Test of Homogeneity

6.3 Applications

7 One-Way Random Effects Model

7.1 Introduction

7.2 Homogeneous Error Variances

7.3 Heterogeneous Error Variances

8 Combining Controlled Trials with Normal Outcomes

8.1 Difference of Means

8.2 Standardized Difference of Means

8.3 Ratio of Means

9 Combining Controlled Trials with Discrete Outcomes

9.1 Binary Data

9.2 Ordinal Data

10 Meta-Regression

10.1 Model with One Covariate

10.2 Model with More Than One Covariate

10.3 Further Extensions and Applications

11 Multivariate Meta-Analysis

11.1 Combining Multiple Dependent Variables from a Single Study

11.2 Modeling Multivariate Effect Sizes

12 Bayesian Meta-Analysis

12.1 A General Bayesian Model for Meta-Analysis under Normality

12.2 Further Examples of Bayesian Analyses

12.3 A Unified Bayesian Approach to Meta-Analysis

12.4 Further Results on Bayesian Meta-Analysis

13 Publication Bias

14 Recovery of Interblock Information

14.1 Notation and Test Statistics

14.2 BIBD with Fixed Treatment Effects

15 Combination of Polls

15.1 Formulation of the Problem

15.2 Meta-Analysis of Polls

16 Vote Counting Procedures

17 Computational Aspects

17.1 Extracting Summary Statistics

17.2 Combining Tests

17.3 Generalized P-values

17.4 Combining Effect Sizes

18 Data Sets

18.1 Validity Studies

18.2 Effects of Teacher Expectance on Pupil IQ

18.3 Dentifrice Data

18.4 Effectiveness of Amlodipine on Work Capacity

18.5 Effectiveness of Cisapride on the Treatment of Nonulcer Dyspepsia

18.6 Second-hand Smoking

18.7 Effectiveness of Misoprostol in Preventing Gastrointestinal Damage

18.8 Prevention of Tuberculosis

References

Index

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: Vic Barnett, J. Stuart Hunter, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.

Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Hartung, Joachim, Prof. Dr.
Statistical meta-analysis with applications / Joachim Hartung, Guido Knapp, Bimal K. Sinha.
p. cm.

Includes bibliographical references and index.
ISBN 978-0-470-29089-7 (cloth)

1. Statistical hypothesis testing. 2. Meta-analysis. I. Knapp, Guido. II. Sinha, Bimal K., 1946– III. Title.

QA277.H373 2008

519.5’6--dc22 2008009435

To my wife Bärbel and my children Carola, Lisa, Jan, and Jörn

To my parents Magdalena and Wilhelm

In memory of Professor Shailes Bhusan Chaudhuri for his excellent academic training in statistics and parental love during my Ashutosh College years

PREFACE

Statistical Meta-Analysis with Applications combines our experiences on the topic and brings out a wealth of new information relevant for meta-analysis. Meta-analysis, a term coined by Glass (1976), and also known under different names such as research synthesis, research integration, and pooling of evidence, deals with the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings.

It is a common phenomenon that many studies are carried out over time and space on some important global issues with a common target or goal. As an example, we can cite the 19 studies carried out in the context of effects of second-hand smoking on women! Sometimes the studies may correspond to different experiment settings with one objective in mind. The main reason that many studies on a research topic are carried out rather than a single study is to strengthen the overall conclusion about a certain hypothesis or to negate it with a stronger conviction. When the results of these component studies, either in full or in summary form, are available, it is desirable that we combine the results of these studies in a meaningful way so as to arrive at a valid conclusion about the target parameter. The main object of statistical meta-analysis is precisely to provide methods to meaningfully combine the results from component studies.

There are many aspects of statistical meta-analysis which must be addressed in a book. Most of the concern arises from the nature of the underlying studies, the nature of information available from these studies, and also the nature of assumptions about the distributions of random variables arising in the studies. We have provided a complete treatment of all these aspects in this book.

Several new features of this book are worth mentioning. We have indicated a wide variety of applications of statistical meta-analysis ranging from business to education to environment to health sciences in both univariate and multivariate cases. Our treatment of the statistical meta-analysis about (1) the common mean of several univariate normal populations, (2) tests of homogeneity, (3) one-way random effects model, (4) categorical data, (5) recovery of interblock information, and (6) combination of polls is entirely new, based on many recent results by us and others on these topics. Other topics such as meta-regression, multivariate meta-analysis, and Bayesian meta-analysis also appear in completely new forms in our book. Another special feature of the book is the incorporation of a detailed discussion about computational aspects and related software to carry out statistical meta-analysis in practice. Readers will find many extra useful features in this book compared to the existing books on this subject. Our book complements the statistical methods and results described in an excellent Academic Press text, Statistical Methods for Meta-Analysis by Hedges and Olkin (1985). We put on record our indebtedness to this book and also to the excellent edited volume The Handbook of Research Synthesis by Cooper and Hedges (1994) for many ideas on statistical meta-analysis. We have freely used some of the data sets and basic ideas from these two sources, and indirectly we owe a lot to Professors Harris Cooper, Larry Hedges, and Ingram Olkin!

Although some topics and chapters covered in this book require the knowledge of advanced statistical theory and methods, most of the meta-analysis methods described in the book can be understood and applied with a solid master’s level background in statistics. Parts of the book can also be used as a graduate text on this topic. We believe that practitioners of statistical meta-analysis will benefit a lot from this book owing to a host of worked-out examples from various contexts. The example data sets and the program code may be downloaded from G.K.’s website at http://www.statistik.uni-dortmund.de/~knapp. Given that the possible application areas of meta-analysis are fairly broad, we have limited ourselves to a selected few applications depending on our own interest and expertise.

Financial support from the Dortmund University of Technology, Dortmund, Germany, and the University of Maryland, Baltimore County, Maryland, is gratefully acknowledged. We are also grateful to Professors Leon Gleser, Satish Iyengar, and Neil Timm from the University of Pittsburgh for providing us with reprints of their papers on many aspects of multivariate meta-analysis. We are thankful to Professor Anirban Dasgupta of Purdue University for giving us his kind permission to include his work on combination of polls in this book. This certainly adds a new dimension! This book grew out of many lectures delivered on some of the topics of statistical meta-analysis at the University of Hong Kong (B.K.S.), Tunghai University, Taichung, Taiwan (B.K.S.), University of South Australia (B.K.S.), University of Tampere, Finland (G.K. and B.K.S.), University of Turku, Finland (G.K. and B.K.S.), United States Environmental Protection Agency (G.K. and B.K.S.) and the U.S. National Center for Health Statistics (G.K. and B.K.S.), and, of course, at our host institutions (B.K.S. at the University of Maryland, Baltimore County, J.H. and G.K. at Dortmund University of Technology).

We mention with great pleasure the invitations received from all these places and the many comments we received from the audience, including our own students, which helped us to improve the contents and the presentations. We very much appreciate the excellent academic atmosphere at Dortmund University of Technology and University of Maryland, Baltimore County, where most of the book was written.

Last but not least, we express our sincere thanks to our understanding family members who occasionally had to put up with our changing moods due to the tremendous pressure in writing this book with as much information and accuracy as possible.

JOACHIM HARTUNG
GUIDO KNAPP
BIMAL K. SINHA

Dortmund, Germany
Baltimore, Maryland
June 2008

CHAPTER 1

INTRODUCTION

Meta-analysis, a term coined by Glass (1976), is intended to provide the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings.

Meta-analysis, or research synthesis, or research integration is precisely a scientific method to accomplish this goal by applying sound statistical procedures, and indeed it has a long and old history. The very invention of least squares by Legendre (1805) and Gauss (1809) is an attempt to solve just a unique problem of meta-analysis: use of astronomical observations collected at several observatories to estimate the orbit of comets and to determine meridian arcs in geodesy (Stigler, 1986). In order to determine the relationship between mortality and inoculation with a vaccine for enteric fever, Pearson (1904) used data from five small independent samples and computed a pooled estimate of correlation between mortality and inoculation in order to evaluate the efficacy of the vaccine. As an early application of meta-analysis in the physical sciences, Birge (1932) combined estimates across experiments at different laboratories to establish reference values for some fundamental constants in physics. Early works of Cochran (1937), Yates and Cochran (1938), Tippett (1931), and Fisher (1932) dealt with combining information across experiments in the agricultural sciences in order to derive estimates of treatment effects and test their significance. Likewise, there are plenty of applications of meta-analysis in the fields of education, medicine, and social sciences, some of which are briefly described below.

In the field of education, meta-analysis is useful in combining studies about coaching effectiveness to improve Scholastic Aptitude Test (SAT) scores in verbal and math (Rubin, 1981; DerSimonian and Laird, 1983), in studying the effect of open education on (i) attitude of students toward school and (ii) student independence and self-reliance, and in combining studies about the relationship between teacher indirectness and student achievement (Hedges and Olkin, 1985). In social science, there is a need to combine several studies of gender differences in separate categories of quantitative ability, verbal ability, and visual-spatial ability (Hedges and Olkin, 1985). For some novel applications of meta-analysis in the field of medicine, we refer to Pauler and Wakefield (2000) for three applications involving dentifrice data, antihypertension data, and preeclampsia data, to Berry (2000) for questions about benefits and risks of mammography of women based on six studies, to Brophy and Joseph (2000) for meta-analysis involving three studies to compare streptokinase and tissue-plasminogen activator to reduce mortality following an acute myocardial infarction, and lastly to Dominici and Parmigiani (2000) for an application of meta-analysis involving studies in which outcomes are reported on continuous variables for some medical outcomes in some studies and on binary variables on similar medical outcomes in some other studies. Of course, there are numerous other diverse applications of meta-analysis in many other fields. We mention several applications below.

A: Business applications. In the context of business management and administration, one often encounters several studies with a common effect, and the problem then is of drawing suitable inference about the common effect based on the information from all the studies. Here are some examples. In the context of studying price elasticity, Tellis (1988) reports results from 42 studies! Sethuraman (1995) performed meta-analysis of national brand and store brand cross-promotional price elasticities. Lodish et al. (1995) reported results of 389 real world split cable TV advertising experiments: How TV advertising works? Churchill et al. (1985) reported meta-analysis of the determinants of salesperson performance. Farley and Lehmann (1986, 2001) and Farley, Lehmann, and Sawyer (1995) emphasized the important role of meta-analysis in international research in marketing. A current major thrust in marketing has been an attempt to create global products and brands while retaining local requirements: think global, act local. Deciding which elements of which products can be produced globally and which locally requires meta-analysis of each of the elements.

B: Environmental applications. In the context of environmental problems, there are several situations where the meta-analysis methods can be successfully applied. Here is a partial list of such applications.

Evaluation of superfund cleanup technologies (Sinha, O’Brien, and Smith, 1991; Sinha and Sinha, 1995). Cleaning up of superfund waste sites (nuclear/chemical/biological) at the National Priorities List (NPL), based on an index comprising four measures, air, groundwater, soil, and surface water, often requires innovative/extremely expensive technology. A critical study of performance of the suggested technologies after a certain amount of time is highly desirable. If found useful, the technologies can be encouraged to continue at the same site. If otherwise, this should be determined as soon as possible so that suitable corrective measures can be taken. Towards this end, a common procedure is to study a preremediation baseline sample and an interim sample taken after a certain period of operation of the technology and test if a desirable percentage of the total contaminant has been removed. Comparison of a few such technologies can be based on several studies, and a data synthesis or pooling of evidence is very natural here in order to determine the final ranking of the technologies.

Assessment of gasoline quality (Yu, Sun, and Sinha, 2002). The U.S. Environmental Protection Agency (EPA) evaluates/regulates gasoline quality based on what is known as Reid vapor pressure (RVP). Samples of gasoline are taken from various pumps and RVPs are measured in two ways: on-site at the field level (cheap and quick) and also off-site at the laboratory level (expensive/higher precision). This usually results in two types of data: field data and lab data. Gasoline quality based on RVP can then be determined combining the evidence in both the data sets—a clear application of meta-analysis!

Water quality in Hillsdale Lake (Li, Nussbaum, and Sinha, 2000). Hillsdale Lake, a large federal reservoir located about 30 miles from the Kansas City metropolitan area, was authorized by the U.S. Congress in 1954 as part of a comprehensive flood control plan for the Osage and Missouri River Basins. The lake is a major recreational resource (over 500,000 visitors annually) and is also a significant source of drinking water. It is therefore essential that the water quality in this lake, as measured by Secchi depth, be regulated regularly. To achieve this, typically data from a survey of lake users in the various categories of swimming, fishing, boating, skiing, and water sports can be collected and analyzed in order to establish what level of water clarity users perceive as good. Again, it is quite possible that several studies are conducted for this purpose, and there is a need to pool the evidence from such studies to arrive at an overall conclusion about the water clarity level.

A comparison of CMW and DPW for groundwater monitoring (Li, 2000). Long-term monitoring of contamination of groundwater at former military land sites is performed by boring wells into the ground at predetermined locations and then assessing trace amounts of certain chemicals. There are two well-known methods for this purpose: an expensive traditional method of conventionally monitored wells (CMWs) and a relatively cheaper new methodology of direct push wells (DPWs). In order to compare these two methods, a joint study was conducted by the United States Air Force with the EPA to evaluate the assessment of pollutants. The former Hanscom Air Force Base (HAFB) located in Middlesex County, Massachusetts, and straddling the towns of Bedford, Concord, Lexington, and Lincoln was selected as the study site, and groundwater samples were collected for an assessment of long-term monitoring with both CMWs and DPWs based on 31 paired well locations. Data were collected on nine volatile organic carbons (VOCs): vinyl chloride, 1,1-DCA, benzene, toluene, o-xylene, trans-1,2-DCE, TCE, and 1,4-DCA, labeled as VOC1, VOC2, VOC3, VOC4, VOC5, VOC6, VOC7, VOC8, and VOC9. The site was divided into three regions, and data were collected separately in each region. It is then in the spirit of meta-analysis that we combine the results of the three regions and decide if the two methods perform equally or there is a significant difference.

Effect of second-hand smoking on women. This of course is a vital environmental issue with a potential for adverse health effects. Several studies were conducted in many parts of the world to determine if second-hand smoking is harmful for women, and it is absolutely essential that we carry out a meta-analysis, pooling the evidence from all the studies, in order to find out the underlying state of the matter. The relevant data set is reported in Section 18.6. We should mention that based on a suitable meta-analysis of the collected information, an advisory committee of the EPA designated environmental tobacco smoke as a carcinogen.

C: Health sciences applications. In the context of medicine or health science problems, there are several situations where the meta-analysis methods can be successfully applied. Here is a partial list of such applications.

Antiplatelet drug for patients with ischemic attacks. In 1988, the question of whether to prescribe an antiplatelet drug for patients with transient ischemic attacks to prevent stroke was controversial. At that time, many randomized trials of antiplatelet drugs to treat patients with cerebrovascular disease had been completed, but the studies were variable in quality and their results were contradictory. A meta-analysis of these studies by the Antiplatelet Trialists’ Collaboration (1988) found a highly significant 22% reduction in the estimated relative risk of stroke, myocardial infarction, and vascular death in patients with cerebrovascular disease who were treated with an antiplatelet drug.

Functional dyspepsia (Allescher et al., 2001). Nonulcer dyspepsia is characterized by a variety of upper abdominal symptoms in the absence of organic disease. Within the general population, dyspepsia is very common, and as a result, empirical therapy without prior diagnostic procedures has been recommended for the management of these patients. Both acid-suppressive substances such as histamine H2-receptor antagonists (H2-RAS) and gastroprokinetics have been suggested as first-line, empirical therapy. Clinical trials of H2-RAS have yielded somewhat contradictory results and benefit seems largely confined to refluxlike or ulcerlike dyspepsia subgroups. All these findings came out of an appropriate meta-analysis study.

Dentifrice (Johnson, 1993). In a series of nine randomized controlled clinical trials, sodium monofluorophosphate (SMFP) was compared to sodium fluoride (NaF) dentifrice in the prevention of caries development. The data consist of treatment differences, NaF_i - SMFP_i, where NaF_i is the change from baseline in the decayed/missing (due to caries)/filled-surface dental index at three years follow-up for regular use of NaF and SMFP_i is defined similarly for i = 1, …, 9. A statistical meta-analysis is in order here.

Recovery time after anaesthesia (Whitehead, 2002). A multicenter study with nine centers was undertaken to compare two anaesthetic agents in patients undergoing short surgical procedures, where rapid recovery is important. The response of interest is the recovery time [time from when the anaesthetic gases are turned off until the patients open their eyes (in minutes)]. A meta-analysis would be quite appropriate in this context. Incidentally, a logarithmic transformation of the underlying data produces almost normal data.

As the scope of meta-analysis grew over the years, several terminologies also came into existence, such as quantitative research synthesis, pooling of evidence, or creating an overview. While most of the early works, including Mosteller and Bush (1954), provided a logical foundation for meta-analysis, the appearance of several books, notably Glass, McGaw, and Smith (1981), Hunter, Schmidt, and Jackson (1982), Rosenthal (1984), Hedges and Olkin (1985), and the edited volume by Cooper and Hedges (1994), and literally thousands of meta-analytic papers during the last 20 years or so, primarily covering applications in health sciences and education, has given the subject a very special role in diverse fields of application.

The essential character of meta-analysis is that it is the statistical analysis of the summary findings of many empirical studies, which are called primary analyses, all targeted towards a common goal. However, differences among the constituent studies due to sampling designs, presence of different covariates, and so on, can and do exist while sharing a common objective. A fundamental assumption behind conducting a meta-analysis or pooling of evidence or information or data across studies in order to obtain an average effect across all studies is that the size of the effect (basic parameter of interest) reported in each study is an estimate of a common effect size of the whole population of studies. It is therefore essential to test for homogeneity of population effect sizes across studies before conducting a meta-analysis if obtaining an estimate of average effect or its test is the primary goal of the meta-analysis.
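To make the idea of testing homogeneity before pooling concrete, here is a minimal Python sketch of one widely used statistic, Cochran's Q, computed from study-level effect estimates and their variances. The choice of Cochran's Q, the function name cochran_q, and the three made-up studies are assumptions for illustration only; the exact and improved tests treated later in the book are not reproduced here.

```python
def cochran_q(effects, variances):
    """Return (pooled estimate, Q statistic, degrees of freedom).

    effects   -- effect size estimates, one per study
    variances -- their (assumed known) sampling variances
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return pooled, q, len(effects) - 1


# Hypothetical data: three studies with equal precision
pooled, q, df = cochran_q([0.1, 0.3, 0.5], [0.04, 0.04, 0.04])
print(pooled, q, df)
# Under homogeneity, Q is approximately chi-square with df degrees of freedom.
# For df = 2 the 5% critical value is about 5.991, so Q near 2 would not
# lead to rejection of homogeneity in this toy example.
```

A small Q relative to the chi-square reference distribution supports summarizing the studies by the single pooled value; a large Q suggests that no single number describes them well.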

The notion of effect size is central to many meta-analysis studies which often deal with comparing two treatments, control and experimental, in an effort to find out if there is a significant difference between the two. In the case of continuous measurements, a standardized mean difference plays an important role to measure such a difference. In the case of qualitative attributes, the difference or ratio of two proportions, odds ratio, and φ coefficient are used to capture such differences. Again, when the objective is to study the relationship between two variables, an obvious choice is the usual correlation coefficient.
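The effect size measures just named can be sketched in a few lines of Python. The function names, the pooled-standard-deviation form of the standardized mean difference, and all numbers below are assumptions chosen for illustration; they are not data or definitions quoted from the book.

```python
import math


def standardized_mean_difference(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Difference of group means divided by the pooled standard deviation."""
    pooled_var = ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    return (mean_e - mean_c) / math.sqrt(pooled_var)


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (rows: groups)."""
    return (a * d) / (b * c)


def phi_coefficient(a, b, c, d):
    """Phi coefficient (correlation of two binary variables) for the same table."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical continuous outcome: experimental vs. control group
print(standardized_mean_difference(105.0, 100.0, 10.0, 10.0, 50, 50))  # 0.5

# Hypothetical 2x2 table: 30/20 successes/failures vs. 20/30
print(odds_ratio(30, 20, 20, 30))       # 2.25
print(phi_coefficient(30, 20, 20, 30))  # 0.2
```

Each function maps the summary statistics of a single study to one number, which is what allows studies reported on different scales to be compared and combined.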

Recent meta-analytic work, however, concentrates on discovering and explaining variations in effect sizes rather than assuming that they remain the same across studies, which is perhaps rarely the case owing to uncontrollable differences in study contexts, designs, treatments, and subjects. When results of several scientific studies of the same phenomena exist and more or less agree, by conducting an appropriate test of homogeneity and accepting the hypothesis of homogeneity, the case for summarizing results of all studies with a single average effect size can be strengthened and defended. If, however, this hypothesis is rejected, no single number can adequately account for the variety of reported results. Thus, if the results from various studies differ either significantly or even marginally, we should make an attempt to investigate methods to account for the variability by further work. This is precisely the spirit of some recent research in meta-analysis using random and mixed effects models, allowing inclusion of trial-specific covariates which may explain a part of the observed heterogeneity. In other words, a set of conflicting findings from different studies is looked upon as an opportunity for learning and discovering the sources of variation among the reported outcomes rather than a cause for dismay.

While most common meta-analysis applications involve comparison of just one variable (experimental) with another (control), multivariate data can also arise in meta-analysis due to several reasons. First, the primary studies themselves can be multivariate in nature because these studies may measure multiple outcomes for each subject and are typically known as multiple-endpoint studies. It should, however, be noted that not all studies in a review would have the same set of outcomes. For example, studies of SAT do not all report math and verbal scores. In fact, only about half of the studies dealt with in Becker (1990) provided coaching results for both math and verbal! Secondly, multivariate data may arise when primary studies involve several comparisons among groups based on a single outcome. As an example, Ryan, Blakeslee, and Furst (1986) studied the effects of practice on motor skill levels on the basis of a five-group design, four different kinds of practice groups and one no-practice group, thus leading to comparisons of multivariate data. These kinds of studies are usually known as multiple-treatment studies.

As mentioned earlier, most statistical methods of meta-analysis focus on deriving and studying properties of a common estimated effect which is supposed to exist across all studies. When the hypothesis of homogeneity is not found to be tenable, however, a meta-analyst must estimate the extent and sources of heterogeneity among studies. While fixed effects models, discussed in this book under the assumption of homogeneous effect sizes, continue to be the most common method of meta-analysis, the assumption of homogeneity may be unrealistic given variability among studies due to varying research and evaluation protocols. In such cases, a random effects model, which avoids the homogeneity assumption and models effects as random draws from a distribution, is recommended. The various study effects are believed to arise from a population, and random effects models borrow strength across studies in providing estimates of both study-specific effects and underlying population effect.
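The contrast between the two models can be illustrated with a short Python sketch that computes the inverse-variance fixed effects estimate alongside a random effects estimate. The between-study variance is estimated here by the DerSimonian-Laird moment method, which is an assumption chosen for brevity and only one of several possible estimators; the function name and the data are likewise hypothetical.

```python
def fixed_and_random_effects(effects, variances):
    """Return (fixed effects estimate, random effects estimate, tau^2).

    tau^2 is the DerSimonian-Laird moment estimate of the between-study
    variance, truncated at zero.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran-type Q statistic around the fixed effects estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    random_est = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return fixed, random_est, tau2


# Hypothetical, clearly heterogeneous studies
fixed, random_est, tau2 = fixed_and_random_effects([0.0, 0.5, 1.0],
                                                   [0.01, 0.04, 0.01])
print(fixed, random_est, tau2)
```

When the studies agree, tau^2 is estimated as zero and the two estimates coincide; as heterogeneity grows, the random effects weights become more nearly equal across studies, so that large studies dominate less.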

Whether for a fixed effects model or a random effects model, a Bayesian approach considers all parameters (population effect sizes for fixed effects models, in particular) as random and coming from a superpopulation with its own parameters. There are several advantages to a Bayesian approach to meta-analysis. The Bayesian paradigm provides in a very natural way a method for data synthesis from all studies by incorporating model and parameter uncertainty. Moreover, a predictive distribution for future observations coming from any study, which may be a quantity of central interest to some decision makers, can be easily developed based on what has already been observed. The use of Bayesian hierarchical models often leads to more appropriate estimates of parameters compared to the asymptotic ones arising from maximum likelihood, especially in the case of small sample sizes of component studies, which is typical in meta-analysis.

There are at least two other vital issues with meta-analysis procedures. Although it is true that most of the primary studies to be included in a meta-analysis provide a complete background of the problem being considered along with relevant entire or summary data, it also happens sometimes that some studies report only the ultimate finding in terms of the sign of the estimated underlying effect size being positive or negative or in terms of the significance or nonsignificance of the test for the absence of an effect size. It then poses a challenge for the statisticians to develop suitable statistical procedures to take into account this kind of incomplete or scanty information to carry out meta-analysis. Fortunately, there are techniques under the category of vote counting procedures to effectively deal with such situations.

The problem of selection or publication bias is rather crucial in the context of meta-analysis. The reported studies on which a meta-analysis is typically based tend to be mostly significant, and there could be many nonsignificant studies that are not reported at all simply because of their nonsignificant findings; such studies are therefore not available for meta-analysis. This situation is bound to arise in almost any meta-analysis scenario in spite of one’s best attempt to get hold of all relevant studies, and statistically valid corrective measures should be developed and followed to deal with such a serious publication bias issue. Again, fortunately, there are some valid statistical procedures to tackle this vital problem.

We now point out that in the context of statistical meta-analysis, there are four important stages of research synthesis:

(i) problem formulation stage

(ii) data collection stage

(iii) data evaluation stage

(iv) data analysis and interpretation stage.

We describe these four stages below.

At the formulation stage of the research synthesis problem, we clearly spell out the universe to which generalizations are made (fixed effects model and random effects model) and the nature of the effect size parameters to be inferred upon (Hedges, 1994). Since research synthesis extends our knowledge through the combination and comparison of primary studies, it is important for us to indicate the perspective of the fixed effects model where the universe to which generalizations are made consists of ensembles of studies identical to those in the study sample. On the other hand, the random effects model perspective is relevant when the universe to which generalizations are made consists of a population of studies from which the study sample is drawn. Objectively and clearly defining the nature of the effect size parameter to be estimated or tested in a meta-analysis problem is also fundamental. One instance about the inference of an effect size is to ascertain the relationship between two variables X and Y in terms of either (a) estimation of the magnitude of the relationship (effect size) along with an indication of the accuracy or the reliability of the estimated effect size (standard error or confidence interval) or (b) a test of significance of the difference between the realized effect size and the effect size expected under the null hypothesis of no relation between X and Y. Some other common effect size measures are given by the standardized difference of two means, standardized difference of two proportions, difference of two correlations, ratio of proportions, odds ratio, risk ratio, and so on. We have elaborated on all these measures in Chapter 2.

The data collection or literature search stage in research synthesis is indeed very challenging. This is of course different from primary analysis of studies. There are usually five major modes of searching for sources of primary research, namely, manual and computer search of subject indexes from abstract databases, footnote chasing (references in review/nonreview papers and books), consultation (formal/informal requests, conferences), browsing through library shelves, and manual and computer citation searches (White, 1994). While it is hoped that these search procedures, in addition to reviewing books/book chapters, research/technical reports, conference papers, and other possible sources, would lead to an exhaustive collection of relevant literature for the problem under study, sometimes we also need to use special ways and means to retrieve what is known as fugitive literature, that is, information appearing in unpublished papers/technical reports, unpublished dissertations/master’s theses, and the like. In this context, publication bias is quite relevant while doing the research synthesis, bearing in mind the fact that research leading to nonsignificant conclusions is often not reported at all or only rarely so (the well-known file-drawer problem). We have addressed this important issue in Chapter 13.

Not all studies available for meta-analysis may qualify for inclusion due to various reasons. The data evaluation stage consists of carefully checking the nature and sources of primary research data, missing observations in primary data, and sources of potential bias in the primary data, all in an attempt to assign suitable weights to the various primary data sources at the time of carrying out meta-analysis or data synthesis.

Finally, the data analysis stage, which is the main purpose of this book, deals with statistically describing and combining various primary studies. Naturally what we need here is a wide collection of sound statistical methods depending on the nature of the underlying problem. We describe ways to combine various measures of effect sizes either for estimation or test or confidence interval and also ways to deal with missing values in primary studies as well as publication bias. In the sequel, we need statistical procedures for univariate and multivariate cases, discrete and continuous cases, and also frequentist and Bayesian methods.

Given the above broad spectrum of topics that can be covered under the umbrella of a book on meta-analysis, our goal in writing this book is primarily concerned with some statistical aspects of meta-analysis. As already mentioned, the heart of the enterprise of carrying out meta-analysis or synthesizing research consists of comparing and combining the results of individual primary studies of a particular, focused research question, and the emphasis is essentially on two types of statistical analysis: combining results of tests of significance of effect size and combining estimates of effect size. The effect size, as explained earlier, is a generic term referring to the magnitude of an effect or more generally the size of the relation between two variables. Moreover, in case of diverse research findings from comparable studies, an attempt must be made to understand and point out reasons for such differences.

Keeping the above general points in mind, the outline of the book is as follows.

Chapter 2 describes various standard measures of effect size based on means, proportions, φ coefficient, odds ratio, and correlations. Some illustrative examples to explain the related computations and concepts are included.

Chapter 3 deals with methods of combining individual tests based on primary research with plenty of applications. This chapter is exclusively based on combination of P-values mainly because the studies which are meant for meta-analysis more often report their P-values than other details of the study. The methods described here are exact and also appear in standard textbooks on meta-analysis. It should be mentioned that there are other methods based on suitably combining often independent component test statistics. However, the sampling distributions of the combination of such test statistics may not be readily available. We discuss these aspects in detail in Chapter 5 in the context of inference about a common mean of several univariate normal populations.

Chapter 4 describes methods of combining individual estimates of effect sizes based on primary research to efficiently estimate the common effect size parameter as well as to construct its confidence interval. The methods suggested in this chapter are mainly asymptotic in nature and again are quite routine.

Chapter 5 is devoted to a detailed analysis of a special kind of meta-analysis problem, namely, inference about the common mean of several univariate normal populations with unknown and unequal variances. This problem has a long and rich history and is very significant in applications. Most of the results presented here are new and have not appeared in any textbook before. Two classic data sets are used throughout to explain the concepts.

Chapter 6 describes various tests of the important hypothesis of the homogeneity of population effect sizes in some particular models. In the context of statistical meta-analysis, one should carry out these tests of homogeneity of effect sizes before applying tools of combining the effect sizes. The results presented here are based on the review papers by Hartung and his students.

One-way random effects models, useful when the basic hypothesis of homogeneity of effect sizes does not hold, are taken up in Chapter 7. There is a huge literature on this topic and we have made an attempt to present all the important results in this connection. Typically, there are two scenarios: error variances are all equal (homogeneous case) and error variances are not equal (heterogeneous case). We have dealt with both cases. The reader will find a variety of new solutions in this chapter.

Chapter 8 extends the results of the previous three chapters to the meta-analysis of comparative trials with normal outcome. Results in the fixed effects as well as in the random effects model are provided for the effect sizes difference of means, standardized difference of means, and ratio of means.

Meta-analysis procedures to analyze categorical data of both binary and ordinal nature are presented in Chapter 9. We have provided fixed effects as well as random effects results with motivating examples. This is another nice feature of the book.

Meta-regression, multivariate meta-analysis, and Bayesian meta-analysis are, respectively, presented in three subsequent chapters, namely Chapters 10, 11, and 12. In Chapter 10, we describe meta-analysis regression procedures with one and more than one covariate with illustrations. In Chapter 11, we describe both aspects of multiple-endpoint and multiple-treatment studies. We have provided a unified Bayesian approach to meta-analysis with some examples in Chapter 12. All three of these chapters provide unique features to our book.

The important concepts of publication bias and vote counting procedures, which also appear in standard textbooks, are taken up in Chapters 13 and 16. As mentioned earlier, these problems arise when we do not have access to all the literature on the subject under study and also when there is not enough evidence in the studies which are indeed available.

We describe in Chapter 14 the statistical methods for recovery of interblock information. One of the earliest applications of statistical meta-analysis (Yates, 1939, 1940; Rao, 1947) consists of combining what are known as intrablock and interblock estimates of treatment effects in the presence of random block effects in two-way mixed effects models. Early tests of significance of treatment effects in this context were based on combining the P-values using Fisher’s method. Since this method does not take into account the underlying statistical structure, methods to improve it were suggested in several papers (Feingold, 1985, 1988; Cohen and Sackrowitz, 1989; Mathew, Sinha, and Zhou, 1993; Zhou and Mathew, 1993). We describe all these procedures in detail in this chapter, which is a novel contribution to the vast literature on statistical meta-analysis.

A different kind of meta-analysis dealing with a combination of polls is presented in Chapter 15. This particular topic has applications in market research and is completely new. This is based on a technical report by Dasgupta and Sinha (2006).

There are many computational aspects of statistical meta-analysis which are taken up in Chapter 17 using the general statistical software packages SAS and R. Sample programs for both packages are explained with examples.

Finally, sample data sets which are analyzed throughout the book are included in the final section of the book, Chapter 18. The References section at the end contains a long list of papers referred to in this book.

We conclude this introductory chapter with two observations. First, although we have described a variety of diverse scenarios where meta-analysis methods can be successfully applied, we have not attempted to illustrate every one of them; our illustrations of the methods are naturally limited to our judgment and own experiences. Second, except in some chapters, virtually all of the statistical methods described in this book are based on standard large sample results for the (asymptotic) distributions of sample means, sample proportions, sample correlations, and so on, and hence due caution should be exercised when using these methods. Some of the frequently quoted results are listed below for ready reference (see Rao, 1973; Rohatgi, 1976):

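1. $X_1, \ldots, X_n$ are independently and identically distributed (iid) with mean $\mu$ and variance $\sigma^2$. Then, for large $n$,

$$\sqrt{n}\,(\bar{X} - \mu) \sim N(0, \sigma^2),$$

that is,

$$\frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \sim N(0, 1).$$

This is a standard version of the celebrated central limit theorem (CLT).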

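2. $X_1, \ldots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. Then, for large $n$,

$$\frac{\sqrt{n}\,(\bar{X} - \mu)}{S} \sim N(0, 1),$$

where $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)$. This is an application of CLT coupled with Cramer’s theorem (Slutsky’s theorem).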

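3. $X \sim B(n, P)$. Then, for large $n$,

$$\frac{X - nP}{\sqrt{nPQ}} \sim N(0, 1),$$

where $Q = 1 - P$, that is,

$$\frac{\sqrt{n}\,(p - P)}{\sqrt{PQ}} \sim N(0, 1),$$

where $p = X/n$. This is a standard application of the CLT.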

4. $X \sim B(n, P)$. Then, for large $n$, writing $p = X/n$,

$$\arcsin\sqrt{p} \sim N\!\left(\arcsin\sqrt{P},\ \frac{1}{4n}\right).$$

This is a well-known version of Fisher’s variance-stabilizing transformation applied to the binomial proportion.
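Results 3 and 4 each lead to an approximate confidence interval for $P$. The following is a minimal Python sketch, not taken from the book, that contrasts the two. The function names and the default critical value 1.96 are illustrative assumptions, and the arcsine interval assumes the asymptotic variance $1/(4n)$ on the transformed scale.

```python
import math

def wald_interval(x, n, z_crit=1.96):
    """Interval from p ~ N(P, PQ/n), with P replaced by p in the variance."""
    p = x / n
    half = z_crit * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def arcsine_interval(x, n, z_crit=1.96):
    """Interval from arcsin(sqrt(p)) ~ N(arcsin(sqrt(P)), 1/(4n))."""
    p = x / n
    centre = math.asin(math.sqrt(p))
    half = z_crit / (2 * math.sqrt(n))   # does not depend on P
    # Keep the end points inside [0, pi/2] before transforming back.
    lower = max(centre - half, 0.0)
    upper = min(centre + half, math.pi / 2)
    return math.sin(lower) ** 2, math.sin(upper) ** 2

# Hypothetical data: 30 successes in 100 trials.
wald = wald_interval(30, 100)
arcsine = arcsine_interval(30, 100)
print(wald, arcsine)
```

On the transformed scale the half-width depends only on $n$, which is the practical meaning of "variance-stabilizing"; the back-transformed interval also stays within [0, 1] by construction.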

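5. $(X_1, Y_1), \ldots, (X_n, Y_n)$ are iid from a bivariate distribution with means $(\mu_1, \mu_2)$, variances $(\sigma_1^2, \sigma_2^2)$, and correlation $\rho$. Then, for large $n$,

$$\sqrt{n}\,(r - \rho) \sim N\!\left(0,\ (1 - \rho^2)^2\right),$$

where $r$ is the usual sample correlation defined as

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}}.$$

This is also an application of CLT coupled with Cramer’s theorem (Slutsky’s theorem; see Rao, 1973).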

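6. $(X_1, Y_1), \ldots, (X_n, Y_n)$ are iid from a bivariate distribution with means $(\mu_1, \mu_2)$, variances $(\sigma_1^2, \sigma_2^2)$, and correlation $\rho$. Then, for large $n$,

$$z \sim N\!\left(\zeta,\ \frac{1}{n - 3}\right),$$

where

$$z = \frac{1}{2} \ln\frac{1 + r}{1 - r}, \qquad \zeta = \frac{1}{2} \ln\frac{1 + \rho}{1 - \rho}.$$

This is a well-known version of Fisher’s variance-stabilizing transformation applied to the sample correlation coefficient.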
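As an illustration of result 6, the sketch below builds an approximate confidence interval for $\rho$ on the transformed scale and maps it back. It is a hedged example rather than the book's own code: the function name is hypothetical, and it assumes the usual asymptotic variance $1/(n-3)$ for Fisher's $z$.

```python
import math

def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate CI for rho via z = 0.5 * ln((1 + r) / (1 - r))."""
    z = math.atanh(r)                    # identical to 0.5 * ln((1 + r) / (1 - r))
    half = z_crit / math.sqrt(n - 3)     # assumed asymptotic standard error
    # tanh is the inverse transformation, so the limits stay inside (-1, 1).
    return math.tanh(z - half), math.tanh(z + half)

# Hypothetical data: r = 0.5 observed in a sample of n = 28 pairs.
rho_lower, rho_upper = fisher_z_interval(0.5, 28)
print(rho_lower, rho_upper)
```

The interval is not symmetric about $r$, which reflects the skewness of the sampling distribution of $r$ when $\rho$ is away from zero.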

CHAPTER 2

VARIOUS MEASURES OF EFFECT SIZE

Quite often the main objective in a study is to compare two treatments: experimental and control. When these treatments are applied to a set of experimental units, the outcomes can be of two types, qualitative and quantitative, leading to either proportions or means. Accordingly, effect sizes are also essentially of these two types: those based on differences of two means and those based on differences of two proportions. A third type of effect size, namely, correlation, arises when the objective in a study is to ascertain the nature and extent of the relationship between two variables.

2.1 EFFECT SIZE BASED ON MEANS

An effect size based on means is defined as follows. Denote the population means of the two groups (experimental and control) by $\mu_1$ and $\mu_2$ and their variances by $\sigma_1^2$ and $\sigma_2^2$, respectively. Then the effect size $\theta$ based on means is a standardized difference between $\mu_1$ and $\mu_2$ and can be expressed as

$$\theta = \frac{\mu_1 - \mu_2}{\sigma} \tag{2.1}$$

where $\sigma$ denotes either the standard deviation $\sigma_2$ of the population control group or an average population standard deviation (namely, an average of $\sigma_1$ and $\sigma_2$).

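The above measure of effect size $\theta$ can be easily estimated based on sample values, and this is explained below. Suppose we have a random sample of size $n_1$ from the first population with the sample mean $\bar{X}_1$ and sample variance $S_1^2$ and also a random sample of size $n_2$ from the second population with the sample mean $\bar{X}_2$ and sample variance $S_2^2$. One measure of the effect size $\theta$, known as Cohen’s d (Cohen, 1969, 1977, 1988), is then given by

$$d = \frac{\bar{X}_1 - \bar{X}_2}{S} \tag{2.2}$$

where the standardized quantity $S$ is the pooled sample standard deviation defined as $S = \sqrt{S^2}$ where

$$S^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2}$$

with

$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}.$$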

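A second measure of $\theta$, known as Hedges’s g (Hedges, 1981, 1982), is defined as

$$g = \frac{\bar{X}_1 - \bar{X}_2}{S^*} \tag{2.3}$$

where the standardized quantity $S^*$ is also the pooled sample standard deviation defined as $S^* = \sqrt{S^{*2}}$ with

$$S^{*2} = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}.$$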

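It can be shown that (see Hedges and Olkin, 1985)

$$E(g) \approx \theta + \frac{3\theta}{4N - 9}, \tag{2.4}$$

$$\mathrm{Var}(g) \approx \frac{1}{\tilde{n}} + \frac{\theta^2}{2(N - 3.94)}, \tag{2.5}$$

where $N = n_1 + n_2$ and $\tilde{n} = n_1 n_2/(n_1 + n_2)$.

In case the population variances are identical in both groups, under the assumption of normality of the data, Hedges (1981) shows that $\sqrt{\tilde{n}}\,g$ follows a noncentral t distribution with noncentrality parameter $\sqrt{\tilde{n}}\,\theta$ and $n_1 + n_2 - 2$ degrees of freedom. Consequently, the exact mean and variance of Hedges’s g are given by

$$E(g) = \sqrt{\frac{N - 2}{2}}\ \frac{\Gamma[(N - 3)/2]}{\Gamma[(N - 2)/2]}\ \theta, \tag{2.6}$$

$$\mathrm{Var}(g) = \frac{N - 2}{(N - 4)\,\tilde{n}} \left(1 + \tilde{n}\,\theta^2\right) - [E(g)]^2, \tag{2.7}$$

where $\Gamma(\cdot)$ denotes the gamma function. As Cohen’s d is proportional to Hedges’s g, the results in Eqs. (2.6) and (2.7) can be easily transferred to provide the mean and variance of Cohen’s d. The exact mean in Eq. (2.6) is well approximated by Eq. (2.4) so that an approximately unbiased standardized mean difference g* is given as

$$g^* = \left(1 - \frac{3}{4N - 9}\right) g. \tag{2.8}$$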
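The size of the small-sample bias of g, and the quality of the simple correction used for g*, can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the exact bias factor of a noncentral t variable, $\sqrt{m/2}\,\Gamma[(m-1)/2]/\Gamma(m/2)$ with $m = n_1 + n_2 - 2$, together with the approximate correction factor $1 - 3/(4N - 9)$.

```python
import math

def exact_bias_factor(m):
    """E(g) / theta for m = n1 + n2 - 2 degrees of freedom (requires m > 1)."""
    # Work with log-gamma values to avoid overflow for large m.
    log_ratio = math.lgamma((m - 1) / 2) - math.lgamma(m / 2)
    return math.sqrt(m / 2) * math.exp(log_ratio)

def approx_correction(n_total):
    """Approximate correction factor 1 - 3 / (4N - 9), with N = n1 + n2."""
    return 1 - 3 / (4 * n_total - 9)

def unbiased_g(g, n1, n2):
    """Approximately unbiased standardized mean difference g*."""
    return approx_correction(n1 + n2) * g

# Two groups of 38 subjects each and g = 0.72.
g_star = unbiased_g(0.72, 38, 38)
residual = exact_bias_factor(74) * approx_correction(76)  # near 1 if the approximation is good
print(round(g_star, 2), residual)
```

With these inputs g* rounds to 0.71, and the product of the exact bias factor and the approximate correction is very close to 1, so little is lost by using the simple formula at this sample size.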

Finally, a third measure of $\theta$, known as Glass’s Δ (Glass, McGaw, and Smith, 1981), is defined as

$$\Delta = \frac{\bar{X}_1 - \bar{X}_2}{S_2} \tag{2.9}$$

where the standardized quantity is just $S_2$, the sample standard deviation based on the control group alone. This is typically justified on the ground that the control group is in existence for a longer period than the experimental group and is likely to provide a more stable estimate of the common variance. Again under the assumption of normality of the data, Hedges (1981) shows that $\sqrt{\tilde{n}}\,\Delta$, with $\tilde{n} = n_1 n_2/(n_1 + n_2)$, follows a noncentral t distribution with noncentrality parameter $\sqrt{\tilde{n}}\,\theta$ and $n_2 - 1$ degrees of freedom.

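The variances of the above estimates of $\theta$, in large samples, are given by the following:

$$\sigma^2(d) \approx \left[\frac{1}{n_1} + \frac{1}{n_2} + \frac{\theta^2}{2(n_1 + n_2 - 2)}\right] \frac{n_1 + n_2}{n_1 + n_2 - 2},$$

$$\sigma^2(g) \approx \frac{1}{n_1} + \frac{1}{n_2} + \frac{\theta^2}{2(n_1 + n_2 - 2)},$$

$$\sigma^2(\Delta) \approx \frac{1}{n_1} + \frac{1}{n_2} + \frac{\theta^2}{2(n_2 - 1)}.$$

The estimated variances are then obtained by replacing $\theta$ in the above expressions by the respective estimates of $\theta$, namely, d, g, and Δ. These are given below:

$$\hat{\sigma}^2(d) = \left[\frac{1}{n_1} + \frac{1}{n_2} + \frac{d^2}{2(n_1 + n_2 - 2)}\right] \frac{n_1 + n_2}{n_1 + n_2 - 2},$$

$$\hat{\sigma}^2(g) = \frac{1}{n_1} + \frac{1}{n_2} + \frac{g^2}{2(n_1 + n_2 - 2)},$$

$$\hat{\sigma}^2(\Delta) = \frac{1}{n_1} + \frac{1}{n_2} + \frac{\Delta^2}{2(n_2 - 1)}.$$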
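The three estimators and their estimated large-sample variances can be computed directly from group summary statistics. The following Python sketch is an illustration under stated assumptions, not the book's SAS or R code: the function name and the input values are hypothetical, Cohen's d is assumed to use the divisor $n_1 + n_2$ in the pooled variance, Hedges's g the divisor $n_1 + n_2 - 2$, and the variances are the usual large-sample approximations with $\theta$ replaced by its estimate.

```python
import math

def effect_sizes(mean1, mean2, var1, var2, n1, n2):
    """Return Cohen's d, Hedges's g, Glass's Delta and estimated variances.

    Group 1 is the experimental group, group 2 the control group;
    var1 and var2 are sample variances with divisors n1 - 1 and n2 - 1.
    """
    diff = mean1 - mean2
    n = n1 + n2
    pooled_ss = (n1 - 1) * var1 + (n2 - 1) * var2
    d = diff / math.sqrt(pooled_ss / n)          # divisor n1 + n2 (assumed)
    g = diff / math.sqrt(pooled_ss / (n - 2))    # divisor n1 + n2 - 2
    delta = diff / math.sqrt(var2)               # control-group SD only
    inv_n = 1 / n1 + 1 / n2
    return {
        "d": d,
        "g": g,
        "delta": delta,
        "var_d": (inv_n + d**2 / (2 * (n - 2))) * n / (n - 2),
        "var_g": inv_n + g**2 / (2 * (n - 2)),
        "var_delta": inv_n + delta**2 / (2 * (n2 - 1)),
    }

# Hypothetical summary statistics for two groups of 20 subjects each.
es = effect_sizes(mean1=105.0, mean2=100.0, var1=225.0, var2=196.0, n1=20, n2=20)
for name, value in es.items():
    print(f"{name}: {value:.4f}")
```

Because d and g differ only in the divisor of the pooled variance, d is always slightly larger in absolute value than g, by the factor $\sqrt{(n_1 + n_2)/(n_1 + n_2 - 2)}$.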

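Large sample tests for $H_0: \theta = 0$ versus $H_1: \theta \neq 0$ are typically based on the standardized normal statistic

$$Z = \frac{\hat{\theta}}{\hat{\sigma}(\hat{\theta})} \tag{2.10}$$

where $\hat{\theta}$ is an estimate of $\theta$ defined above with $\hat{\sigma}(\hat{\theta})$ as its estimated standard error and $H_0$ is rejected if $|Z|$ exceeds $z_{\alpha/2}$, the upper $\alpha/2$ cut-off point of the standard normal distribution. Of course, if the alternative is one-sided, namely, $H_2: \theta > 0$, then $H_0$ is rejected if $Z$ exceeds $z_{\alpha}$, the upper $\alpha$ cut-off point of the standard normal distribution. Again, if one is interested in constructing confidence intervals for $\theta$, it is evident that, in large samples, the individual confidence intervals are given by

$$\hat{\theta} - z_{\alpha/2}\,\hat{\sigma}(\hat{\theta}) \le \theta \le \hat{\theta} + z_{\alpha/2}\,\hat{\sigma}(\hat{\theta}). \tag{2.11}$$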

Example 2.1. We use a data set from Hedges and Olkin (1985, p. 17) where a set of seven studies deals with sex differences in cognitive abilities. Originally, studies on sex differences in four cognitive abilities (quantitative, verbal, visual-spatial, and field articulation) were considered. We use only the effect size estimates derived from studies with quantitative ability. The relevant data are reproduced in Table 2.1 and include the total sample size (N) of each study, estimates of Hedges’s g, and unbiased effect size estimates g*. For further information about the studies, we refer to Hedges and Olkin (1985).

Table 2.1 Studies of gender difference in quantitative ability

For each study above, we can carry out the test for $H_0: \theta = 0$ versus $H_1: \theta \neq 0$ as well as construct a confidence interval for $\theta$ based on the above discussion. Thus, for study 1, using the standardized mean difference g (Hedges’s g) = 0.72 and assuming $n_1 = n_2 = 38$, we get

$$Z = \frac{g}{\hat{\sigma}(g)} = \frac{0.72}{\sqrt{\dfrac{1}{38} + \dfrac{1}{38} + \dfrac{0.72^2}{2(74)}}} \approx \frac{0.72}{0.237} \approx 3.04$$

and hence reject $H_0$ with α = 0.05, since $|Z|$ exceeds 1.96. Moreover, based on Eq. (2.11), the 95% confidence interval (CI) for $\theta$ is obtained as [0.256, 1.184]. It may be noted that the conclusions based on g* = 0.71 are the same. All the 95% confidence intervals for the seven studies are summarized in the last column of Table 2.1.
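The calculation for study 1 can be reproduced in a few lines of Python. This is a sketch rather than the book's own program: the function name is hypothetical, and the standard error is assumed to come from the large-sample variance $1/n_1 + 1/n_2 + g^2/[2(n_1 + n_2 - 2)]$, which is the choice that reproduces the interval quoted in the text.

```python
import math
from statistics import NormalDist

def large_sample_test(theta_hat, se, alpha=0.05):
    """Two-sided Z test of H0: theta = 0 and the matching (1 - alpha) CI."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # upper alpha/2 cut-off point
    z = theta_hat / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    ci = (theta_hat - z_crit * se, theta_hat + z_crit * se)
    return z, p_value, ci

# Study 1 of Example 2.1: g = 0.72 with n1 = n2 = 38 (as assumed in the text).
g, n1, n2 = 0.72, 38, 38
se_g = math.sqrt(1 / n1 + 1 / n2 + g**2 / (2 * (n1 + n2 - 2)))
z, p_value, (lower, upper) = large_sample_test(g, se_g)
print(f"Z = {z:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
# prints: Z = 3.04, 95% CI = [0.256, 1.184]
```

The same function applies to d or Δ once the corresponding estimated standard error is supplied.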

When the analysis is to be carried out on the original metric, the difference of $\mu_1$ and $\mu_2$, sometimes called the absolute difference between means, is the appropriate measure. The difference between means may be easier to interpret than the dimensionless standardized mean difference. The difference of the sample means, $\bar{X}_1 - \bar{X}_2$, is an unbiased estimator of the parameter of interest in this situation with variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$. By plugging in the sample variances, the estimated variance of $\bar{X}_1 - \bar{X}_2$ is $S_1^2/n_1 + S_2^2/n_2$.

2.2 EFFECT SIZE BASED ON PROPORTIONS

An effect size $\theta$ based on proportions is derived as follows. Denote the population proportions of the two groups (experimental and control) by $\pi_1$ and $\pi_2$. One measure $\theta_1$ of the effect size $\theta$ is then given by

$$\theta_1 = \pi_1 - \pi_2 \tag{2.12}$$

which is simply the difference between the two population proportions.

A second measure $\theta_2$ of $\theta$ based on Fisher’s variance-stabilizing transformation (of a sample proportion) is defined as

$$\theta_2 = \arcsin\sqrt{\pi_1} - \arcsin\sqrt{\pi_2}. \tag{2.13}$$

A third measure $\theta_3$ of $\theta$, commonly known as the rate ratio, also called relative risk or risk ratio, is given by

$$\theta_3 = \frac{\pi_1}{\pi_2}. \tag{2.14}$$

The measures $\theta_1$ and $\theta_2$ are such that the value 0 indicates no difference, while for the measure $\theta_3$, the value 1 indicates no difference. Often $\theta_3^* = \ln \theta_3$, which is the natural logarithm of $\theta_3$, is used so that the same value 0 indicates no difference in all three cases. The above measures of $\theta$ can be easily estimated. Suppose a random sample of size $n_1$ from the first population yields a count of $X_1$ for the attribute under study while a random sample of size $n_2$ from the second population yields a count of $X_2$. Then, if $p_1 = X_1/n_1$ and $p_2 = X_2/n_2$ denote the two sample proportions, estimates of $\theta$ are obtained as