Exploring the influence and application of the Campbellian validity typology in the theory and practice of outcome evaluation, this volume addresses the strengths and weaknesses of this often controversial evaluation method and presents new perspectives for its use. Editors Huey T. Chen, Stewart I. Donaldson, and Melvin M. Mark provide a historical overview of the Campbellian typology's adoption, contributions, and criticisms. Contributing authors propose strategies for developing a new perspective on validity typology to advance validity in program evaluation, including:
* Enhance External Validity
* Enhance Precision by Reclassifying the Campbellian Typology
* Expand the Scope of the Typology
The volume concludes with William R. Shadish's spirited rebuttal to the earlier chapters. A collaborator of Don Campbell's, Shadish provides balance to the perspectives in the issue with a clarification and defense of Campbell's work.
This is the 130th volume of the Jossey-Bass quarterly report series New Directions for Evaluation, an official publication of the American Evaluation Association.
Contents
Editors’ Notes
Chapter 1: Validity Frameworks for Outcome Evaluation
Campbellian Validity Typology
Content of the Campbellian Validity Framework
The Campbellian Validity Typology and Program Evaluation
Critiques of Campbellian Typology
Relationship Between the Campbellian Typology and Program Evaluation
Toward a New Perspective of a Comprehensive Validity Typology for Program Evaluation
Chapter 2: What Works for Whom, Where, Why, for What, and When? Using Evaluation Evidence to Take Action in Local Contexts
Introduction
What Works? What Do You Mean?
The Traditional Validity Framework
Applicability of the Traditional Framework to Modern Evaluation Practice
Toward a More Systematic Process for Making Predictions
Conclusions
Chapter 3: New (and Old) Directions for Validity Concerning Generalizability
Generalizability and External Validity
Enhancing Knowledge About Generalizability in the Campbellian Tradition
Diverging Traditions in Evaluation
Recommendations for Future Practice and for Areas of Future Development
Chapter 4: Criticisms of and an Alternative to the Shadish, Cook, and Campbell Validity Typology
The Context for Evaluating Validity
An Alternative Typology
Comparison With the SCC Typology
Conclusion
Chapter 5: Reframing Validity in Research and Evaluation: A Multidimensional, Systematic Model of Valid Inference
Logic of Valid Inference in the Campbellian Framework
Enhancing Coverage of Framework for Valid Inference
Supporting Conceptual Organization
Summary of Advantages of Dimensional Organization
Discussion
Chapter 6: Conflict of Interest and Campbellian Validity
Campbell and Stanley’s Conception of Validity
Revised Conceptions
Including Conflict-of-Interest Threats
Remedies
Summary
Chapter 7: The Construct(ion) of Validity as Argument
Making Interpretive Sense of Outcome Evaluation
Validity as Argument
Familiar and Unfamiliar Validities
Reprise
Chapter 8: Assessing Program Outcomes From the Bottom-Up Approach: An Innovative Perspective to Outcome Evaluation
The Top-Down Approach to Validity Issues
Lessons Learned From Applying the Top-Down Approach in Program Evaluation
The Integrative Validity Model as an Alternative Typology to Address Validity Issues
The Bottom-Up Approach for Evaluating Health Promotion/Social Betterment Programs
The Usefulness of the New Perspective for Program Evaluation
Chapter 9: The Truth About Validity
Chen, Donaldson, and Mark
Gargani and Donaldson
Mark
Reichardt
Julnes
House
Greene
Chen and Garbe
Discussion
Conclusion
Index
Advancing Validity in Outcome Evaluation: Theory and Practice
Huey T. Chen, Stewart I. Donaldson, Melvin M. Mark (eds.)
New Directions for Evaluation, no. 130
Sandra Mathison, Editor-in-Chief
Copyright ©2011 Wiley Periodicals, Inc., A Wiley Company, and the American Evaluation Association. All rights reserved. No part of this publication may be reproduced in any form or by any means, except as permitted under sections 107 and 108 of the 1976 United States Copyright Act, without either the prior written permission of the publisher or authorization through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923; (978) 750-8400; fax (978) 646-8600. The copyright notice appearing at the bottom of the first page of a chapter in this journal indicates the copyright holder’s consent that copies may be made for personal or internal use, or for personal or internal use of specific clients, on the condition that the copier pay for copying beyond that permitted by law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating collective works, or for resale. Such permission requests and other permission inquiries should be addressed to the Permissions Department, c/o John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030; (201) 748-6011, fax (201) 748-6008, www.wiley.com/go/permissions.
Microfilm copies of issues and articles are available in 16mm and 35mm, as well as microfiche in 105mm, through University Microfilms Inc., 300 North Zeeb Road, Ann Arbor, MI 48106-1346.
New Directions for Evaluation is indexed in Cambridge Scientific Abstracts (CSA/CIG), Contents Pages in Education (T & F), Higher Education Abstracts (Claremont Graduate University), Social Services Abstracts (CSA/CIG), Sociological Abstracts (CSA/CIG), and Worldwide Political Sciences Abstracts (CSA/CIG).
New Directions for Evaluation (ISSN 1097-6736, electronic ISSN 1534-875X) is part of The Jossey-Bass Education Series and is published quarterly by Wiley Subscription Services, Inc., A Wiley Company, at Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741.
Subscriptions cost $89 for U.S./Canada/Mexico; $113 international. For institutions, agencies, and libraries, $271 U.S.; $311 Canada/Mexico; $345 international. Prices subject to change.
Editorial correspondence should be addressed to the Editor-in-Chief, Sandra Mathison, University of British Columbia, 2125 Main Mall, Vancouver, BC V6T 1Z4, Canada.
www.josseybass.com
New Directions for Evaluation
Sponsored by the American Evaluation Association
Editor-in-Chief
Sandra Mathison, University of British Columbia
Associate Editors
Saville Kushner, University of the West of England
Patrick McKnight, George Mason University
Patricia Rogers, Royal Melbourne Institute of Technology
Editorial Advisory Board
Michael Bamberger, Independent consultant
Gail Barrington, Barrington Research Group Inc.
Nicole Bowman, Bowman Consulting
Huey Chen, University of Alabama at Birmingham
Lois-ellin Datta, Datta Analysis
Stewart I. Donaldson, Claremont Graduate University
Michael Duttweiler, Cornell University
Jody Fitzpatrick, University of Colorado at Denver
Gary Henry, University of North Carolina, Chapel Hill
Stafford Hood, Arizona State University
George Julnes, Utah State University
Jean King, University of Minnesota
Nancy Kingsbury, US Government Accountability Office
Henry M. Levin, Teachers College, Columbia University
Laura Leviton, Robert Wood Johnson Foundation
Richard Light, Harvard University
Linda Mabry, Washington State University, Vancouver
Cheryl MacNeil, Sage College
Anna Madison, University of Massachusetts, Boston
Melvin M. Mark, The Pennsylvania State University
Donna Mertens, Gallaudet University
Rakesh Mohan, Idaho State Legislature
Michael Morris, University of New Haven
Rosalie T. Torres, Torres Consulting Group
Elizabeth Whitmore, Carleton University
Maria Defino Whitsett, Austin Independent School District
Bob Williams, Independent consultant
David B. Wilson, University of Maryland, College Park
Nancy C. Zajano, Learning Point Associates
Editorial Policy and Procedures
New Directions for Evaluation, a quarterly sourcebook, is an official publication of the American Evaluation Association. The journal publishes empirical, methodological, and theoretical works on all aspects of evaluation. A reflective approach to evaluation is an essential strand to be woven through every issue. The editors encourage issues that have one of three foci: (1) craft issues that present approaches, methods, or techniques that can be applied in evaluation practice, such as the use of templates, case studies, or survey research; (2) professional issues that present topics of import for the field of evaluation, such as utilization of evaluation or locus of evaluation capacity; (3) societal issues that draw out the implications of intellectual, social, or cultural developments for the field of evaluation, such as the women’s movement, communitarianism, or multiculturalism. A wide range of substantive domains is appropriate for New Directions for Evaluation; however, the domains must be of interest to a large audience within the field of evaluation. We encourage a diversity of perspectives and experiences within each issue, as well as creative bridges between evaluation and other sectors of our collective lives.
The editors do not consider or publish unsolicited single manuscripts. Each issue of the journal is devoted to a single topic, with contributions solicited, organized, reviewed, and edited by a guest editor. Issues may take any of several forms, such as a series of related chapters, a debate, or a long article followed by brief critical commentaries. In all cases, the proposals must follow a specific format, which can be obtained from the editor-in-chief. These proposals are sent to members of the editorial board and to relevant substantive experts for peer review. The process may result in acceptance, a recommendation to revise and resubmit, or rejection. However, the editors are committed to working constructively with potential guest editors to help them develop acceptable proposals.
Sandra Mathison, Editor-in-Chief
University of British Columbia
2125 Main Mall
Vancouver, BC V6T 1Z4
CANADA
e-mail: [email protected]
Editors’ Notes
Disclaimer: The findings and conclusions of this article are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention (CDC).
Decades ago, Suchman (1967) encouraged evaluators to apply Campbell and Stanley's (1963) writings on experiments, quasi-experiments, and validity to evaluation. Since that time, the Campbellian validity typology, as presented in Campbell and Stanley (1963), Cook and Campbell (1979), and Shadish, Cook, and Campbell (2002), has been prominent in much of the theory and practice of outcome evaluation. Despite its influence, the Campbellian validity typology and its associated methods have been criticized, sometimes generating heated debates about the typology's strengths and weaknesses for evaluation. For some readers, these debates may form part of this issue's subtext; for others, including evaluators new to the field and unfamiliar with such debates, the issue should still be of interest. Validity frameworks are important: they can inform thinking about evaluation, guide evaluation practice, and facilitate future development of evaluation theory and methods.
This issue had its origins in a panel at the 2008 conference of the American Evaluation Association. Led by Huey T. Chen, the session focused on theory and practice related to external validity in evaluation. The session was motivated in part by the sense that new directions, and perhaps increased attention to some old directions, are needed to reach meaningful conclusions about evaluation generalizability. But session presenters also addressed forms of validity beyond external validity. In addition, as planning shifted from the conference session to this issue, newly added contributors planned to address issues other than external validity. As a result, after we considered alternative framings, the issue evolved to its current theme: validity in the context of outcome evaluation.
The primary focus of most of the chapters is not on Campbell and colleagues' validity typology per se, but rather on its application in the context of outcome evaluation. According to the Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 1994), four attributes are essential for evaluation practice: utility, feasibility, propriety, and accuracy. The Campbellian typology offers clear strengths in addressing accuracy. However, it is less suited to addressing issues of utility, propriety, and feasibility. Perhaps a worthwhile direction for developing a comprehensive validity perspective for evaluation is to build on the Campbellian typology in ways that better address all four attributes. This issue of New Directions for Evaluation is organized and developed in that spirit.
In general, we take the stance that we can further advance validity in outcome evaluation by revising or expanding the Campbellian typology. Chapter authors present multiple views on how to build on the Campbellian typology’s contribution and suggest alternative validity frameworks or models to serve program evaluation better. We hope that these new perspectives will advance theory and practice regarding validity in evaluation as well as improve the quality and usefulness of outcome evaluations.
Chapter authors propose the following strategies in developing a new perspective of validity typology for advancing validity in program evaluation.
Enhance External Validity
John Gargani and Stewart I. Donaldson, then Melvin M. Mark, focus on external validity. Gargani and Donaldson discuss the limits of the Campbellian tradition regarding external validity. They argue that the external validity of an evaluation can be enhanced by better addressing questions about what works for whom, where, why, and when. Mark reviews several alternative framings of generalizability issues and, drawing on these alternatives, identifies potentially fruitful directions for enhancing external validity.
Enhance Precision by Reclassifying the Campbellian Typology
The chapters by Charles S. Reichardt and George Julnes offer conceptual revisions of the Campbellian typology. Reichardt identifies what he sees as flaws in the four types of validity in Shadish et al. (2002) and offers his own typology, built on four criteria: validity, precision, generalizability, and completeness. Julnes proposes a validity framework with three dimensions: representation (construct validity), causal inference (internal and external validity), and valuation. He argues for the conceptual and pragmatic merits of this framework.
Expand the Scope of the Typology
Ernest R. House discusses the Campbellian typology's limitations in dealing with the ethical challenges that evaluation increasingly faces. He notes an alarming phenomenon, visible in medical evaluations but increasingly worrisome in other areas of evaluation practice, whereby evaluation results become biased through researchers' intentional or unintentional manipulation. House discusses strategies for dealing with this ethical problem, including how such ethics-related threats might be incorporated within the Campbellian validity tradition.
Jennifer C. Greene is one of the few contributors to this issue who is not affiliated with the Campbellian tradition. She provides a naturalistic viewpoint in examining the limits of the Campbellian typology, discusses different validity concepts, and offers strategies for strengthening validity that are not primarily associated with the Campbellian tradition. At the same time, her comments are congenial to advances within the framework provided by Campbell and colleagues. Huey T. Chen and Paul Garbe argue that outcome evaluation should address system-integration issues that go beyond the scope of goal attainment, which is where the Campbellian typology's strength lies. To address both goal-attainment and system-integration issues, these authors propose a validity model with three categories: viable, effectual, and transferable. With this expanded typology, they propose a bottom-up approach that uses quantitative and qualitative methods to strengthen validity in an evaluation.
William R. Shadish, a collaborator of Campbell’s who played a key role in expanding the Campbellian typology (Shadish et al., 2002), offers his perspective on the contributions of this issue. Other chapters in the issue discuss various aspects of the Campbellian typology, with the authors representing varying degrees of closeness or distance to the tradition. Shadish speaks as an involved and interested representative of this tradition, which he upholds with vigor, thus providing balance to the perspectives in the issue. Shadish clarifies and defends the work of Campbell and his colleagues, offers themes related to the issue topic, and comments on the rest of the chapters.
Shadish takes exception to many of the arguments in the other chapters, countering our view that the typology must be revised or expanded to serve program evaluation better. Our hope is that the interplay among the ideas in all of the chapters will provide readers with multiple viewpoints and stimulate future development in this important area. Don Campbell advocated a "disputatious community of scholars" to create self-correcting processes, and he appended others' critiques of his papers to his own reprints. In this spirit, we include Shadish's comments and hope they will contribute to evaluators' thinking and practice regarding validity and outcome evaluation.
References
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp. 171–246). Chicago, IL: Rand McNally. Also published as Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally. Since reprinted as Campbell, D. T., & Stanley, J. (1963). Experimental and quasi-experimental designs for research. Boston, MA: Houghton-Mifflin/Wadsworth.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago, IL: Rand McNally.
Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards (2nd ed.). Thousand Oaks, CA: Sage.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Suchman, E. A. (1967). Evaluation research. New York, NY: Russell Sage Foundation.
Huey T. Chen
Stewart I. Donaldson
Melvin M. Mark
Editors
Huey T. Chen is a senior evaluation scientist in the Air Pollution and Respiratory Health Branch at the Centers for Disease Control and Prevention (CDC).
Stewart I. Donaldson is dean and professor of psychology at the Claremont Graduate University.
Melvin M. Mark is professor and head of psychology at Penn State University.
Chapter 1
Validity Frameworks for Outcome Evaluation
Huey T. Chen1, Stewart I. Donaldson2, Melvin M. Mark3
Chen, H. T., Donaldson, S. I., & Mark, M. M. (2011). Validity frameworks for outcome evaluation. In H. T. Chen, S. I. Donaldson, & M. M. Mark (Eds.), Advancing validity in outcome evaluation: Theory and practice. New Directions for Evaluation, 130, 5–16.
Disclaimer: The findings and conclusions of this article are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention (CDC).
Abstract
This chapter discusses the concept of validity as it applies to outcome evaluation. We address the historical adoption and contributions of the Campbellian typology to evaluation. We also discuss related criticisms and controversies and address future directions. © Wiley Periodicals, Inc., and the American Evaluation Association.
How does an evaluator conclude that a program works? How skeptically should a potential evaluation consumer view a summary statement such as “The Web-based supplementary instruction program increased mathematics performance by the equivalent of 3 months of regular instruction”? And how skeptically should that same evaluation consumer view the conclusion that similar effects would occur at her school district? From one perspective at least, the concept of validity lies at the core of all such questions.
Campbellian Validity Typology
In discussing the Campbellian validity typology, it may be important to start with what the typology is not. The term validity has been broadly applied, with test validity perhaps the most common usage (Lissitz, 2009; Messick, 1989). Psychometricians, education measurement specialists, and practitioners in areas ranging from personnel selection to compensatory education care mightily about whether, for example, a 25-item multiple-choice test is valid. In a test-validity context, alternative conceptualizations of validity exist (Lissitz, 2009). The classic view is that test validity refers to the extent to which a test measures what it is supposed to measure. A more recent conceptualization of test validity argues that validity refers to the extent to which any interpretations and actions based on the test are justified.
The focus of this chapter is not on test validity, but rather on the forms of validity that occur in the Campbellian typology and in outcome evaluation. Accordingly, this issue focuses on validity issues that arise when evaluators investigate the relationship between an intervention (e.g., a health promotion or social service program) and the potential outcomes of interest (e.g., reduced childhood obesity or increased employment). Program evaluation commonly requires convincing and defensible evidence about the outcomes of an intervention. In this regard, the Campbellian validity typology has attracted evaluators’ attention. It provides a conceptual framework for thinking about evaluation design and certain kinds of challenges, and it highlights potential strengths and weaknesses of methods that evaluators might use to address validity issues in an outcome evaluation.
As with this entire issue, this introductory chapter focuses on validity in the context of what are varyingly referred to as outcome or impact evaluations. In this introductory chapter, we provide an overview of the Campbellian validity typology. This overview is designed to enhance the ability to read and benefit from the rest of this issue, especially for those less familiar with the Campbellian framework. Readers already familiar with the validity framework of Campbell and his colleagues should also find points of interest.
Content of the Campbellian Validity Framework
We suggest that the Campbellian validity typology contains three general content areas: the nature of validity, the types of validity in the context of estimating treatment effects (and threats to each type of validity), and the principles or procedures for prioritization across validity types. Methods for addressing validity issues could also be viewed as a fourth content area of the validity typology, but alternatively can be seen as another, related component in a broader theory of validity and methods. We briefly and selectively address each of these areas. For a fuller discussion of the validity concept and methods, we encourage readers to consult the original sources.
Nature of Validity
Whether by evaluators or others, secondary discussions of Campbell's validity typology, including the variants developed with Stanley (1963), with Cook (1979), and with Shadish and Cook (2002), focus largely on the types of validity in the typology (and on the threats to each type of validity). Although more nuanced distinctions are available, validity definitions appear to take several different approaches, including:
1. Validity as the accuracy of an inference. For example, if an evaluation concludes that Early Head Start substantially increases school readiness, does that conclusion correspond to the actual (but not directly known) state of the world?
2. Validity as a property of a method or research design. For example, if randomized experiments are taken as having excellent internal validity, then any well-conducted experimental evaluation is thought to have strong internal validity.
Campbell and Stanley open their discussion by saying that “In this chapter we shall examine the validity of sixteen experimental designs against twelve common threats to valid inference” (Campbell & Stanley, 1963, p. 1). Although not explicitly stated, it appears that some readers have taken such language as implying that validity is a property of method, specifically of a quasi-experimental or experimental design. By this interpretation, for instance, a one-group, pretest–posttest design is taken as weak with respect to internal validity because threats including maturation, history, and testing are all plausible in general. Interpreting validity as a design property does not comport well with the broader body of Campbell’s writing, but appears common in many descriptions written by others about the validity framework of Campbell and colleagues.
Generally speaking, Cook and Campbell took a validity-as-accuracy perspective, stating that "We shall use the concepts validity and invalidity to refer to the best available approximation to the truth or falsity of propositions" (Cook & Campbell, 1979, p. 39). They also indicated that factors other than design are important in tentatively establishing validity claims (e.g., qualitative investigation may clarify whether history threats occurred in a quasi-experiment). Shadish, Cook, and Campbell (2002, p. 33) reported that they "use the term validity to refer to the approximate truth of an inference," and they further indicated that their concept of validity is informed by both correspondence and coherence conceptions of truth, as well as by pragmatism (pp. 35–37). This more recent definition seems congruent with Campbell's work in general, including his emphasis on the fallibility of all knowledge claims, concepts such as plausible alternative explanations and validity threats, and the logic of ruling out plausible alternative interpretations. In contrast, the validity-as-property-of-method notion, while implicit in some writing about Campbell, does not fit well with his broader body of work (see Shadish, Cook, & Leviton, 1991, for a detailed summary of Campbell's writing).
Types of Validity
To date, the Campbellian typology has appeared in three major versions. Campbell and Stanley (1963) offered the initial version. They proposed a distinction between internal validity and external validity. They defined internal validity in terms of whether “. . . in fact the experimental treatments make a difference in this specific experimental instance?” Campbell and Stanley further specified that external validity asks the question of generalizability: “To what populations, settings, treatment variables, and measurement variables can this effect be generalized?”
Cook and Campbell (1979) expanded the Campbell and Stanley listing of validity types, identifying four types rather than two. They subdivided internal validity into two types: statistical conclusion validity and internal validity. The former involves the validity with which "conclusions about covariation are made on the basis of statistical evidence." Internal validity involves the accuracy of the conclusion about whether there is a causal relationship between the treatment and the outcome(s), given the particular setting and participants observed and the particular methods employed. Similarly, Cook and Campbell subdivided Campbell and Stanley's external validity into two categories: construct validity and external validity. Construct validity involves the validity of conclusions about "what are the particular cause and effect constructs involved in the relationship?" External validity involves the question of "how generalizable is the causal relationship to and across persons, setting, and times?"
Shadish et al. (2002) used the same four validity types proposed by Cook and Campbell (1979). In the more recent version, the definitions of statistical conclusion and internal validity remain the same, whereas the definitions of construct and external validity are slightly modified. To Shadish et al. (2002), construct validity refers to the “validity of inference about the higher order constructs that represent sampling particulars.” By contrast, external validity refers to “whether the causal relationship holds over variation in persons, settings, treatments, and measurement variables.” That is, Cook and Campbell linked construct validity to treatment and outcomes, and external validity to persons and settings; in contrast, Shadish et al. (2002) extended both validities to each of the four facets of a study they highlight: persons, treatments, outcomes, and settings.
The validity concepts emanating from the evolving validity typology of Campbell and his associates have established a strong presence in the evaluation literature. The initial distinction between internal and external validity, popularized by Campbell and Stanley (1963), is widely used today. In this issue, as in the rest of the literature, authors differ in terms of which version they emphasize. The Reichardt (this issue), Julnes (this issue), and Mark (this issue) chapters focus on the Cook and Campbell (1979) and Shadish et al. (2002) versions, whereas the rest of the chapters primarily refer to the Campbell and Stanley version.
Principles of Prioritization
In addition to specifying (two or four) validity types, the Campbellian typology offers principles for prioritizing among them. Campbell and Stanley (1963) pointed out the frequent trade-offs between internal and external validity: an increase in one may come only with a reduction of the other. For example, a limited set of sites may allow random assignment to condition (or the use of other design features that facilitate internal validity), yet these sites may be unusual in ways that hinder generalization of results to other sites.
For Campbell and Stanley (1963), the first priority is internal validity. For them, internal validity is the minimum requirement for any cause-probing study, without which research results are difficult to interpret. An oft-quoted statement of theirs is that "internal validity is the sine qua non." Although they placed a priority on internal validity, Campbell and Stanley described both internal and external validity as important. After all, Campbell developed and helped popularize the external validity concept, at least in part so that concerns about generalization would not be ignored.
Cook and Campbell (1979) offer a more nuanced discussion, highlighting, for example, that Campbell and Stanley's priority on internal validity assumes that a study's purpose is to investigate whether a treatment affected an outcome. In the context of evaluation, then, this prioritization of internal validity would not apply to evaluations that have some other focus. Thus, claims about so-called gold-standard methods, made without caveats about evaluation purpose, seem inconsistent with a careful reading of Campbell's work.
More recently, Shadish et al. (2002, p. 98) advanced an even more nuanced view than Cook and Campbell about priorities across validity types. For example, they state that “internal validity is not the sine qua non of all research” and that it “has a special (but not inviolate) place in cause-probing research, especially in experimental research.” They also advocated for programs of research in which studies varied in their relative prioritization of different validity types, with each validity type having “its turn in the spotlight” across studies (p. 102).
Methods for Enhancing Validity
Another important validity-related contribution made by Campbell and his associates was their specification of various experimental and quasi-experimental designs for enhancing a study’s internal validity. They systematically illustrated how in general each design does or does not rule out threats to internal validity. Internal validity threats are categories of generic alternative explanations, such as history or maturation, which could alternatively account for what might appear to be an effect of the program on the outcome of interest. According to Campbell and his associates, randomized experiments are generally preferable for estimating a treatment effect because they typically can rule out most threats to internal validity (where “rule out” is a term commonly used for rendering a threat implausible). In spite of this preference, Campbell and colleagues cautioned that experimental methods should be applied thoughtfully. For example, an automatic priority for randomized controlled trials (RCTs) would likely ignore caveats from the Campbellian tradition, such as “Premature experimental work is a common research sin” (Shadish et al., 2002, p. 99). Quasi-experimental methods are generally a second-best choice, with a wide range across quasi-experimental designs in terms of how well they typically rule out the majority of threats to internal validity. Again, however, as Shadish et al. especially make clear, a one-to-one correspondence between research design and validity should not be assumed. Additional evidence can help strengthen or refute validity claims.
Although external validity was originally viewed as a lower priority than internal validity, Campbell and his associates made noteworthy contributions to our understanding of external validity. Of course, developing and popularizing the concept itself were major accomplishments. Moreover, Campbell and colleagues increasingly offered methods and principles for enhancing external validity. Cook and Campbell (1979) discussed alternative methods for sampling, for example, including purposive sampling of different kinds of cases (to see if findings held across the differences in question) or sampling “modal instances,” cases most like those to which generalization is desired. Shadish et al. (2002) also offered principles for enhancing external validity. They discussed enhancing external validity by surface similarity, ruling out irrelevancies, making discriminations, interpolation, and causal explanation. Despite these contributions, much of the conversation about enhancing external validity is at a different level conceptually from that of internal validity. And contributors to the evaluation literature have done much less than might be desired in terms of applying, revising, and adding to the methods and principles for enhancing external validity.
The Campbellian Validity Typology and Program Evaluation
