Presents a comprehensive treatment of issues related to the inception, design, implementation and reporting of large-scale education assessments.
In recent years many countries have decided to become involved in international educational assessments in order to ascertain the strengths and weaknesses of their student populations. Assessments such as the OECD's Programme for International Student Assessment (PISA), the IEA's Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study (PIRLS) have provided opportunities for comparison between students of different countries on a common international scale.
This book is designed to give researchers, policy makers and practitioners a well-grounded knowledge in the design, implementation, analysis and reporting of international assessments. Readers will be able to gain a more detailed insight into the scientific principles employed in such studies allowing them to make better use of the results. The book will also give readers an understanding of the resources needed to undertake and improve the design of educational assessments in their own countries and regions.
Implementation of Large‐Scale Education Assessments: Survey researchers, market researchers and practitioners engaged in comparative projects will all benefit from the unparalleled breadth of knowledge and experience in large‐scale educational assessments gathered in this one volume.
Cover
Title Page
Notes on Contributors
Foreword
Acknowledgements
Abbreviations
1 Implementation of Large‐Scale Education Assessments
1.1 Introduction
1.2 International, Regional and National Assessment Programmes in Education
1.3 Purposes of LSAs in Education
1.4 Key Areas for the Implementation of LSAs in Education
1.5 Summary and Outlook
Appendix 1.A
References
2 Test Design and Objectives
2.1 Introduction
2.2 PISA
2.3 TIMSS
2.4 PIRLS and Pre‐PIRLS
2.5 ASER
2.6 SACMEQ
2.7 Conclusion
References
3 Test Development
3.1 Introduction
3.2 Developing an Assessment Framework: A Collaborative and Iterative Process
3.3 Generating and Collecting Test Material
3.4 Refinement of Test Material
3.5 Beyond Professional Test Development: External Qualitative Review of Test Material
3.6 Introducing Innovation
3.7 Conclusion
References
4 Design, Development and Implementation of Contextual Questionnaires in Large‐Scale Assessments
4.1 Introduction
4.2 The Role of Questionnaires in LSAs
4.3 Steps in Questionnaire Design and Implementation
4.4 Questions and Response Options in LSAs
4.5 Alternative Item Formats
4.6 Computer‐Based/Online Questionnaire Instruments
4.7 Conclusion and Future Perspectives
Acknowledgements
References
5 Sample Design, Weighting, and Calculation of Sampling Variance
5.1 Introduction
5.2 Target Population
5.3 Sample Design
5.4 Weighting
5.5 Sampling Adjudication Standards
5.6 Estimation of Sampling Variance
References
6 Translation and Cultural Appropriateness of Survey Material in Large‐Scale Assessments
6.1 Introduction
6.2 Overview of Translation/Adaptation and Verification Approaches Used in Current Multilingual Comparative Surveys
6.3 Step‐by‐Step Breakdown of a Sophisticated Localisation Design
6.4 Measuring the Benefits of a Good Localisation Design
6.5 Checklist of Requirements for a Robust Localisation Design
References
7 Quality Assurance
7.1 Introduction
7.2 The Development and Agreement of Standardised Implementation Procedures
7.3 The Production of Manuals which Reflect Agreed Procedures
7.4 The Recruitment and Training of Personnel in Administration and Organisation: Especially the Test Administrator and the School Coordinator
7.5 The Quality Monitoring Processes: Recruiting and Training Quality Monitors to Visit National Centres and Schools
7.6 Other Quality Monitoring Procedures
7.7 Conclusion
Reference
8 Processing Responses to Open‐Ended Survey Questions
8.1 Introduction
8.2 The Fundamental Objective
8.3 Contextual Factors: Survey Respondents and Items
8.4 Administration of the Coding Process
8.5 Quality Assurance and Control: Ensuring Consistent and Reliable Coding
8.6 Conclusion
References
9 Computer‐Based Delivery of Cognitive Assessment and Questionnaires
9.1 Introduction
9.2 Why Implement CBAs?
9.3 Implementation of International Comparative CBAs
9.4 Assessment Architecture
9.5 Item Design Issues
9.6 State‐of‐the‐Art and Emerging Technologies
9.7 Summary and Conclusion
References
10 Data Management Procedures
10.1 Introduction
10.2 Historical Review: From Data Entry and Data Cleaning to Integration into the Entire Study Process
10.3 The Life Cycle of a LSA Study
10.4 Standards for Data Management
10.5 The Data Management Process
10.6 Outlook
References
11 Test Implementation in the Field: The Case of PASEC
11.1 Introduction
11.2 Test Implementation
11.3 Data Entry
11.4 Data Cleaning
11.5 Data Analysis
11.6 Governance and Financial Management of the Assessments
Acknowledgments
References
12 Test Implementation in the Field: The Experience of Chile in International Large‐Scale Assessments
12.1 Introduction
12.2 International Studies in Chile
References
13 Why Large‐Scale Assessments Use Scaling and Item Response Theory
13.1 Introduction
13.2 Item Response Theory
13.3 Test Development and Construct Validation
13.4 Rotated Test Booklets
13.5 Comparability of Scales Across Settings and Over Time
13.6 Construction of Performance Indicators
13.7 Conclusion
References
14 Describing Learning Growth
14.1 Background
14.2 Terminology: The Elements of a Learning Metric
14.3 Example of a Learning Metric
14.4 Issues for Consideration
14.5 PISA Described Proficiency Scales
14.6 Defining and Interpreting Proficiency Levels
14.7 Use of Learning Metrics
Acknowledgement
References
15 Scaling of Questionnaire Data in International Large‐Scale Assessments
15.1 Introduction
15.2 Methodologies for Construct Validation and Scaling
15.3 Classical Item Analysis
15.4 Exploratory Factor Analysis
15.5 Confirmatory Factor Analysis
15.6 IRT Scaling
15.7 Described IRT Questionnaire Scales
15.8 Deriving Composite Measures of Socio‐economic Status
15.9 Conclusion and Future Perspectives
References
16 Database Production for Large‐Scale Educational Assessments
16.1 Introduction
16.2 Data Collection
16.3 Cleaning, Recoding and Scaling
16.4 Database Construction
16.5 Assistance
References
17 Dissemination and Reporting
17.1 Introduction
17.2 Frameworks
17.3 Sample Items
17.4 Questionnaires
17.5 Video
17.6 Regional and International Reports
17.7 National Reports
17.8 Thematic Reports
17.9 Summary Reports
17.10 Analytical Services and Support
17.11 Policy Papers
17.12 Web‐Based Interactive Display
17.13 Capacity‐Building Workshops
17.14 Manuals
17.15 Technical Reports
17.16 Conclusion
References
Index
End User License Agreement
Chapter 02
Table 2.1 Cluster rotation design used to form standard test booklets for PISA 2012
Table 2.2 TIMSS 2015 booklet design for fourth and eighth grades
Table 2.3 TIMSS 2015 framework characteristics for fourth and eighth grade mathematics
Table 2.4 Blueprint for the PIRLS and pre‐PIRLS assessments
Table 2.5 ASER reading and arithmetic assessment task descriptions
Table 2.6 The test blueprint for the SACMEQ II pupil mathematics test
Chapter 03
Table 3.1 Blueprint for numeracy content areas in PIAAC
Chapter 04
Table 4.1 Questionnaire content
Table 4.2 Final design of rotated student context questionnaires in the PISA 2012 MS
Table 4.3 Unforeseen sources of error and example reactive probes
Table 4.4 Number of domains on top bookshelf by year level
Chapter 08
Table 8.1 Examples of an initially lenient result and a neutral result
Table 8.2 Examples of flagged cases in a country
Table 8.3 Hypothetical examples of percentages of flagged cases for one booklet
Chapter 09
Table 9.1 Rotated cluster design, PISA 2012 CBA
Chapter 11
Table 11.1 Reading objectives, subdomains and materials in the PASEC 2014 assessment
Table 11.2 Cognitive processes in the PASEC 2014 assessment
Table 11.3 Mathematical content measured by the PASEC 2014 assessment
Table 11.4 Assessed cognitive processes
Table 11.5 Allocation of item blocks across test booklets in PASEC 2014
Table 11.6 Language test organisation
Table 11.7 Mathematics test organisation
Chapter 12
Table 12.1 International studies of educational assessment in Chile (1998–2016)
Chapter 13
Table 13.1 Example of item statistics for a multiple‐choice test item with four response options
Table 13.2 Example of item statistics for a partial credit item
Table 13.3 TIMSS 2015 student achievement booklet design
Chapter 15
Table 15.1 Socio‐economic indicators in major international studies
Chapter 16
Table 16.1 International Standard Classification of Education (ISCED) categories (selected)
Chapter 01
Figure 1.1 Simplified model of the policy cycle
Figure 1.2 Key areas of a robust assessment programme
Chapter 02
Figure 2.1 Mapping of the PIRLS comprehension processes to the PISA reading aspects
Figure 2.2 ASER sample reading assessment instrument (English)
Chapter 04
Figure 4.1 Number of books at home item – TIMSS and PIRLS Year 4
Figure 4.2 Examples of forced‐choice items in PISA 2012 (ST48Q02 and ST48Q05)
Figure 4.3 Examples of situational judgement type items in PISA 2012 (ST104Q01, ST104Q04, ST104Q05, ST104Q06)
Figure 4.4 Example of ‘over‐claiming technique’ type question in PISA 2012 (ST62Q01‐ST62Q19)
Figure 4.5 The ‘who is close to me’ item from the Australian Child Wellbeing Project
Figure 4.6 The ‘bookshelf item’ from the Australian Child Wellbeing Project
Chapter 06
Figure 6.1 Images of a book for left‐to‐right and right‐to‐left languages
Chapter 08
Figure 8.1 Two examples of survey questions with their response coding instructions
Figure 8.2 A PISA item that aims to encapsulate all possible responses to an open‐ended question
Chapter 09
Figure 9.1 Cluster administration order
Figure 9.2 A progress indicator from a CBA
Figure 9.3 A timing bar from a CBA
Chapter 10
Figure 10.1 Flow of data
Chapter 13
Figure 13.1 Example of item characteristic curve for test item 88
Figure 13.2 Example of item characteristic curve for test item 107
Figure 13.3 Illustrative problematic test item
Figure 13.4 Item difficulty and achievement distributions
Figure 13.5 Comparison of the item pool information function for mathematics and the calibration sample proficiency distribution for a country participating in PISA 2009
Figure 13.6 DIF plot
Figure 13.7 Example item with gender DIF
Figure 13.8 Item S414Q04 PISA field trial 2006
Figure 13.9 An example of the item‐by‐country interaction report (item S414Q04, PISA 2006 field trial)
Figure 13.10 Trend oriented MTEG design
Figure 13.11 An example of the distribution of actual individual scale score for 5000 students
Figure 13.12 An example of the distribution of estimates of individual scale score for 5000 students
Chapter 14
Figure 14.1 Example learning metric for mathematics
Figure 14.2 Sample item allocated to the ‘number and algebra’ and ‘apply’ categories
Figure 14.3 Sample item allocated to the ‘number and algebra’ and ‘translate’ categories
Figure 14.4 Sample ACER ConQuest item map
Figure 14.5 What it might mean for an individual to ‘be at a level’ on a learning metric
Figure 14.6 Calculating the RP‐value used to define PISA proficiency levels (for dichotomous items)
Chapter 15
Figure 15.1 Category characteristic curves for an example item with four categories
Figure 15.2 Expected item scores for an example item with four categories
Figure 15.3 Accumulated category probabilities for an example item with four categories
Figure 15.4 Example of ICCS 2009 item map to describe questionnaire items
The Wiley Series in Survey Methodology covers topics of current research and practical interests in survey methodology and sampling. While the emphasis is on application, theoretical discussion is encouraged when it supports a broader understanding of the subject matter.
The authors are leading academics and researchers in methodology and sampling. The readership includes professionals in, and students of, the fields of applied statistics, biostatistics, public policy, and government and corporate enterprises.
ALWIN ‐ Margins of Error: A Study of Reliability in Survey Measurement
BETHLEHEM ‐ Applied Survey Methods: A Statistical Perspective
BIEMER, LEEUW, ECKMAN, EDWARDS, KREUTER, LYBERG, TUCKER, WEST (EDITORS) ‐ Total Survey Error in Practice: Improving Quality in the Era of Big Data
BIEMER ‐ Latent Class Analysis of Survey Error
BIEMER and LYBERG ‐ Introduction to Survey Quality
CALLEGARO, BAKER, BETHLEHEM, GORITZ, KROSNICK, LAVRAKAS (EDITORS) ‐ Online Panel Research: A Data Quality Perspective
CHAMBERS and SKINNER (EDITORS) ‐ Analysis of Survey Data
CONRAD and SCHOBER (EDITORS) ‐ Envisioning the Survey Interview of the Future
COUPER, BAKER, BETHLEHEM, CLARK, MARTIN, NICHOLLS, O'REILLY (EDITORS) ‐ Computer Assisted Survey Information Collection
D'ORAZIO, DI ZIO, SCANU ‐ Statistical Matching: Theory and Practice
FULLER ‐ Sampling Statistics
GROVES, DILLMAN, ELTINGE, LITTLE (EDITORS) ‐ Survey Nonresponse
GROVES, BIEMER, LYBERG, MASSEY, NICHOLLS, WAKSBERG (EDITORS) ‐ Telephone Survey Methodology
GROVES and COUPER ‐ Nonresponse in Household Interview Surveys
GROVES ‐ Survey Errors and Survey Costs
GROVES ‐ The Collected Works of Robert M. Groves, 6 Book Set
GROVES, FOWLER, COUPER, LEPKOWSKI, SINGER, TOURANGEAU ‐ Survey Methodology, 2nd Edition
HARKNESS, VAN DE VIJVER, MOHLER ‐ Cross‐Cultural Survey Methods
HARKNESS, BRAUN, EDWARDS, JOHNSON, LYBERG, MOHLER, PENNELL, SMITH (EDITORS) ‐ Survey Methods in Multicultural, Multinational, and Multiregional Contexts
HEDAYAT, SINHA ‐ Design and Inference in Finite Population Sampling
HUNDEPOOL, DOMINGO‐FERRER, FRANCONI, GIESSING, NORDHOLT, SPICER, DE WOLF ‐ Statistical Disclosure Control
KALTON, HEERINGA (EDITORS) ‐ Leslie Kish: Selected Papers
KORN, GRAUBARD ‐ Analysis of Health Surveys
KREUTER (EDITOR) ‐ Improving Surveys with Paradata: Analytic Uses of Process Information
LEPKOWSKI, TUCKER, BRICK, DE LEEUW, JAPEC, LAVRAKAS, LINK, SANGSTER ‐ Advances in Telephone Survey Methodology
LEVY, LEMESHOW ‐ Sampling of Populations: Methods and Applications, 4th Edition
LIETZ, CRESSWELL, RUST, ADAMS (EDITORS) ‐ Implementation of Large‐Scale Education Assessments
LUMLEY ‐ Complex Surveys: A Guide to Analysis Using R
LYNN (EDITOR) ‐ Methodology of Longitudinal Surveys
MADANS, MILLER, MAITLAND, WILLIS ‐ Question Evaluation Methods: Contributing to the Science of Data Quality
MAYNARD, HOUTKOOP‐STEENSTRA, SCHAEFFER, VAN DER ZOUWEN (EDITORS) ‐ Standardization and Tacit Knowledge: Interaction and Practice in the Survey Interview
MILLER, CHEPP, WILLSON, PADILLA (EDITORS) ‐ Cognitive Interviewing Methodology
PRATESI (EDITOR) ‐ Analysis of Poverty Data by Small Area Estimation
PRESSER, ROTHGEB, COUPER, LESSLER, E. MARTIN, J. MARTIN, SINGER ‐ Methods for Testing and Evaluating Survey Questionnaires
RAO, MOLINA ‐ Small Area Estimation, 2nd Edition
SÄRNDAL, LUNDSTRÖM ‐ Estimation in Surveys with Nonresponse
SARIS, GALLHOFER ‐ Design, Evaluation, and Analysis of Questionnaires for Survey Research, 2nd Edition
SIRKEN, HERRMANN, SCHECHTER, SCHWARZ, TANUR, TOURANGEAU (EDITORS) ‐ Cognition and Survey Research
SNIJKERS, HARALDSEN, JONES, WILLIMACK ‐ Designing and Conducting Business Surveys
STOOP, BILLIET, KOCH, FITZGERALD ‐ Improving Survey Response: Lessons Learned from the European Social Survey
VALLIANT, DORFMAN, ROYALL ‐ Finite Population Sampling and Inference: A Prediction Approach
WALLGREN, A., WALLGREN B. ‐ Register‐based Statistics: Statistical Methods for Administrative Data, 2nd Edition
WALLGREN, A., WALLGREN B. ‐ Register‐based Statistics: Administrative Data for Statistical Purposes
Edited by
Petra Lietz, John C. Cresswell, Keith F. Rust and Raymond J. Adams
This edition first published 2017. © 2017 by John Wiley and Sons Ltd.
Registered Office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging‐in‐Publication Data
Names: Lietz, Petra, editor. | Cresswell, John, 1950– editor. | Rust, Keith, editor. | Adams, Raymond J., 1959– editor.
Title: Implementation of large‐scale education assessments / editors, Petra Lietz, John C. Cresswell, Keith F. Rust, Raymond J. Adams.
Other titles: Wiley Series in Survey Methodology
Description: Chichester, UK ; Hoboken, NJ : John Wiley & Sons, 2017. | Series: Wiley Series in Survey Methodology | Includes bibliographical references and index.
Identifiers: LCCN 2016035918 (print) | LCCN 2016050522 (ebook) | ISBN 9781118336090 (cloth) | ISBN 9781118762479 (pdf) | ISBN 9781118762493 (epub)
Subjects: LCSH: Educational tests and measurements.
Classification: LCC LB3051 .L473 2016 (print) | LCC LB3051 (ebook) | DDC 371.26–dc23
LC record available at https://lccn.loc.gov/2016035918
A catalogue record for this book is available from the British Library.
Cover design by Wiley. Cover image: ZaZa Studio/Shutterstock; (Map) yukipon/Getty Images
Raymond J. Adams, Australian Council for Educational Research
Alla Berezner, Australian Council for Educational Research
Falk Brese, International Association for the Evaluation of Educational Achievement (IEA) Data Processing and Research Center
Mark Cockle, International Association for the Evaluation of Educational Achievement (IEA) Data Processing and Research Center
John C. Cresswell, Australian Council for Educational Research
Steve Dept, cApStAn Linguistic Quality Control
Andrea Ferrari, cApStAn Linguistic Quality Control
Eveline Gebhardt, Australian Council for Educational Research
Béatrice Halleux, HallStat
Oswald Koussihouèdé, Programme for the Analysis of Education Systems of CONFEMEN (PASEC)
Sheila Krawchuk, Westat
Ema Lagos Campos, Agencia de Calidad de la Educación
Petra Lietz, Australian Council for Educational Research
Antoine Marivin, Programme for the Analysis of Education Systems of CONFEMEN (PASEC)
Juliette Mendelovits, Australian Council for Educational Research
Christian Monseur, Université de Liège
Dara Ramalingam, Australian Council for Educational Research
Keith F. Rust, Westat
Wolfram Schulz, Australian Council for Educational Research
Vanessa Sy, Programme for the Analysis of Education Systems of CONFEMEN (PASEC)
Ross Turner, Australian Council for Educational Research
Maurice Walker, Australian Council for Educational Research
Governments throughout the world recognise that the quality of schooling provided to children and young people will be an important determinant of a country’s social and economic success in the twenty‐first century. In every country, a central question is what governments and school systems can do to ensure that all students are equipped with the knowledge, skills and attributes necessary for effective participation in the future workforce and for productive future citizenship.
To answer this question, countries require quality information, including information on current levels of student achievement, the performances of subgroups of the student population − especially socio‐economically disadvantaged students, Indigenous students and new arrivals − and recent trends in achievement levels within a country. Also important is an understanding of how well a nation’s schools are performing in comparison with schools elsewhere in the world. Are some school systems producing better outcomes overall? Have some systems achieved superior improvements in achievement levels over time? Are some more effective in ameliorating the influence of socio‐economic disadvantage on educational outcomes? Are some doing a better job of developing the kinds of skills and attributes required for life and work in the twenty‐first century?
Some 60 years ago, a small group of educational researchers working in a number of countries conceived the idea of collecting data on the impact of countries’ educational policies and practices on student outcomes. With naturally occurring differences in countries’ school curricula, teaching practices, ways of organising and resourcing schools and methods of preparing and developing teachers and school leaders, they saw the possibility of studying the effectiveness of different educational policies and practices in ways that would be difficult or impossible in any one country. The cross‐national studies that these researchers initiated in the 1960s marked the beginning of large‐scale international achievement surveys.
In the decades since the 1960s, international comparative studies of student achievement and the factors underpinning differences in educational performance in different countries have evolved from a research interest of a handful of academics and educational research organisations to a major policy tool of governments across the globe. International surveys now include the OECD's PISA, implemented in 75 countries in 2015, and the IEA's Trends in International Mathematics and Science Study, implemented in 59 countries in 2015. Other international studies are conducted in areas such as primary school reading, civics and citizenship and ICT literacy. Complementing these international surveys are three significant regional assessment programmes, with a fourth under development. Governments use the results of these large‐scale international studies, often alongside results from their own national surveys, to monitor progress in improving quality and equity in school education and to evaluate the effectiveness of system‐wide policies and programmes.
The decades since the 1960s have also seen significant advances in methodologies for the planning, implementation and use of international surveys – in effect, the evolution of a science of large‐scale assessment.
This book maps an evolving methodology for large‐scale educational assessments. Advances in this field have drawn on developments in specific disciplines and areas of practice, including psychometrics, test development, statistics, sampling theory and the use of new technologies of assessment. The book identifies and discusses 13 elements of a complex, integrated science of large‐scale assessment – a methodology that begins with a consideration of the policy context and purpose of a study, proceeds through various steps in the design and implementation of a quality assessment programme and culminates in the reporting and dissemination of a study's findings. Each chapter in the book is authored by one or more international authorities with experience in leading the implementation of an element of the described methodology.
As the contributors to this book explain, the science of large‐scale assessments is continuing to evolve. The challenges faced by the field and addressed by a number of contributors to this book include the collection of useful, internationally comparable data on a broader range of skills and attributes than have typically been assessed in large‐scale surveys. National education systems and governments are increasingly identifying skills and attributes such as collaboration, innovativeness, entrepreneurship and creativity as important outcomes of school education. The assessment of such attributes may require very different methods of observation and data gathering, including by capitalising on advances in assessment technologies.
An ongoing challenge will be to ensure that the results of large‐scale assessments continue to meet their essential purpose: to inform and lead effective educational policies and practices to better prepare all students for life and work in the twenty‐first century.
Professor Geoff Masters (AO), CEO, Australian Council for Educational Research (ACER), Camberwell, Victoria, January 2016
The editors gratefully acknowledge the Australian Council for Educational Research (ACER), the Australian Department of Foreign Affairs and Trade (DFAT) and Westat for their support of this book.
Particular thanks go to Juliet Young‐Thornton for her patient, friendly and effective assistance throughout the process of producing this book.
ACER
Australian Council for Educational Research
ALL
Adult Literacy and Life Skills Survey
ASER
Annual Status of Education Report
BRR
Balanced repeated replication
CBA
Computer‐based assessment
CFA
Confirmatory factor analysis
CFI
Comparative fit index
CIVED
Civic Education Study
CONFEMEN
Conference of Education Ministers of Countries using French as the Language of Communication/Conférence des ministres de l'Éducation des Etats et gouvernements de la Francophonie
DIF
Differential item functioning
DPS
Described proficiency scale
EFA
Exploratory factor analysis
ESC
Expected score curves
ESCS
Economic, social and cultural status
ESS
European Social Survey
ETS
Educational Testing Service
FEGS
Functional Expert Groups
FIMS
First International Mathematics Study
FT
Field trial
ICC
Item characteristic curve
ICCS
International Civic and Citizenship Education Study
ICILS
International Computer and Information Literacy Study
ICT
Information and communication technology
IDB
International database
IDs
Identification variables
IEA
International Association for the Evaluation of Educational Achievement
IIEP
UNESCO International Institute for Educational Planning
ILO
International Labour Organization
IREDU
Institute for Research in the Sociology and Economics of Education
IRM
Item response models
IRT
Item response theory
ISCED
International Standard Classification of Education
ISCO
International Standard Classification of Occupations
ISEI
International Socio‐Economic Index of Occupational Status
ITC
International Test Commission
LAMP
Literacy Assessment and Monitoring Programme
LAN
Local area network
LGE
General Education Law/Ley General de Educación
LLECE
Latin American Laboratory for Assessment of the Quality of Education/Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación
LSA
Large‐scale assessment
MOS
Measure of size
MS
Main survey
MTEG
Monitoring Trends in Educational Growth
NAEP
United States National Assessment of Educational Progress
NNFI
Non‐normed fit index
NPMs
National project managers
OCR
Optical character recognition
OECD
Organisation for Economic Co‐operation and Development
PASEC
The Programme for the Analysis of Education Systems of CONFEMEN/Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN
PCA
Principal component analysis
PCM
Partial credit model
PIRLS
Progress in International Reading Literacy Study
PISA
Programme for International Student Assessment
PL
Parameter logistic model
PPS
Probability proportional to size
PSUs
Primary sampling units
RL
Reading Literacy Study
RMSEA
Root mean square error of approximation
RP
Response probability
SACMEQ
Southern and Eastern Africa Consortium for Monitoring Educational Quality
SDGs
Sustainable Development Goals
SEA‐PLM
Southeast Asian Primary Learning Metrics
SEM
Structural equation modelling
SERCE
Second Regional Comparative and Explanatory Study
SES
Socio‐economic status
SIGE
Students General Information System/Sistema Información General de Estudiantes
SIMCE
Education Quality Measurement System/Sistema de Medición de la Calidad de la Educación
SIMS
Second International Mathematics Study
SISS
Second International Science Study
SITES
Second Information Technology in Education Study
SSUs
Secondary sampling units
TALIS
Teaching and Learning International Survey
TCMAs
Test‐Curriculum Matching Analyses
TERCE
Third Regional Comparative and Explanatory Study
TIMSS
Trends in International Mathematics and Science Study
TORCH
Test of Reading Comprehension
TRAPD
Translation, Review, Adjudication, Pretesting, and Documentation
UAENAP
United Arab Emirates (UAE) National Assessment Program
UNESCO
United Nations Educational, Scientific and Cultural Organization
UREALC
UNESCO’s Regional Bureau of Education for Latin America and the Caribbean
Petra Lietz, John C. Cresswell, Keith F. Rust and Raymond J. Adams
The 60 years that followed a study of mathematics in 12 countries conducted by the International Association for the Evaluation of Educational Achievement (IEA) in 1964 have seen a proliferation of large‐scale assessments (LSAs) in education. In a recent systematic review of the impact of LSAs on education policy (Best et al., 2013), it was estimated that LSAs in education are now being undertaken in about 70% of the countries in the world.
The Programme for International Student Assessment (PISA) conducted by the Organisation for Economic Co‐operation and Development (OECD) was implemented in 75 countries in 2015 with around 510 000 participating students and their schools. Similarly, the Trends in International Mathematics and Science Study (TIMSS), conducted by the IEA, collected information from schools and students in 59 countries in 2015.
This book is about the implementation of LSAs in schools, which can be considered to involve 13 key areas. These start with the explication of policy goals and issues, assessment frameworks, test and questionnaire designs, item development, translation and linguistic control as well as sampling. They also cover field operations, technical standards, data collection, coding and management as well as quality assurance measures. Finally, test and questionnaire data have to be scaled and analysed while a database is produced and accompanied by dissemination and the reporting of results. While much of the book has been written from a central coordinating and management perspective, two chapters illustrate the actual implementation of LSAs, highlighting the project teams and infrastructure required for participation in such assessments. Figure 1.2 in the concluding section of this chapter provides details regarding where each of these 13 key areas is covered in the chapters of this book.
Participation in these studies, on a continuing basis, is now widespread, as is indicated in Appendix 1.A. Furthermore, their results have become integral to the general public discussion of educational progress and international comparisons in a wide range of countries, with the impact of LSAs on education policy being demonstrated (e.g. Baker & LeTendre, 2005; Best et al., 2013; Breakspear, 2012; Gilmore, 2005). Therefore, it seems timely to bring together in one place the collective knowledge of those who routinely conduct these studies, with the aim of informing users of the results as to how such studies are conducted and providing a handbook for future practitioners of current and prospective studies.
While the emphasis throughout the book is on the practical implementation of LSAs, it is grounded in theories of psychometrics, statistics, quality improvement and survey communication. The chapters of this book seek to cover in one place almost every aspect of the design, implementation and analysis of LSAs (see Figure 1.2), with perhaps greater emphasis on the aspects of implementation than can be found elsewhere. This emphasis is intended to complement other recent texts with related content that have a greater focus on the analysis of data from LSAs (e.g. Rutkowski, von Davier & Rutkowski, 2013).
This introductory chapter first provides some context in terms of the development of international, regional and national assessments and the policy context in which they occur. Then, the purposes for countries to undertake such assessments, particularly with a view to evidence‐based policymaking in education, are discussed. This is followed by a description of the content of the book. The chapter finishes with considerations as to where LSAs might be headed and what is likely to shape their development.
The IEA first started a programme of large‐scale evaluation studies in education with a pilot study to explore the feasibility of such an endeavour in 1959–1961 (Foshay et al., 1962). After the feasibility study had shown that international comparative studies in education were indeed possible, the first content area to be tested was mathematics, with the First International Mathematics Study conducted in 12 countries in 1962–1967 (Husén, 1967; Postlethwaite, 1967), followed by the six subject surveys, namely, civic education, English as a foreign language, French as a foreign language, literature education, reading comprehension and science, conducted in 18 countries in 1970–1971. Since then, as can be seen in Appendix 1.A, participation in international studies of education has grown considerably, with 59 and 75 countries and economies, respectively, participating in the latest administrations of TIMSS by the IEA in 2015 and PISA by the OECD in 2015.
In addition to international studies conducted by the IEA since the late 1950s and by the OECD since 2000, three assessment programmes with a regional focus have been designed and implemented since the mid 1990s. First, the Conference of Education Ministers of Countries Using French as the Language of Communication (Conférence des ministres de l'Education des États et gouvernements de la Francophonie – CONFEMEN) conducts the Programme d'Analyse des Systèmes Educatifs de la CONFEMEN (PASEC). Since its first data collection in 1991, assessments have been undertaken in over 20 francophone countries, not only in Africa but also in other parts of the world (e.g. Cambodia, Laos and Vietnam). Second, the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ), with the support of the UNESCO International Institute for Educational Planning (IIEP) in Paris, has undertaken four data collections since 1995, with the latest assessment in 2012–2014 (SACMEQ IV) involving 15 countries in Southern and Eastern Africa. Third, the Latin American Laboratory for Assessment of the Quality of Education (LLECE is the Spanish acronym), with the assistance of UNESCO's Regional Bureau for Education in Latin America and the Caribbean (UREALC), has undertaken three rounds of data collection since 1997, with 15 countries participating in the Third Regional Comparative and Explanatory Study (TERCE) in 2013. First steps towards an assessment in the Asia‐Pacific region are currently being undertaken through the Southeast Asian Primary Learning Metrics (SEA‐PLM) initiative.
In terms of LSAs of student learning, a distinction is made here between LSAs that are intended to be representative of an entire education system, which may measure and monitor learning outcomes for various subgroups (e.g. by gender or socio‐economic background), and large‐scale examinations that are usually national in scope and which report or certify individual student’s achievement (Kellaghan, Greaney & Murray, 2009). Certifying examinations may be used by education systems to attest achievement at the end of primary or secondary education, for example, or education systems may use examinations to select students and allocate placements for further or specialised study, such as university entrance or scholarship examinations. The focus of this book is on the implementation of LSAs of student learning that are representative of education systems, particularly international assessments that compare education systems and student learning across participating countries.
Parallel to the growth in international assessments, the number of countries around the world administering national assessments in any year has also increased – from 28 in 1995 to 57 in 2006 (Benavot & Tanner, 2007). For economically developing countries in the period from 1959 to 2009, Kamens and Benavot (2011) reported the highest number of national assessments in one year as 37 in 1999. Also in the 1990s, most of the countries in Central and South America introduced national assessments (e.g. Argentina, Bolivia, Brazil, Colombia, Dominican Republic, Ecuador, El Salvador, Guatemala, Paraguay, Peru, Uruguay and Venezuela) through the Partnership for Educational Revitalization in the Americas (PREAL) (Ferrer, 2006) although some introduced them earlier (e.g. Chile in 1982 and Costa Rica in 1986).
International, regional and national assessment programmes can all be considered as LSAs in education. While this book focuses mainly on international assessment programmes conducted in primary and secondary education, it also contains examples and illustrations from regional and national assessments where appropriate.
Data from LSAs provide information regarding the extent to which students of a particular age or grade in an education system are learning what is expected in terms of certain content and skills. In addition, they assess differences in achievement levels by subgroups such as gender or region and factors that are correlated with different levels of achievement. Thus, a general purpose of participation in LSAs is to obtain information on a system's educational outcomes and – if questionnaires are administered to obtain background information from students, teachers, parents and/or schools – the associated factors, which, in turn, can assist policymakers and other stakeholders in the education system in making policy and resourcing decisions for improvement (Anderson, Chiu & Yore, 2010; Benavot & Tanner, 2007; Braun, Kanjee & Bettinger, 2006; Grek, 2009; Postlethwaite & Kellaghan, 2008). This approach to education policymaking, based on evidence including data from LSAs, has been adopted around the world, with Wiseman (2010, p. 2) stating that it is 'the most frequently reported method used by politicians and policymakers', which, he argues, can be considered a global norm for educational governance.
More specifically, Wiseman (2010) has put forward three main purposes for evidence‐based policymaking, namely, measuring and ensuring quality, ensuring equity and accountability. To fulfil the purpose of measuring quality, comparisons of performance across countries and over time tend to be undertaken. To provide indicators of equity, the performance of subgroups in terms of gender, socio‐economic status, school type or regions tends to be compared. Accountability refers to the use of assessment results to monitor and report achievement, sometimes publicly, in order to press schools and other stakeholders to improve practice towards meeting defined curricular and performance standards. In addition, assessment data may be used for accountability purposes to implement resource allocation policies (e.g. staff remuneration and contracts). Accountability is more frequently an associated goal of national assessment programmes than of international assessment programmes.
To explicate further the way in which information from LSAs is used in education policymaking, models of the policy cycle are frequently put forward (e.g. Bridgman & Davis, 2004; Haddad & Demsky, 1995; Sutcliffe & Court, 2005). While most models include between six and eight stages, they seem to share four stages, namely, agenda setting, policy formulation, policy implementation and monitoring and evaluation. Agenda setting is the awareness of and priority given to an issue or problem whereas policy formulation refers to the analytical and political ways in which options and strategies are constructed. Policy implementation covers the forms and nature of policy administration and activities in the field. In the final step, monitoring and evaluation involves an appraisal of the extent to which implemented policies have achieved the intended aims and objectives. A model showing these four steps is shown in Figure 1.1.
Figure 1.1 Simplified model of the policy cycle
(Source: Sutcliffe and Court (2005). Reproduced with permission from the Overseas Development Institute)
Regardless of their purpose, data from LSAs are reported mainly through international, regional and national reports. However, these data are also used quite extensively in secondary data analyses (e.g. Hansen, Gustafsson & Rosén, 2014; Howie & Plomp, 2006; Owens, 2013), as well as meta‐analyses (e.g. Else‐Quest, Hyde & Linn, 2010; Lietz, 2006) which frequently lead to policy recommendations.
While recommendations are widespread, examples of the actual impact of these assessments on education policy are often provided in a more anecdotal or case study fashion (see Figazollo, 2009; Hanushek & Woessmann, 2010; McKinsey & Company, 2010) or by the main initiators of these assessments (e.g. Husén, 1967). Moreover, surveys have been conducted to ascertain the policy impact of these assessments. As these surveys have frequently been commissioned or initiated by the organisation responsible for the assessment (e.g. Breakspear, 2012 for the OECD; Gilmore, 2005 for the IEA), a certain positive predisposition regarding the effectiveness of the link between assessment and policy could be assumed. Similarly, surveys of and interviews with staff in ministries and entities that participate in such assessments (e.g. UNESCO, 2013), and that rely on future funding to continue their participation, are likely to report positively on the effects of assessment results on education policymaking.
Two systematic reviews that were conducted recently (Best et al., 2013; Tobin et al., 2015) took a different approach by systematically locating and analysing available evidence of links between LSA programmes and education policy. In other words, these reviews did not include reports or articles that resulted in policy recommendations or surveys of participating entities’ perceived impact of assessments on policy but looked for actual evidence of an assessment–policy link. In the review that focused on such a link in economically developing countries between 1990 and 2011 (Best et al., 2013), of 1325 uniquely identified materials only 54 were considered to provide such evidence. In the review that focused on all countries in the Asia‐Pacific between 1990 and 2013 (Tobin et al., 2015), 68 of the 1301 uniquely identified materials showed evidence of such a link.
Results of these systematic reviews revealed some interesting insights into the use of LSAs as follows:
Just under half of the assessment programmes in the review were national in coverage, about one‐third were international programmes, approximately one‐fifth were regional assessment programmes and only a few were subnational assessment programmes.
Of the regional assessment programmes SACMEQ featured most often, followed by LLECE/SERCE and PASEC.
Of the international assessments, PISA featured most often, followed by TIMSS and the Progress in International Reading Literacy Study (PIRLS).
LSA programmes were most often intended to measure and ensure educational quality. Assessment programmes were less often used for the policy goals of equity or accountability for specific education matters.
The most frequent education policies impacted upon by the use of assessment data were system‐level policies regarding (i) curriculum standards and reform, (ii) performance standards and (iii) assessment policies.
The most common facilitators for assessment data to be used in policymaking, regardless of the type of assessment programme, were media and public opinion as well as appropriate and ongoing dissemination to stakeholders.
Materials which explicitly noted no impact on the policy process outlined barriers to the use of assessment data, which were thematically grouped as problems relating to (i) the (low) quality of an assessment programme and analyses, (ii) financial constraints, (iii) weak assessment bodies and fragmented government agencies and (iv) low technical capacity of assessment staff.
The high quality of the assessment programme was frequently seen as a facilitator to the use of regional assessment data, while the lack of quality was often regarded as a barrier to the use of subnational and national assessments. In international assessments, quality emerged as both a facilitator and a barrier. The high quality of an assessment programme was seen as a facilitator in so far as the results were credible, robust and not questioned by stakeholders. Quality requirements were also regarded as a barrier in that having to adhere to internationally defined high‐quality standards was frequently a challenge for participating countries.
As the chapters throughout this book demonstrate, for assessment programmes to be of high quality, much effort, expertise, time and financial resources are required. While developing and maintaining the necessary funding and expertise continues to be a challenge, ultimately, the highest quality standards are required if information from LSAs is to be taken seriously by policymakers and other users of these data. Such high technical quality, combined with the ongoing integration of assessments into policy processes and an ongoing and varied media and communication strategy will increase the usefulness of evidence from LSAs for various stakeholders (Tobin et al., 2015).
One‐off or cross‐sectional assessments can provide information about an outcome of interest at one point in time. This is of some interest in the comparative sense as participating systems can look at each other's performance on the outcome and see what they can learn from those systems that (i) perform at a higher level, (ii) manage to produce greater homogeneity between the highest and lowest achievers or, preferably, (iii) do both. These comparisons, however, are made across cultures, and the question is frequently raised as to which cultures or countries can appropriately or reasonably be compared (e.g. Goldstein & Thomas, 2008). The relatively higher achievement of many Asian countries in PISA and TIMSS compared to other countries is often argued to be a consequence of differences in basic tenets and resulting dispositions, beliefs and behaviours across countries. Thus, various authors (e.g. Bracey, 2005; Leung, 2006; Minkov, 2011; Stankov, 2010) demonstrate cultural differences across societies regarding, for example, the value placed on education, students' effort or respect for teachers, which makes it difficult to see how countries can learn from each other to improve outcomes. Therefore, assessments that enable comparisons over time within countries are often considered to be more meaningful.
In England, the follow‐up study of the Plowden National Survey of 1964 was undertaken 4 years later in 1968 and was reported by Peaker (1967, 1971). This study followed the same students over the two occasions. Similarly, in Australia, a longitudinal Study of School Performance was carried out in 1975, with a subsample of students followed up 4 years later in 1979, assessing 10‐ and 14‐year‐old students in the fields of literacy and numeracy (Bourke et al., 1981; Keeves and Bourke, 1976; Williams et al., 1980).
Both of these studies were longitudinal in kind, which is relatively rare in the realm of LSAs, which tend to use repeated cross‐sectional assessments as a means to gauge changes over time across comparable cohorts, rather than looking at growth within a cohort by following the same individuals over time. The most substantial and continuing programme of this latter type of assessment of national scope is the National Assessment of Educational Progress (NAEP) in the United States. It was initiated in 1969 in order to assess achievement at the levels of Grade 4, Grade 8 and Grade 12 in reading, mathematics and science (see, e.g. Jones & Olkin, 2004; Tyler, 1985).
The main international assessments are cross‐sectional in kind and are repeated at regular intervals, with PIRLS conducted every 5 years, PISA every 3 years and TIMSS every 4 years. As the target population (e.g. 15‐year‐olds or Grade 4 students) remains the same on each occasion, this enables the monitoring of student outcomes for that target population over time. Notably, the importance of providing trend information was reflected in the IEA's change in what 'TIMSS' meant. In the 1995 assessment, the 'T' stood for 'Third', which was maintained in 1999, when the study was called the 'Third International Mathematics and Science Study Repeat'. By the time of the 2003 assessment, however, the study had been renamed the 'Trends in International Mathematics and Science Study'.
Now that PISA has assessed all major domains (i.e. reading, mathematics and science) twice, the attention paid to the results within each country is increasingly focused on national trends, both overall and for population subgroups, rather than on cross‐national comparisons. It is no longer news that Korean students substantially outperform US students in mathematics. Indeed, if an implementation of PISA were suddenly to show this not to be the case, the results would not be believed, even though a different cohort is assessed each time. Generally, participating countries are most interested in whether or not there is evidence of improvement over time, both since the prior assessment and over the longer term. Such comparisons within a country over time are of great interest since they are not affected by the possible unique effects of culture which can be seen as problematic for cross‐country comparisons.
Increasingly, countries that participate in PISA supplement their samples with additional students, not in a way that will appreciably improve the precision of comparisons with other countries but in ways that will improve the precision of trend measurements for key demographic groups within the country, such as ethnic or language minorities or students of lower socio‐economic status. Of course this does not preclude the occasional instances of political leaders who vow to show improvements in education over time through a rise in the rankings of the PISA or TIMSS ‘league tables’ (e.g. Ferrari, 2012).
As emphasised at the beginning of this introduction and found in the systematic reviews, for LSAs to be robust and useful, they need to be of high quality, technically sound, have a comprehensive communication strategy and be useful for education policy. To achieve this aim, 13 key areas need to be considered in the implementation of LSAs (see Figure 1.2).
Figure 1.2 Key areas of a robust assessment programme
While Figure 1.2 illustrates where these key areas are discussed in the chapters of this book, a brief summary of the content of each chapter is given below.
Chapter 2
– Test Design and Objectives
Given that all LSAs have to address the 13 elements of a robust assessment programme, why and how do these assessments differ from one another in practice? The answer lies in the way that the purpose and guiding principles of an assessment shape decisions about who and what should be assessed. In this chapter, Dara Ramalingam outlines the key features of a selection of LSAs to illustrate the way in which their different purposes and assessment frameworks have led to key differences in decisions about test content, target population and sampling.
Chapter 3
– Test Development
All educational assessments that seek to provide accurate information about the test takers' knowledge, skills and understanding in the domain of interest share a number of common characteristics. These include tasks which elicit responses that contribute to building a sense of the test takers' capacity in the domain. This also means that the tests draw on knowledge and understanding that are intrinsic to the domain and are not likely to be more or less difficult for any individual or group because of knowledge or skills that are irrelevant to the domain. The tests must be in a format that is suited to the kind of questions being asked, must provide coverage of the area of learning under investigation and must be practically manageable. Juliette Mendelovits describes the additional challenges for LSAs to comply with these general 'best practice' characteristics as international LSAs start with the development of frameworks that guide the development of tests that are subsequently administered to many thousands of students in diverse countries, cultures and contexts.
Chapter 4
– Design, Development and Implementation of Contextual Questionnaires in LSAs
In order to be relevant to education policy and practice, LSAs routinely collect contextual information through questionnaires to enable the examination of factors that are linked to differences in student performance. In addition, information obtained by contextual questionnaires is used independently of performance data to generate indicators of non‐cognitive learning outcomes such as students’ attitudes towards reading, mathematics self‐efficacy and interest in science or indicators about teacher education and satisfaction as well as application of instructional strategies. In this chapter, Petra Lietz not only gives an overview of the content of questionnaires for students, parents, teachers and schools in LSAs but also discusses and illustrates the questionnaire design process from questionnaire framework development to issues such as question order and length as well as question and response formats.
Chapter 5
– Sample Design, Weighting and Calculation of Sampling Variance
Since the goal of LSAs as we have characterised them is to measure the achievement of populations and specified subgroups rather than that of individual students and schools, it is neither necessary, nor in most cases feasible, to assess all students in the target population within each participating country. Hence, the selection of an appropriate sample of students, generally from a sample of schools, is a key technical requirement for these studies. In this chapter, Keith Rust, Sheila Krawchuk and Christian Monseur describe the steps involved in selecting such samples and their rationale. Given that a complex, stratified multistage sample is selected in most instances, those analysing the data must use appropriate methods of inference that take into account the effects of the sample design on the sample configuration. Furthermore, some degree of school and student nonresponse is bound to occur, and methods are needed in an effort to mitigate any bias that such nonresponse might introduce.
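The replication approach to variance estimation that such designs require can be illustrated with a small sketch. The Python fragment below is not drawn from the chapter; it is a minimal illustration, assuming a student file with a final weight and 80 Fay‐adjusted balanced repeated replication (BRR) replicate weights under hypothetical column names (w_final, w_rep1 … w_rep80) and a Fay factor of 0.5, of how the sampling variance of a weighted mean might be approximated.

```python
# Minimal sketch (illustrative, not the procedure of any specific study):
# Fay-adjusted BRR variance of a weighted mean from replicate weights.
import numpy as np
import pandas as pd


def weighted_mean(values, weights):
    """Weighted mean of a variable using survey weights."""
    return np.sum(values * weights) / np.sum(weights)


def brr_variance(df, score_col, final_w="w_final",
                 rep_prefix="w_rep", n_reps=80, fay_k=0.5):
    """Return the full-sample estimate and its BRR sampling variance."""
    full_estimate = weighted_mean(df[score_col], df[final_w])
    rep_estimates = np.array([
        weighted_mean(df[score_col], df[f"{rep_prefix}{r}"])
        for r in range(1, n_reps + 1)
    ])
    # Fay's variant of BRR: V = sum((theta_r - theta)^2) / (G * (1 - k)^2)
    variance = np.sum((rep_estimates - full_estimate) ** 2) / (n_reps * (1 - fay_k) ** 2)
    return full_estimate, variance


# Toy data for demonstration only; a real analysis would read the study's
# public-use file, which already contains the replicate weights.
rng = np.random.default_rng(1)
toy = pd.DataFrame({"maths_score": rng.normal(500, 100, size=200)})
toy["w_final"] = rng.uniform(0.5, 2.0, size=200)
for r in range(1, 81):
    toy[f"w_rep{r}"] = toy["w_final"] * rng.uniform(0.5, 1.5, size=200)

mean_score, var = brr_variance(toy, "maths_score")
print(f"mean = {mean_score:.1f}, s.e. = {var ** 0.5:.2f}")
```

The same replication logic extends to more complex statistics such as percentages or regression coefficients, which is why analysts of these data are advised to use the supplied replicate weights rather than treat the sample as a simple random one.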
Chapter 6
– Translation and Cultural Appropriateness of Survey Material in LSAs
Cross‐linguistic, cross‐national and cross‐cultural equivalence is a fundamental requirement of LSAs in education which seek to make comparisons across many different settings. While procedures for the translation, adaptation, verification and finalisation of survey materials – also called ‘localisation’ – will not completely prevent language or culturally induced bias, they aim to minimise the possibility of them occurring. In this chapter, Steve Dept, Andrea Ferrari and Béatrice Halleux discuss the strengths and weaknesses of various approaches to the localisation of materials in different LSAs and single out practices that are more likely than others to yield satisfactory outcomes.
Chapter 7
– Quality Assurance
Quality assurance measures cover all aspects from test development to database production as John Cresswell explains in this chapter. To ensure comparability of the results across students and across countries and schools, much work has gone into standardising cross‐national assessments. The term 'standardised', in this context, not only refers to the scaling and scoring of the tests but also to the consistency in the design, content and administration of the tests (De Landsheere, 1997). This extent of standardisation is illustrated by the PISA technical standards which, for the administration in 2012 (NPM(1003)9a), covered three broad standards, one concerning data, the second regarding management and the third regarding national involvement. Data standards covered target population and sampling, language of testing, field trial participation, adaptation and translation of tests, implementation of national options, quality monitoring, printing, response coding and data submission. Management standards covered communication, notification of international and national options, schedule for material submission, drawing of samples, data management and archiving of materials. National standards covered feedback regarding appropriate mechanisms for promoting school participation and dissemination of results among all national stakeholders.
Chapter 8
– Processing Responses to Open‐ended Survey Questions
In this chapter, Ross Turner discusses the challenges associated with the consistent assessment of responses that students generate when answering questions other than multiple‐choice items. The methods described take into account the increased difficulty of this task when carried out in an international setting. Examples are given of the detailed sets of guidelines which are needed to code the responses and the processes involved in developing and implementing these guidelines.
Chapter 9
– Computer‐based Delivery of Cognitive Assessment and Questionnaires
As digital technologies have advanced in the twenty‐first century, the demand for using these technologies in large‐scale educational assessment has increased. Maurice Walker focuses in this chapter on the substantive and logistical rationales for adopting or incorporating a computer‐based approach to student assessment. He outlines assessment architecture and important item design options with the view that well‐planned computer‐based assessment (CBA) should be a coherent, accessible, stimulating and intuitive experience for the test taker. Throughout the chapter, examples illustrate the differing degrees of diffusion of digital infrastructure into the schools of countries that participate in LSAs. It also discusses the impact of these infrastructure issues on the choices of whether and how to undertake CBAs.
Chapter 10
– Data Management Procedures
Falk Brese and Mark Cockle discuss in this chapter the data management procedures needed to minimise error that might be introduced by any processes involved with converting responses from students, teachers, parents and school principals to electronic data. The chapter presents the various aspects of data management of international LSAs that need to be taken into account to meet this goal.
Chapter 11
– Test Implementation in the Field: The Case of PASEC
Oswald Koussihouèdé describes the implementation of one of the regional assessments – PASEC – which is undertaken in francophone countries in Africa and Asia. He describes the significant changes which have recently been made to this assessment programme in an attempt to better describe the strengths and weaknesses of the student populations of the participating countries and to ensure that the assessment is being implemented using the latest methodology.
Chapter 12
– Test Implementation in the Field: The Experience of Chile in International LSAs
Chile has participated in international LSAs undertaken by the IEA, OECD and UNESCO since 1998. Ema Lagos first explains the context in which these assessments have occurred, both in terms of the education system as well as political circumstances. She then provides a comprehensive picture of all the tasks that need to be undertaken by a participating country, from input into instrument and item development, sampling, the preparation of test materials and manuals and the conduct of field operations to the coding, entry, management and analysis of data and the reporting of results.
Chapter 13
– Why LSAs Use Scaling and Item Response Theory (IRT)
