Praise for the First Edition

"This pioneering work, in which Rao provides a comprehensive and up-to-date treatment of small area estimation, will become a classic...I believe that it has the potential to turn small area estimation...into a larger area of importance to both researchers and practitioners." --Journal of the American Statistical Association

Written by two experts in the field, Small Area Estimation, Second Edition provides a comprehensive and up-to-date account of the methods and theory of small area estimation (SAE), particularly indirect estimation based on explicit small area linking models. The model-based approach to small area estimation offers several advantages, including increased precision, the derivation of "optimal" estimates and associated measures of variability under an assumed model, and the validation of models from the sample data.

Emphasizing real data throughout, the Second Edition maintains a self-contained account of crucial theoretical and methodological developments in the field of SAE. The new edition provides extensive accounts of new and updated research, which often involves complex theory to handle model misspecifications and other complexities. Including information on survey design issues and traditional methods employing indirect estimates based on implicit linking models, Small Area Estimation, Second Edition also features:

* Additional sections describing the use of R code and data sets for readers to use when replicating applications
* Numerous examples of SAE applications throughout each chapter, including recent applications in U.S. Federal programs
* New topical coverage on extended design issues, synthetic estimation, further refinements and solutions to the Fay-Herriot area level model, basic unit level models, and spatial and time series models
* A discussion of the advantages and limitations of various SAE methods for model selection from data, as well as comparisons of estimates derived from models to reliable values obtained from external sources, such as previous census or administrative data

Small Area Estimation, Second Edition is an excellent reference for practicing statisticians and survey methodologists as well as practitioners interested in learning SAE methods. The Second Edition is also an ideal textbook for graduate-level courses in SAE and reliable small area statistics.
Pages: 821
Year of publication: 2015
Cover
Series Page
Title Page
Copyright
Dedication
List of Figures
List of Tables
Foreword to the First Edition
Preface to the Second Edition
Preface to the First Edition
Chapter 1: Introduction
1.1 What Is a Small Area?
1.2 Demand for Small Area Statistics
1.3 Traditional Indirect Estimators
1.4 Small Area Models
1.5 Model-Based Estimation
1.6 Some Examples
Chapter 2: Direct Domain Estimation
2.1 Introduction
2.2 Design-Based Approach
2.3 Estimation of Totals
2.4 Domain Estimation
2.5 Modified GREG Estimator
2.6 Design Issues
2.7 Optimal Sample Allocation for Planned Domains
2.8 Proofs
Chapter 3: Indirect Domain Estimation
3.1 Introduction
3.2 Synthetic Estimation
3.3 Composite Estimation
3.4 James–Stein Method
3.5 Proofs
Chapter 4: Small Area Models
4.1 Introduction
4.2 Basic Area Level Model
4.3 Basic Unit Level Model
4.4 Extensions: Area Level Models
4.5 Extensions: Unit Level Models
4.6 Generalized Linear Mixed Models
Chapter 5: Empirical Best Linear Unbiased Prediction (EBLUP): Theory
5.1 Introduction
5.2 General Linear Mixed Model
5.3 Block Diagonal Covariance Structure
5.4 Model Identification and Checking
5.5 Software
5.6 Proofs
Chapter 6: Empirical Best Linear Unbiased Prediction (EBLUP): Basic Area Level Model
6.1 EBLUP Estimation
6.2 MSE Estimation
6.3 *Robust Estimation in the Presence of Outliers
6.4 *Practical Issues
6.5 *Software
Chapter 7: Basic Unit Level Model
7.1 EBLUP Estimation
7.2 MSE Estimation
7.3 Applications
7.4 Outlier Robust EBLUP Estimation
7.5 M-Quantile Regression
7.6 Practical Issues
7.7 Software
7.8 Proofs
Chapter 8: EBLUP: Extensions
8.1 Multivariate Fay–Herriot Model
8.2 Correlated Sampling Errors
8.3 Time Series and Cross-Sectional Models
8.4 Spatial Models
8.5 Two-fold Subarea Level Models
8.6 Multivariate Nested Error Regression Model
8.7 Two-fold Nested Error Regression Model
8.8 Two-Level Model
8.9 Models for Multinomial Counts
8.10 EBLUP for Vectors of Area Proportions
8.11 Software
Chapter 9: Empirical Bayes (EB) Method
9.1 Introduction
9.2 Basic Area Level Model
9.3 Linear Mixed Models
9.4 EB Estimation of General Finite Population Parameters
9.5 Binary Data
9.6 Disease Mapping
9.7 Design-Weighted EB Estimation: Exponential Family Models
9.8 Triple-goal Estimation
9.9 Empirical Linear Bayes
9.10 Constrained LB
9.11 Software
9.12 Proofs
Chapter 10: Hierarchical Bayes (HB) Method
10.1 Introduction
10.2 MCMC Methods
10.3 Basic Area Level Model
10.4 Unmatched Sampling and Linking Area Level Models
10.5 Basic Unit Level Model
10.6 General ANOVA Model
10.7 HB Estimation of General Finite Population Parameters
10.8 Two-Level Models
10.9 Time Series and Cross-sectional Models
10.10 Multivariate Models
10.11 Disease Mapping Models
10.12 Two-Part Nested Error Model
10.13 Binary Data
10.14 Missing Binary Data
10.15 Natural Exponential Family Models
10.16 Constrained HB
10.17 Approximate HB Inference and Data Cloning
10.18 Proofs
References
Author Index
Subject Index
Wiley Series In Survey Methodology
End User License Agreement
Chapter 3: Indirect Domain Estimation
Figure 3.1 Direct, Census, composite SPREE, and GLSM estimates of Row Profiles for Canadian provinces Newfoundland and Labrador (a) and Quebec (b), for Two-Digit Occupation class A1.
Figure 3.2 Direct, Census, composite SPREE, and GLSM estimates of row profiles for Canadian provinces Newfoundland and Labrador (a) and Nova Scotia (b), for two-digit occupation class B5.
Chapter 6: Empirical Best Linear Unbiased Prediction (EBLUP): Basic Area Level Model
Figure 6.1 EBLUP and Direct Area Estimates of Average Expenditure on Fresh Milk for Each Small Area (a). CVs of EBLUP and Direct Estimators for Each Small Area (b). Areas are Sorted by Decreasing Sample Size.
Chapter 7: Basic Unit Level Model
Figure 7.1 Leverage measures versus scaled squared residuals.
Chapter 8: EBLUP: Extensions
Figure 8.1 Naive Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (a). Bias-corrected Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (b).
Figure 8.2 EBLUP Estimates, Based on the Spatial FH Model with SAR Random Effects, and Direct Estimates of Mean Surface Area Used for Production of Grapes for Each Municipality (a). CVs of EBLUP Estimates and of Direct Estimates for Each Municipality (b). Municipalities are Sorted by Increasing CVs of Direct Estimates.
Chapter 9: Empirical Bayes (EB) Method
Figure 9.1 Bias (a) and MSE (b) over Simulated Populations of EB, Direct, and ELL Estimates of Percent Poverty Gap for Each Area i.
Source: Adapted from Molina and Rao (2010).
Figure 9.2 True MSEs of EB Estimators of Percent Poverty Gap and Average of Bootstrap MSE Estimators Obtained with for Each Area i.
Source: Adapted from Molina and Rao (2010).
Figure 9.3 Bias (a) and MSE (b) of EB, Direct, and ELL Estimators of the Percent Poverty Gap for Each Area i under Design-Based Simulations.
Source: Adapted from Molina and Rao (2010).
Figure 9.4 Index Plot of Residuals (a) and Histogram of Residuals (b) from the Fitting of the Basic Unit Level Model with Response Variable log(income+constant).
Chapter 10: Hierarchical Bayes (HB) Method
Figure 10.1 Coefficient of Variation (CV) of Direct and HB Estimates.
Source: Adapted from Figure 3 in You, Rao, and Gambino (2003).
Figure 10.2 CPO Comparison Plot for Models 1–3.
Source: Adapted from Figure 1 in You and Rao (2000).
Figure 10.3 Direct, Cross-sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates.
Source: Adapted from Figure 2 in You, Rao, and Gambino (2003).
Figure 10.4 Coefficient of Variation of Cross-sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates.
Source: Adapted from Figure 3 in You, Rao, and Gambino (2003).
Chapter 3: Indirect Domain Estimation
Table 3.1 True State Proportions, Direct and Synthetic Estimates, and Associated Estimates of RRMSE
Table 3.2 Medians of Percent ARE of SPREE Estimates
Table 3.3 Percent Average Absolute Relative Bias (%) and Percent Average RRMSE (%) of Estimators
Table 3.4 Batting Averages for 18 Baseball Players
Chapter 6: Empirical Best Linear Unbiased Prediction (EBLUP): Basic Area Level Model
Table 6.1 Values of for States with More Than 500 Small Places
Table 6.2 Values of Percentage Absolute Relative Error of Estimates from True Values: Places with Population Less Than 500
Table 6.3 Average MSE of EBLUP Estimators Based on REML, LL, LLM, YL, and YLM Methods of Estimating
Table 6.4 % Relative Bias (RB) of Estimators of
Chapter 7: Basic Unit Level Model
Table 7.1 EBLUP Estimates of County Means and Estimated Standard Errors of EBLUP and Survey Regression Estimates
Table 7.2 Unconditional Comparisons of Estimators: Real and Synthetic Population
Table 7.3 Effect of Between-Area Homogeneity on the Performance of SSD and EBLUP
Table 7.4 EBLUP and Pseudo-EBLUP Estimates and Associated Standard Errors (s.e.): County Corn Crop Areas
Table 7.5 Average Absolute Bias (), Average Root Mean Squared Error () of Estimators, and Percent Average Absolute Relative Bias () of MSE Estimators
Chapter 8: EBLUP: Extensions
Table 8.1 Distribution of Coefficient of Variation (%)
Table 8.2 Average Absolute Relative Bias () and Average Relative Root MSE () of SYN, SSD, FH, and EBLUP (State-Space)
Chapter 9: Empirical Bayes (EB) Method
Table 9.1 Percent Average Relative Bias () of MSE Estimators
Chapter 10: Hierarchical Bayes (HB) Method
Table 10.1 MSE Estimates and Posterior Variance for Four States
Table 10.2 1991 Canadian Census Undercount Estimates and Associated CVs
Table 10.3 EBLUP and HB Estimates and Associated Standard Errors: County Corn Areas
Table 10.4 Pseudo-HB and Pseudo-EBLUP Estimates and Associated Standard Errors: County Corn Areas
Table 10.5 Estimated % CVs of the Direct, EB, and HB Estimators of Poverty Incidence for Selected Provinces by Gender
Table 10.6 Average Absolute Relative Error (ARE%): Median Income of Four-Person Families
Table 10.7 Comparison of Models 1–3: Mortality Rates
Small Area Estimation
Second Edition
J.N.K. Rao and Isabel Molina
Wiley Series in Survey Methodology
Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Rao, J. N. K., 1937- author.
Small area estimation / J.N.K. Rao and Isabel Molina. – Second edition.
pages cm – (Wiley series in survey methodology)
Includes bibliographical references and index.
ISBN 978-1-118-73578-7 (cloth)
1. Small area statistics. 2. Sampling (Statistics) 3. Estimation theory. I. Molina, Isabel, 1975- author. II. Title. III. Series: Wiley series in survey methodology.
QA276.6.R344 2015
519.5′2 – dc23
2015012610
To Neela and Ángeles
Foreword to the First Edition
The history of modern sample surveys dates back to the nineteenth century, but the field did not fully emerge until the 1930s. It grew considerably during World War II and has been expanding at a tremendous rate ever since. Over time, the range of topics investigated using survey methods has broadened enormously as policy makers and researchers have learned to appreciate the value of quantitative data and as survey researchers, in response to policy makers' demands, have tackled topics previously considered unsuitable for study using survey methods. The range of analyses of survey data has also expanded, as users of survey data have become more sophisticated and as major developments in computing power and software have simplified the computations involved. In the early days, users were mostly satisfied with national estimates and estimates for major geographic regions and other large domains. The situation is very different today: more and more policy makers are demanding estimates for small domains for use in making policy decisions. For example, population surveys are often required to provide estimates of adequate precision for domains defined in terms of some combination of factors such as age, sex, race/ethnicity, and poverty status. A particularly widespread demand from policy makers is for estimates at a finer level of geographic detail than the broad regions that were commonly used in the past. Thus, estimates are frequently needed for such entities as states, provinces, counties, school districts, and health service areas.
The need to provide estimates for small domains has led to developments in two directions. One direction is toward the use of sample designs that can produce domain estimates of adequate precision within the standard design-based mode of inference used in survey analysis (i.e., “direct estimates”). Many sample surveys are now designed to yield sufficient sample sizes for key domains to satisfy the precision requirements for those domains. This approach is generally used for socio-economic domains and for some larger geographic domains. However, the increase in overall sample size that this approach entails may well exceed the survey's funding resources and capabilities, particularly so when estimates are required for many geographic areas. In the United States, for example, few surveys are large enough to be capable of providing reliable subpopulation estimates for all 50 states, even if the sample is optimally allocated across states for this purpose. For very small geographic areas such as school districts, either a complete census or a sample of at least the size of the census long-form sample (on average about 1 in 6 households nationwide) is required. Even censuses, however, although valuable, cannot be the complete solution for the production of small area estimates. In most countries, censuses are conducted only once a decade. They cannot, therefore, provide satisfactory small area estimates for intermediate time points during a decade for population characteristics that change markedly over time. Furthermore, census content is inherently severely restricted, so a census cannot provide small area estimates for all the characteristics that are of interest. Hence, another approach is needed.
The other direction for producing small area estimates is to turn away from conventional direct estimates toward the use of indirect model-dependent estimates. The model-dependent approach employs a statistical model that “borrows strength” in making an estimate for one small area from sample survey data collected in other small areas or at other time periods. This approach moves away from the design-based estimation of conventional direct estimates to indirect model-dependent estimates. Naturally, concerns are raised about the reliance on models for the production of such small area estimates. However, the demand for small area estimates is strong and increasing, and models are needed to satisfy that demand in many cases. As a result, many survey statisticians have come to accept the model-dependent approach in the right circumstances, and the approach is being used in a number of important cases. Examples of major small area estimation programs in the United States include the following: the Census Bureau's Small Area Income and Poverty Estimates program, which regularly produces estimates of income and poverty measures for various population subgroups for states, counties, and school districts; the Bureau of Labor Statistics' Local Area Unemployment Statistics program, which produces monthly estimates of employment and unemployment for states, metropolitan areas, counties, and certain subcounty areas; the National Agricultural Statistics Service's County Estimates Program, which produces county estimates of crop yield; and the estimates of substance abuse in states and metropolitan areas, which are produced by the Substance Abuse and Mental Health Services Administration (see Chapter 1).
The essence of all small area methods is the use of auxiliary data available at the small area level, such as administrative data or data from the last census. These data are used to construct predictor variables for use in a statistical model that can be used to predict the estimate of interest for all small areas. The effectiveness of small area estimation depends initially on the availability of good predictor variables that are uniformly measured over the total area. It next depends on the choice of a good prediction model. Effective use of small area estimation methods further depends on a careful, thorough evaluation of the quality of the model. Finally, when small area estimates are produced, they should be accompanied by valid measures of their precision.
Early applications of small area estimation methods employed only simple methods. At that time, the choice of the method for use in a particular case was relatively simple, being limited by the computable methods then in existence. However, the situation has changed enormously in recent years, and particularly in the last decade. There now exists a wide range of different, often complex, models that can be used, depending on the nature of the measurement of the small area estimate (e.g., a binary or continuous variable) and on the auxiliary data available. One key distinction in model construction is between situations where the auxiliary data are available for the individual units in the population and those where they are available only at the aggregate level for each small area. In the former case, the data can be used in unit level models, whereas in the latter they can be used only in area level models. Another feature involved in the choice of model is whether the model borrows strength cross-sectionally, over time, or both. There are also now a number of different approaches, such as empirical best linear unbiased prediction (EBLUP), empirical Bayes (EB), and hierarchical Bayes (HB), which can be used to estimate the models and the variability of the model-dependent small area estimates. Moreover, complex procedures that would have been extremely difficult to apply a few years ago can now be implemented fairly straightforwardly, taking advantage of the continuing increases in computing power and the latest developments in software.
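To make the unit level case concrete, here is a minimal Python sketch of a predictor of a small area mean under a nested error model y_ij = b0 + b1*x_ij + v_i + e_ij, the kind of model behind the basic unit level model. It is an illustration under stated assumptions, not code from the book or from the R package sae: the function name is hypothetical, there is a single covariate, the coefficients and the variance components sigma2_v and sigma2_e are treated as known (in practice they are estimated, which is what makes the predictor "empirical"), and the sampling fraction within the area is ignored. Note that it needs unit-level sample data plus the area's population mean of x, which is exactly the auxiliary information an area level model does without.

```python
def nested_error_area_mean(sample_y, sample_x, pop_mean_x, beta, sigma2_v, sigma2_e):
    """Predict a small area mean under y_ij = b0 + b1*x_ij + v_i + e_ij.

    Assumptions (illustration only): beta = (b0, b1), sigma2_v and sigma2_e
    are known, and the sampling fraction in the area is negligible.
    sample_y, sample_x: unit-level sample values for this area.
    pop_mean_x: population mean of x for the area (e.g., from a census).
    """
    b0, b1 = beta
    synthetic = b0 + b1 * pop_mean_x  # regression-synthetic part
    n = len(sample_y)
    if n == 0:
        return synthetic  # non-sampled area: no local data to borrow from
    ybar = sum(sample_y) / n
    xbar = sum(sample_x) / n
    # Shrinkage factor: close to 1 when the area sample is large or v_i varies a lot.
    gamma = sigma2_v / (sigma2_v + sigma2_e / n)
    # Add back the shrunken area-level residual of the sample mean.
    return synthetic + gamma * (ybar - (b0 + b1 * xbar))


# Made-up numbers: gamma = 1 / (1 + 2 / 2) = 0.5
print(nested_error_area_mean([3.0, 5.0], [1.0, 2.0], 2.0, (1.0, 1.0), 1.0, 2.0))  # prints 3.75
```

With no sample in the area the function falls back to the regression-synthetic estimate; as the area sample grows, gamma approaches 1 and the prediction moves toward the area's own sample mean adjusted for the covariate.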
The wide range of possible models and approaches now available for use can be confusing to those working in this area. J.N.K. Rao's book is therefore a timely contribution, coming at a point in the subject's development when an integrated, systematic treatment is needed. Rao has done a great service in producing this authoritative and comprehensive account of the subject. This book will help to advance the subject and be a valuable resource for practitioners and theorists alike.
Graham Kalton
Preface to the Second Edition
Small area estimation (SAE) deals with the problem of producing reliable estimates of parameters of interest and the associated measures of uncertainty for subpopulations (areas or domains) of a finite population for which samples of inadequate sizes or no samples are available. Traditional “direct estimates,” based only on the area-specific sample data, are not suitable for SAE, and it is necessary to “borrow strength” across related small areas through supplementary information to produce reliable “indirect” estimates for small areas. Indirect model-based estimation methods, based on explicit linking models, are now widely used.
The first edition of Small Area Estimation (Rao 2003a) provided a comprehensive account of model-based methods for SAE up to the end of 2002. It is gratifying to see the enthusiastic reception it has received, as judged by the significant number of citations and the rapid growth in SAE literature over the past 12 years. Demand for reliable small area estimates has also greatly increased worldwide. As an example, the estimation of complex poverty measures at the municipality level is of current interest, and the World Bank uses a model-based method, based on simulating multiple censuses, in more than 50 countries worldwide to produce poverty statistics for small areas.
The main aim of the present second edition is to update the first edition by providing a comprehensive account of important theoretical developments from 2003 to 2014. The new SAE literature is quite extensive and often involves complex theory to handle model misspecifications and other complexities. We have retained a large portion of the material from the first edition to make the book self-contained, and supplemented it with selected new developments in the theory and methods of SAE. Notation and terminology used in the first edition are largely retained. As in the first edition, applications are included throughout the chapters. An added feature of the second edition is the inclusion of sections (Sections 5.5, 6.5, 7.7, 8.11, and 9.11) describing specific R software for SAE, namely the R package sae (Molina and Marhuenda 2013; Molina and Marhuenda 2015). These sections include examples of SAE applications using data sets included in the package and provide all the necessary R code, so that the user can replicate the applications exactly. New sections and old sections with significant changes are indicated by an asterisk in the book. Chapter 3 on “Traditional Demographic Methods” from the first edition has been deleted, partly due to page constraints and partly because the material is somewhat unrelated to mainstream model-based methods. Also, we have not been able to keep up to date with the new developments in demographic methods.
Chapter 1 introduces basic terminology related to SAE and presents selected important applications as motivating examples. Chapter 2, as in the first edition, presents a concise account of direct estimation of totals or means for small areas and addresses survey design issues that have a bearing on SAE. New Section 2.7 deals with optimal sample allocation for planned domains and the estimation of marginal row and column strata means in the presence of two-way stratification. Chapter 3 gives a fairly detailed account of traditional indirect estimation based on implicit linking models. The well-known James–Stein method of composite estimation is also studied in the context of sample survey data. The new section “Generalized SPREE” studies generalized structure preserving estimation (GSPREE) based on relaxing some interaction assumptions made in the traditional SPREE, which is often used in practice because it makes fuller use of reliable direct estimates at a higher level to produce synthetic estimates. Another important addition is the weight-sharing (or weight-splitting) methods studied in the new section “Weight-Sharing Methods.” These methods produce a two-way table of weights, with rows as the units in the full sample and columns as the areas, such that the cell weights in each row add up to the original sample weight. Such methods are especially useful in micro-simulation modeling that can involve a large number of variables of interest.
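The two-way weight table described above can be sketched in a few lines of Python. This is a minimal illustration of the table's defining property (each row adds up to the unit's original weight), not the weight-sharing methods studied in the book: the proportional split and the per-unit "scores" (a hypothetical nonnegative affinity of each unit for each area) are assumptions made here for illustration, and the function names are hypothetical.

```python
def share_weights(weights, scores):
    """Split each unit's sample weight across areas.

    weights: original sample weights w_j, one per unit in the full sample.
    scores: scores[j][a] >= 0 is a hypothetical affinity of unit j for area a
            (an assumption for this sketch; the book's methods set the shares
            in their own ways).
    Returns a units-by-areas table whose row j sums to w_j.
    """
    table = []
    for w, row in zip(weights, scores):
        total = sum(row)
        if total <= 0:
            raise ValueError("each unit needs at least one positive score")
        table.append([w * s / total for s in row])  # proportional split
    return table


def area_totals(table, y):
    """Estimate each area total as the sum over units of W[j][a] * y_j."""
    n_areas = len(table[0])
    return [sum(row[a] * yj for row, yj in zip(table, y)) for a in range(n_areas)]


W = share_weights([10.0, 20.0], [[1.0, 1.0, 0.0], [0.0, 1.0, 3.0]])
print(W)                           # [[5.0, 5.0, 0.0], [0.0, 5.0, 15.0]]
print(area_totals(W, [2.0, 1.0]))  # [10.0, 15.0, 15.0]
```

Because every row adds up to the original weight, the same table can be applied to any number of study variables, and the area totals always add up to the full-sample estimate of the overall total (here 10 * 2 + 20 * 1 = 40 = 10 + 15 + 15).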
Explicit small area models that account for between-area variability are introduced in Chapter 4 (previous Chapter 5), including linear mixed models and generalized linear mixed models such as logistic linear mixed models with random area effects. The models are classified into two broad categories: (i) area level models that relate the small area means or totals to area level covariates; and (ii) unit level models that relate the unit values of a study variable to unit-specific auxiliary variables. Extensions of the models to handle complex data structures, such as spatial dependence and time series structures, are also considered. The new section “Semi-parametric Mixed Models” introduces semi-parametric mixed models, which are studied later. Chapter 5 (previous Chapter 6) studies linear mixed models involving fixed and random effects. It gives general results on empirical best linear unbiased prediction (EBLUP) and the estimation of the mean squared error (MSE) of the EBLUP. A detailed account of model identification and checking for linear mixed models is presented in the new Section 5.4. Available SAS software and R statistical software for linear mixed models are summarized in the new Section 5.5, which also describes the R package sae, specifically designed for SAE.
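For the area level case, the standard example is the Fay–Herriot model: direct estimate_i = theta_i + e_i with theta_i = b0 + b1*x_i + v_i, where the sampling variances psi_i are treated as known. The Python sketch below, assuming a single covariate and a model variance sigma2_v that is simply given, shows how the predictor shrinks each direct estimate toward a regression-synthetic estimate with weight gamma_i = sigma2_v / (sigma2_v + psi_i). With sigma2_v known this is the BLUP; replacing sigma2_v by an estimate (for example, REML) gives the EBLUP. The function names are hypothetical and this is not the implementation in the R package sae.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    b0 = (sy - b1 * sx) / sw
    return b0, b1


def fay_herriot_predict(direct, x, psi, sigma2_v):
    """Shrink direct estimates toward a regression-synthetic fit.

    direct: direct area estimates; x: one area-level covariate per area;
    psi: sampling variances, treated as known; sigma2_v: model variance,
    assumed given here (estimating it and plugging it in gives the EBLUP).
    """
    # GLS weights under the model: Var(direct_i) = sigma2_v + psi_i.
    w = [1.0 / (sigma2_v + p) for p in psi]
    b0, b1 = wls_line(x, direct, w)
    estimates = []
    for yi, xi, p in zip(direct, x, psi):
        gamma = sigma2_v / (sigma2_v + p)  # weight on the direct estimate
        synthetic = b0 + b1 * xi
        estimates.append(gamma * yi + (1.0 - gamma) * synthetic)
    return estimates


# Made-up example: the third area (psi = 9) puts the least weight on its direct estimate.
print(fay_herriot_predict([10.0, 12.0, 15.0, 20.0], [1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 9.0, 1.0], 1.0))
```

Areas with precise direct estimates (small psi_i) keep most of their own estimate, while noisy areas borrow strength from the regression fit across all areas.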
Chapter 7 of the first edition provided a detailed account of EBLUP estimation of small area means or totals for the basic area level and unit level models, using the general theory given in Chapter 5. In the past 10 years or so, researchers have done extensive work on these two models, especially on problems related to model misspecification and other practical issues. As a result, we decided to split the old Chapter 7 into two new chapters, with Chapter 6 focusing on area level models and Chapter 7 addressing unit level models. New topics covered in Chapter 6 include bootstrap MSE estimation (Section “Bootstrap MSE Estimation”) and robust estimation in the presence of outliers (Section “Robust Estimation in the Presence of Outliers”). Section “Practical Issues” deals with practical issues related to the basic area level model. It includes important topics such as covariates subject to sampling errors (subsection 4 of that section), misspecification of linking models (subsection 7), benchmarking of model-based area estimators to ensure agreement with a reliable direct estimate when aggregated (subsection 6), and the use of “big data” as possible covariates in area level models (subsection 5). Functions of the R package sae designed for estimation under the area level model are described in Section “Software,” together with an example illustrating their use. New topics introduced in Chapter 7 include bootstrap MSE estimation (Section “Bootstrap MSE Estimation”), outlier robust EBLUP estimation (Section “Outlier Robust EBLUP Estimation”), and M-quantile regression (Section “M-Quantile Regression”). Section “Practical Issues” deals with practical issues related to the basic unit level model.
It presents methods to deal with important topics, including measurement errors in covariates (subsection 4 of Section “Practical Issues”), model misspecification (subsection 5), and semi-parametric nested error models (Sections “Semi-parametric Nested Error Model: EBLUP” and “Semi-parametric Nested Error Model: REBLUP”). Most of the published literature assumes that the model for the population values also holds for the sample. In many applications, however, this assumption may fail because informative sampling leads to sample selection bias. Subsection 3 of Section “Practical Issues” gives a detailed treatment of methods for making valid inferences under informative sampling. Functions of the R package sae dealing with the basic unit level model are described in Section “Software,” and their use is illustrated through an application to the County Crop Areas data of Battese, Harter, and Fuller (1988), including the calculation of model diagnostics and the drawing of residual plots. Several important applications are also presented in Chapters 6 and 7.
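Benchmarking, listed above among the practical issues for the basic area level model, can be carried out in several ways. The sketch below shows one simple form, ratio benchmarking, under stated assumptions: the areas aggregate with known weights (for example, population shares), and a reliable direct estimate of the aggregate is available. Every model-based estimate is multiplied by a common factor so that the weighted aggregate matches the direct estimate. The function name is hypothetical, and this is not necessarily the procedure recommended in the book.

```python
# Sketch of ratio benchmarking of model-based area estimates.
# ASSUMPTIONS: areas aggregate with known weights a_i (e.g. population shares),
# and a reliable direct estimate of the aggregate is available.

def ratio_benchmark(model_estimates, shares, direct_aggregate):
    """Scale all estimates by one factor so sum(a_i * est_i) equals the direct aggregate."""
    aggregate = sum(a * t for a, t in zip(shares, model_estimates))
    if aggregate == 0:
        raise ValueError("weighted aggregate of model estimates is zero")
    factor = direct_aggregate / aggregate
    return [factor * t for t in model_estimates]


if __name__ == "__main__":
    bench = ratio_benchmark([10.0, 20.0, 30.0], [0.2, 0.3, 0.5], direct_aggregate=25.3)
    print(bench)  # approximately [11.0, 22.0, 33.0]
```

Ratio benchmarking preserves the ratios among the area estimates, but it applies the same adjustment to every area regardless of its precision; other benchmarking schemes distribute the adjustment differently.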
Chapters 8, 9, and 10 cover the same material as the corresponding chapters in the first edition. Chapter 8 contains EBLUP theory for various extensions of the basic area level and unit level models, updating the sections of the first edition and, in particular, giving a more detailed account of spatial and two-level models. Section “Spatial Models” is updated, and functions of the R package sae dealing with spatial area level models are described in Section “Software,” together with an example illustrating their use. Section “Two-fold Subarea Level Models” presents theory for two-fold subarea level models, which are natural extensions of the basic area level model. Chapter 9 presents empirical Bayes (EB) estimation. The EB method (also called empirical best) is more generally applicable than the EBLUP method. The new Section “EB Confidence Intervals” gives an account of methods for constructing confidence intervals under the basic area level model. EB estimation of general area parameters, in particular complex poverty indicators studied by the World Bank, is the theme of Section “EB Estimation of General Finite Population Parameters,” and the EB method is compared with the World Bank method in simulation experiments (subsection 6 of that section). R software for EB estimation of general area parameters is described in Section “Software,” which includes an example on the estimation of poverty indicators. Binary data and disease mapping from count data are studied in Sections “Binary Data” and “Disease Mapping,” respectively. An important addition is Section “Design-Weighted EB Estimation: Exponential Family Models,” which deals with design-weighted EB estimation under exponential family models. The sections of the first edition on constrained EB estimation and empirical linear Bayes estimation are retained.
Finally, Chapter 10 presents a self-contained account of the hierarchical Bayes (HB) approach, which is based on specifying prior distributions on the model parameters. Basic Markov chain Monte Carlo (MCMC) methods for HB inference, including model determination, are presented in Section “MCMC Methods.” Several new developments are presented, including HB estimation of complex general area parameters, in particular poverty indicators (Section “HB Estimation of General Finite Population Parameters”), two-part nested error models (Section “Two-Part Nested Error Model”), missing binary data (Section “Missing Binary Data”), and approximate HB inference (Section “Approximate HB Inference and Data Cloning”). The other sections of Chapter 10 largely cover the material of the previous edition, with some updates. Chapters 8–10 include brief descriptions of applications with real data sets.
As in the first edition, we discuss the advantages and limitations of different SAE methods throughout the book. We also emphasize the need for both internal and external evaluations. To this end, we present various methods for selecting models from the data and for comparing estimates derived from models with reliable values obtained from external sources, such as a previous census or administrative data.
Proofs of some basic results are provided, but, as in the first edition, proofs of results that are technically involved or lengthy are omitted. We have provided fairly self-contained accounts of direct estimation (Chapter 2), EBLUP and EB estimation (Chapters 5 and 9), and HB estimation (Chapter 10). However, prior exposure to a standard text in mathematical statistics, such as the 2001 Brooks/Cole book Statistical Inference (second edition) by G. Casella and R. L. Berger, is essential. Also, a basic course in regression and mixed models, such as the 2001 Wiley book Generalized, Linear and Mixed Models by C. E. McCulloch and S. E. Searle, would be helpful in understanding model-based SAE. A basic course in survey sampling techniques, such as the 1977 Wiley book Sampling Techniques (third edition) by W. G. Cochran, is also useful but not essential.
This book is intended primarily as a research monograph, but, as with the first edition, it is also suitable for a graduate-level course on SAE. Practitioners interested in learning SAE methods may also find portions of this text useful, in particular Chapters 3, 6, and 7 and Sections “Introduction,” “MCMC Methods,” “Basic Area Level Model,” and 10.5, as well as the examples and applications presented throughout the book.
We are thankful to Emily Berg, Yves Berger, Ansu Chatterjee, Gauri Datta, Laura Dumitrescu, Wayne Fuller, Malay Ghosh, David Haziza, Jiming Jiang, Partha Lahiri, Bal Nandram, Jean Opsomer, and Mikhail Sverchkov for reading portions of the book and providing helpful comments and suggestions, to Domingo Morales for providing a very helpful list of publications in SAE, and to Pedro Dulce for providing us with tailor-made software for making the author and subject indices.
J. N. K. Rao and Isabel Molina
January, 2015
Sample surveys are widely used to provide estimates of totals, means, and other parameters not only for the total population of interest but also for subpopulations (or domains) such as geographic areas and socio-demographic groups. Direct estimates of a domain parameter are based only on the domain-specific sample data. In particular, direct estimates are generally “design-based” in the sense that they make use of “survey weights,” and the associated inferences (standard errors, confidence intervals, etc.) are based on the probability distribution induced by the sample design, with the population values held fixed. Standard sampling texts (e.g., the 1977 Wiley book by W. G. Cochran) provide extensive accounts of design-based direct estimation. Models that treat the population values as random may also be used to obtain model-dependent direct estimates. Such estimates in general do not depend on survey weights, and the associated inferences are based on the probability distribution induced by the assumed model (e.g., the 2001 Wiley book by R. Valliant, A. H. Dorfman, and R. M. Royall).
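As a minimal illustration of a design-based direct estimate, the sketch below uses only the sampled units that belong to domain d, weighted by their survey weights (for example, inverse inclusion probabilities). It returns the weighted (expansion) estimator of the domain total and the ratio-form estimator of the domain mean, that is, the estimated total divided by the estimated domain size. The function and argument names are illustrative, and variance estimation is omitted.

```python
# Sketch: design-based direct estimators for a single domain d.
# y: study variable, w: survey weights, domain: domain label of each sampled unit.

def direct_domain_estimates(y, w, domain, d):
    """Return (estimated total, estimated mean) for domain d using only units in d."""
    in_d = [(yj, wj) for yj, wj, dj in zip(y, w, domain) if dj == d]
    if not in_d:
        raise ValueError("no sampled units in domain %r" % (d,))
    total = sum(wj * yj for yj, wj in in_d)  # weighted (expansion) total
    size = sum(wj for _, wj in in_d)         # estimated domain population size
    return total, total / size               # ratio-form domain mean


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    w = [10.0, 10.0, 20.0, 20.0]
    domain = ["A", "B", "A", "B"]
    print(direct_domain_estimates(y, w, domain, "A"))
```

When few sampled units fall in a domain, these estimates rest on a very small sample and can be highly imprecise, which is the situation that motivates the indirect estimators treated in this book.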
