Small Area Estimation - J. N. K. Rao - E-Book

Small Area Estimation E-Book

J. N. K. Rao

Description

Praise for the First Edition

"This pioneering work, in which Rao provides a comprehensive and up-to-date treatment of small area estimation, will become a classic...I believe that it has the potential to turn small area estimation...into a larger area of importance to both researchers and practitioners." --Journal of the American Statistical Association

Written by two experts in the field, Small Area Estimation, Second Edition provides a comprehensive and up-to-date account of the methods and theory of small area estimation (SAE), particularly indirect estimation based on explicit small area linking models. The model-based approach to small area estimation offers several advantages including increased precision, the derivation of "optimal" estimates and associated measures of variability under an assumed model, and the validation of models from the sample data.

Emphasizing real data throughout, the Second Edition maintains a self-contained account of crucial theoretical and methodological developments in the field of SAE. The new edition provides extensive accounts of new and updated research, which often involves complex theory to handle model misspecifications and other complexities. Including information on survey design issues and traditional methods employing indirect estimates based on implicit linking models, Small Area Estimation, Second Edition also features:

* Additional sections describing the use of R code data sets for readers to use when replicating applications
* Numerous examples of SAE applications throughout each chapter, including recent applications in U.S. Federal programs
* New topical coverage on extended design issues, synthetic estimation, further refinements and solutions to the Fay-Herriot area level model, basic unit level models, and spatial and time series models
* A discussion of the advantages and limitations of various SAE methods for model selection from data as well as comparisons of estimates derived from models to reliable values obtained from external sources, such as previous census or administrative data

Small Area Estimation, Second Edition is an excellent reference for practicing statisticians and survey methodologists as well as practitioners interested in learning SAE methods. The Second Edition is also an ideal textbook for graduate-level courses in SAE and reliable small area statistics.

Page count: 821

Publication year: 2015

Table of Contents

Cover

Series Page

Title Page

Copyright

Dedication

List of Figures

List of Tables

Foreword to the First Edition

Preface to the Second Edition

Preface to the First Edition

Chapter 1: Introduction

1.1 What Is a Small Area?

1.2 Demand for Small Area Statistics

1.3 Traditional Indirect Estimators

1.4 Small Area Models

1.5 Model-Based Estimation

1.6 Some Examples

Chapter 2: Direct Domain Estimation

2.1 Introduction

2.2 Design-Based Approach

2.3 Estimation of Totals

2.4 Domain Estimation

2.5 Modified GREG Estimator

2.6 Design Issues

2.7 Optimal Sample Allocation for Planned Domains

2.8 Proofs

Chapter 3: Indirect Domain Estimation

3.1 Introduction

3.2 Synthetic Estimation

3.3 Composite Estimation

3.4 James–Stein Method

3.5 Proofs

Chapter 4: Small Area Models

4.1 Introduction

4.2 Basic Area Level Model

4.3 Basic Unit Level Model

4.4 Extensions: Area Level Models

4.5 Extensions: Unit Level Models

4.6 Generalized Linear Mixed Models

Chapter 5: Empirical Best Linear Unbiased Prediction (EBLUP): Theory

5.1 Introduction

5.2 General Linear Mixed Model

5.3 Block Diagonal Covariance Structure

5.4 Model Identification and Checking

5.5 Software

5.6 Proofs

Chapter 6: Empirical Best Linear Unbiased Prediction (EBLUP): Basic Area Level Model

6.1 EBLUP Estimation

6.2 MSE Estimation

6.3 *Robust estimation in the presence of outliers

6.4 *Practical issues

6.5 *Software

Chapter 7: Basic Unit Level Model

7.1 EBLUP estimation

7.2 MSE Estimation

7.3 Applications

7.4 Outlier Robust EBLUP Estimation

7.5 M-Quantile Regression

7.6 Practical Issues

7.7 Software

7.8 Proofs

Chapter 8: EBLUP: Extensions

8.1 Multivariate Fay–Herriot Model

8.2 Correlated Sampling Errors

8.3 Time Series and Cross-Sectional Models

8.4 Spatial Models

8.5 Two-fold Subarea Level Models

8.6 Multivariate Nested Error Regression Model

8.7 Two-fold Nested Error Regression Model

8.8 Two-Level Model

8.9 Models for Multinomial Counts

8.10 EBLUP for Vectors of Area Proportions

8.11 Software

Chapter 9: Empirical Bayes (EB) Method

9.1 Introduction

9.2 Basic Area Level Model

9.3 Linear Mixed Models

9.4 EB Estimation of General Finite Population Parameters

9.5 Binary Data

9.6 Disease Mapping

9.7 Design-Weighted EB Estimation: Exponential Family Models

9.8 Triple-goal Estimation

9.9 Empirical Linear Bayes

9.10 Constrained LB

9.11 Software

9.12 Proofs

Chapter 10: Hierarchical Bayes (HB) Method

10.1 Introduction

10.2 MCMC Methods

10.3 Basic Area Level Model

10.4 Unmatched Sampling and Linking Area Level Models

10.5 Basic Unit Level Model

10.6 General ANOVA Model

10.7 HB Estimation of General Finite Population Parameters

10.8 Two-Level Models

10.9 Time Series and Cross-sectional Models

10.10 Multivariate Models

10.11 Disease Mapping Models

10.12 Two-Part Nested Error Model

10.13 Binary Data

10.14 Missing Binary Data

10.15 Natural Exponential Family Models

10.16 Constrained HB

10.17 Approximate HB Inference and Data Cloning

10.18 Proofs

References

Author Index

Subject Index

Wiley Series In Survey Methodology

End User License Agreement

Small Area Estimation

Second Edition

J.N.K. Rao and Isabel Molina

Wiley Series in Survey Methodology

Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Rao, J. N. K., 1937- author.

Small area estimation / J.N.K. Rao and Isabel Molina. – Second edition.

pages cm – (Wiley series in survey methodology)

Includes bibliographical references and index.

ISBN 978-1-118-73578-7 (cloth)

1. Small area statistics. 2. Sampling (Statistics) 3. Estimation theory. I. Molina, Isabel, 1975- author. II. Title. III. Series: Wiley series in survey methodology.

QA276.6.R344 2015

519.5′2 – dc23

2015012610

To Neela and Ángeles

List of Figures

Figure 3.1 Direct, Census, composite SPREE, and GLSM estimates of Row Profiles for Canadian provinces Newfoundland and Labrador (a) and Quebec (b), for Two-Digit Occupation class A1.

Figure 3.2 Direct, Census, composite SPREE, and GLSM estimates of row profiles for Canadian provinces Newfoundland and Labrador (a) and Nova Scotia (b), for two-digit occupation class B5.

Figure 6.1 EBLUP and Direct Area Estimates of Average Expenditure on Fresh Milk for Each Small Area (a). CVs of EBLUP and Direct Estimators for Each Small Area (b). Areas are Sorted by Decreasing Sample Size.

Figure 7.1 Leverage measures versus scaled squared residuals.

Figure 8.1 Naive Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (a). Bias-corrected Nonparametric Bootstrap MSE Estimates Against Analytical MSE Estimates (b).

Figure 8.2 EBLUP Estimates, Based on the Spatial FH Model with SAR Random Effects, and Direct Estimates of Mean Surface Area Used for Production of Grapes for Each Municipality (a). CVs of EBLUP Estimates and of Direct Estimates for Each Municipality (b). Municipalities are Sorted by Increasing CVs of Direct Estimates.

Figure 9.1 Bias (a) and MSE (b) over Simulated Populations of EB, Direct, and ELL Estimates of Percent Poverty Gap for Each Area i. Source: Adapted from Molina and Rao (2010).

Figure 9.2 True MSEs of EB Estimators of Percent Poverty Gap and Average of Bootstrap MSE Estimators Obtained with for Each Area i. Source: Adapted from Molina and Rao (2010).

Figure 9.3 Bias (a) and MSE (b) of EB, Direct and ELL Estimators of the Percent Poverty Gap for Each Area i under Design-Based Simulations. Source: Adapted from Molina and Rao (2010).

Figure 9.4 Index Plot of Residuals (a) and Histogram of Residuals (b) from the Fitting of the Basic Unit Level Model with Response Variable log(income+constant).

Figure 10.1 Coefficient of Variation (CV) of Direct and HB Estimates. Source: Adapted from Figure 3 in You, Rao, and Gambino (2003).

Figure 10.2 CPO Comparison Plot for Models 1–3. Source: Adapted from Figure 1 in You and Rao (2000).

Figure 10.3 Direct, Cross-sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates. Source: Adapted from Figure 2 in You, Rao, and Gambino (2003).

Figure 10.4 Coefficient of Variation of Cross-sectional HB (HB2) and Cross-Sectional and Time Series HB (HB1) Estimates. Source: Adapted from Figure 3 in You, Rao, and Gambino (2003).

List of Tables

Table 3.1 True State Proportions, Direct and Synthetic Estimates, and Associated Estimates of RRMSE

Table 3.2 Medians of Percent ARE of SPREE Estimates

Table 3.3 Percent Average Absolute Relative Bias (%) and Percent Average RRMSE (%) of Estimators

Table 3.4 Batting Averages for 18 Baseball Players

Table 6.1 Values of for States with More Than 500 Small Places

Table 6.2 Values of Percentage Absolute Relative Error of Estimates from True Values: Places with Population Less Than 500

Table 6.3 Average MSE of EBLUP Estimators Based on REML, LL, LLM, YL, and YLM Methods of Estimating

Table 6.4 % Relative Bias (RB) of Estimators of

Table 7.1 EBLUP Estimates of County Means and Estimated Standard Errors of EBLUP and Survey Regression Estimates

Table 7.2 Unconditional Comparisons of Estimators: Real and Synthetic Population

Table 7.3 Effect of Between-Area Homogeneity on the Performance of SSD and EBLUP

Table 7.4 EBLUP and Pseudo-EBLUP Estimates and Associated Standard Errors (s.e.): County Corn Crop Areas

Table 7.5 Average Absolute Bias (), Average Root Mean Squared Error () of Estimators, and Percent Average Absolute Relative Bias () of MSE Estimators

Table 8.1 Distribution of Coefficient of Variation (%)

Table 8.2 Average Absolute Relative Bias () and Average Relative Root MSE () of SYN, SSD, FH, and EBLUP (State-Space)

Table 9.1 Percent Average Relative Bias () of MSE Estimators

Table 10.1 MSE Estimates and Posterior Variance for Four States

Table 10.2 1991 Canadian Census Undercount Estimates and Associated CVs

Table 10.3 EBLUP and HB Estimates and Associated Standard Errors: County Corn Areas

Table 10.4 Pseudo-HB and Pseudo-EBLUP Estimates and Associated Standard Errors: County Corn Areas

Table 10.5 Estimated % CVs of the Direct, EB, and HB Estimators of Poverty Incidence for Selected Provinces by Gender

Table 10.6 Average Absolute Relative Error (ARE%): Median Income of Four-Person Families

Table 10.7 Comparison of Models 1–3: Mortality Rates

Foreword to the First Edition

The history of modern sample surveys dates back to the nineteenth century, but the field did not fully emerge until the 1930s. It grew considerably during World War II, and has been expanding at a tremendous rate ever since. Over time, the range of topics investigated using survey methods has broadened enormously as policy makers and researchers have learned to appreciate the value of quantitative data and as survey researchers, in response to policy makers' demands, have tackled topics previously considered unsuitable for study using survey methods. The range of analyses of survey data has also expanded, as users of survey data have become more sophisticated and as major developments in computing power and software have simplified the computations involved. In the early days, users were mostly satisfied with national estimates and estimates for major geographic regions and other large domains. The situation is very different today: more and more policy makers are demanding estimates for small domains for use in making policy decisions. For example, population surveys are often required to provide estimates of adequate precision for domains defined in terms of some combination of factors such as age, sex, race/ethnicity, and poverty status. A particularly widespread demand from policy makers is for estimates at a finer level of geographic detail than the broad regions that were commonly used in the past. Thus, estimates are frequently needed for such entities as states, provinces, counties, school districts, and health service areas.

The need to provide estimates for small domains has led to developments in two directions. One direction is toward the use of sample designs that can produce domain estimates of adequate precision within the standard design-based mode of inference used in survey analysis (i.e., “direct estimates”). Many sample surveys are now designed to yield sufficient sample sizes for key domains to satisfy the precision requirements for those domains. This approach is generally used for socio-economic domains and for some larger geographic domains. However, the increase in overall sample size that this approach entails may well exceed the survey's funding resources and capabilities, particularly so when estimates are required for many geographic areas. In the United States, for example, few surveys are large enough to be capable of providing reliable subpopulation estimates for all 50 states, even if the sample is optimally allocated across states for this purpose. For very small geographic areas such as school districts, either a complete census or a sample of at least the size of the census long-form sample (on average about 1 in 6 households nationwide) is required. Even censuses, however, although valuable, cannot be the complete solution for the production of small area estimates. In most countries, censuses are conducted only once a decade. They cannot, therefore, provide satisfactory small area estimates for intermediate time points during a decade for population characteristics that change markedly over time. Furthermore, census content is inherently severely restricted, so a census cannot provide small area estimates for all the characteristics that are of interest. Hence, another approach is needed.

The other direction for producing small area estimates is to turn away from conventional direct estimates toward the use of indirect model-dependent estimates. The model-dependent approach employs a statistical model that “borrows strength” in making an estimate for one small area from sample survey data collected in other small areas or at other time periods. This approach moves away from the design-based estimation of conventional direct estimates to indirect model-dependent estimates. Naturally, concerns are raised about the reliance on models for the production of such small area estimates. However, the demand for small area estimates is strong and increasing, and models are needed to satisfy that demand in many cases. As a result, many survey statisticians have come to accept the model-dependent approach in the right circumstances, and the approach is being used in a number of important cases. Examples of major small area estimation programs in the United States include the following: the Census Bureau's Small Area Income and Poverty Estimates program, which regularly produces estimates of income and poverty measures for various population subgroups for states, counties, and school districts; the Bureau of Labor Statistics' Local Area Unemployment Statistics program, which produces monthly estimates of employment and unemployment for states, metropolitan areas, counties, and certain subcounty areas; the National Agricultural Statistics Service's County Estimates Program, which produces county estimates of crop yield; and the estimates of substance abuse in states and metropolitan areas, which are produced by the Substance Abuse and Mental Health Services Administration (see Chapter 1).

The essence of all small area methods is the use of auxiliary data available at the small area level, such as administrative data or data from the last census. These data are used to construct predictor variables for use in a statistical model that can be used to predict the estimate of interest for all small areas. The effectiveness of small area estimation depends initially on the availability of good predictor variables that are uniformly measured over the total area. It next depends on the choice of a good prediction model. Effective use of small area estimation methods further depends on a careful, thorough evaluation of the quality of the model. Finally, when small area estimates are produced, they should be accompanied by valid measures of their precision.

Early applications of small area estimation methods employed only simple methods. At that time, the choice of the method for use in a particular case was relatively simple, being limited by the computable methods then in existence. However, the situation has changed enormously in recent years, and particularly in the last decade. There now exists a wide range of different, often complex, models that can be used, depending on the nature of the measurement of the small area estimate (e.g., a binary or continuous variable) and on the auxiliary data available. One key distinction in model construction is between situations where the auxiliary data are available for the individual units in the population and those where they are available only at the aggregate level for each small area. In the former case, the data can be used in unit level models, whereas in the latter they can be used only in area level models. Another feature involved in the choice of model is whether the model borrows strength cross-sectionally, over time, or both. There are also now a number of different approaches, such as empirical best linear unbiased prediction (EBLUP), empirical Bayes (EB), and hierarchical Bayes (HB), which can be used to estimate the models and the variability of the model-dependent small area estimates. Moreover, complex procedures that would have been extremely difficult to apply a few years ago can now be implemented fairly straightforwardly, taking advantage of the continuing increases in computing power and the latest developments in software.

The wide range of possible models and approaches now available for use can be confusing to those working in this area. J.N.K. Rao's book is therefore a timely contribution, coming at a point in the subject's development when an integrated, systematic treatment is needed. Rao has done a great service in producing this authoritative and comprehensive account of the subject. This book will help to advance the subject and be a valuable resource for practitioners and theorists alike.

Graham Kalton

Preface to the Second Edition

Small area estimation (SAE) deals with the problem of producing reliable estimates of parameters of interest and the associated measures of uncertainty for subpopulations (areas or domains) of a finite population for which samples of inadequate sizes or no samples are available. Traditional “direct estimates,” based only on the area-specific sample data, are not suitable for SAE, and it is necessary to “borrow strength” across related small areas through supplementary information to produce reliable “indirect” estimates for small areas. Indirect model-based estimation methods, based on explicit linking models, are now widely used.

The first edition of Small Area Estimation (Rao 2003a) provided a comprehensive account of model-based methods for SAE up to the end of 2002. It is gratifying to see the enthusiastic reception it has received, as judged by the significant number of citations and the rapid growth in SAE literature over the past 12 years. Demand for reliable small area estimates has also greatly increased worldwide. As an example, the estimation of complex poverty measures at the municipality level is of current interest, and the World Bank uses a model-based method, based on simulating multiple censuses, in more than 50 countries worldwide to produce poverty statistics for small areas.

The main aim of the present second edition is to update the first edition by providing a comprehensive account of important theoretical developments from 2003 to 2014. New SAE literature is quite extensive and often involves complex theory to handle model misspecifications and other complexities. We have retained a large portion of the material from the first edition to make the book self-contained, and supplemented it with selected new developments in theory and methods of SAE. Notation and terminology used in the first edition are largely retained. As in the first edition, applications are included throughout the chapters. An added feature of the second edition is the inclusion of sections (Sections 5.5, 6.5, 7.7, 8.11, and 9.11) describing specific R software for SAE, concretely the R package sae (Molina and Marhuenda 2013; Molina and Marhuenda 2015). These sections include examples of SAE applications using data sets included in the package and provide all the necessary R code, so that the user can exactly replicate the applications. New sections and old sections with significant changes are indicated by an asterisk in the book. Chapter 3 on “Traditional Demographic Methods” from the first edition has been deleted, partly due to page constraints and partly because the material is somewhat unrelated to mainstream model-based methods. Also, we have not been able to keep up to date with the new developments in demographic methods.

Chapter 1 introduces basic terminology related to SAE and presents selected important applications as motivating examples. Chapter 2, as in the first edition, presents a concise account of direct estimation of totals or means for small areas and addresses survey design issues that have a bearing on SAE. New Section 2.7 deals with optimal sample allocation for planned domains and the estimation of marginal row and column strata means in the presence of two-way stratification. Chapter 3 gives a fairly detailed account of traditional indirect estimation based on implicit linking models. The well-known James–Stein method of composite estimation is also studied in the context of sample survey data. The new section “Generalized SPREE” studies generalized structure preserving estimation (GSPREE) based on relaxing some interaction assumptions made in the traditional SPREE, which is often used in practice because it makes fuller use of reliable direct estimates at a higher level to produce synthetic estimates. Another important addition is weight-sharing (or weight-splitting) methods, studied in the new section “Weight-Sharing Methods.” The weight-sharing methods produce a two-way table of weights, with rows as the units in the full sample and columns as the areas, such that the cell weights in each row add up to the original sample weight. Such methods are especially useful in micro-simulation modeling that can involve a large number of variables of interest.
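The row-sum property of a weight-sharing table can be illustrated with a small sketch. The splitting rule below (proportional to made-up area shares) and all names and numbers are hypothetical placeholders chosen only to show the structure of the table; they are not the weight-sharing methods studied in Chapter 3.

```python
# Hypothetical illustration of a weight-sharing table: rows are sampled
# units, columns are areas, and each row sums to the unit's original weight.
# The proportional splitting rule is an assumption made for this sketch.

def share_weights(sample_weights, area_shares):
    """Split each unit's sample weight across areas.

    sample_weights: original sample weights, one per sampled unit.
    area_shares: one row per unit of nonnegative shares, one per area
                 (normalized inside the function, so they need not sum to 1).
    Returns a units-by-areas table whose rows sum to the original weights.
    """
    table = []
    for weight, shares in zip(sample_weights, area_shares):
        total = sum(shares)
        if total <= 0:
            raise ValueError("each unit needs at least one positive share")
        table.append([weight * s / total for s in shares])
    return table


weights = [120.0, 80.0, 200.0]                # made-up sample weights
shares = [[1, 1, 2], [0, 3, 1], [5, 0, 0]]    # made-up shares for 3 areas
table = share_weights(weights, shares)

# Each row adds up to the unit's original sample weight.
row_sums = [sum(row) for row in table]
# Column sums are the total weight each area receives under this split.
col_sums = [sum(col) for col in zip(*table)]
```

Because every row reproduces the original weight, the grand total of the table equals the sum of the sample weights, so any variable attached to the units can be tabulated by area with the same table.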

Explicit small area models that account for between-area variability are introduced in Chapter 4 (previous Chapter 5), including linear mixed models and generalized linear mixed models such as logistic linear mixed models with random area effects. The models are classified into two broad categories: (i) area level models that relate the small area means or totals to area level covariates; and (ii) unit level models that relate the unit values of a study variable to unit-specific auxiliary variables. Extensions of the models to handle complex data structures, such as spatial dependence and time series structures, are also considered. New Section *Semi-parametric Mixed Models introduces semi-parametric mixed models, which are studied later. Chapter 5 (previous Chapter 6) studies linear mixed models involving fixed and random effects. It gives general results on empirical best linear-unbiased prediction (EBLUP) and the estimation of mean squared error (MSE) of the EBLUP. A detailed account of model identification and checking for linear mixed models is presented in the new Section *Model Identification and Checking. Available SAS software and R statistical software for linear mixed models are summarized in the new Section *Software. The R package sae specifically designed for SAE is also described.

Chapter 6 of the first edition provided a detailed account of EBLUP estimation of small area means or totals for the basic area level and unit level models, using the general theory given in Chapter 5. In the past 10 years or so, researchers have done extensive work on those two models, especially addressing problems related to model misspecification and other practical issues. As a result, we decided to split the old Chapter 6 into two new chapters, with Chapter 6 focusing on area level models and Chapter 7 addressing unit level models. New topics covered in Chapter 6 include bootstrap MSE estimation (Section *Bootstrap MSE Estimation) and robust estimation in the presence of outliers (Section *Robust estimation in the presence of outliers). Section *Practical issues deals with practical issues related to the basic area level model. It includes important topics such as covariates subject to sampling errors (Section *Practical issues.4), misspecification of linking models (Section *Practical issues.7), benchmarking of model-based area estimators to ensure agreement with a reliable direct estimate when aggregated (Section *Practical issues.6), and the use of “big data” as possible covariates in area level models (Section *Practical issues.5). Functions of the R package sae designed for estimation under the area level model are described in Section *Software. An example illustrating the use of these functions is provided. New topics introduced in Chapter 7 include bootstrap MSE estimation (Section *Bootstrap MSE Estimation), outlier robust EBLUP estimation (Section *Outlier Robust EBLUP Estimation), and M-quantile regression (Section *M-Quantile Regression). Section *Practical Issues deals with practical issues related to the basic unit level model. It presents methods to deal with important topics, including measurement errors in covariates (Section *Practical Issues.4), model misspecification (Section *Practical Issues.5), and semi-parametric nested error models (Sections Semi-parametric Nested Error Model: EBLUP and Semi-parametric Nested Error Model: REBLUP). Most of the published literature assumes that the assumed model for the population values also holds for the sample. However, in many applications, this assumption may not be true due to informative sampling leading to sample selection bias. Section *Practical Issues.3 gives a detailed treatment of methods to make valid inferences under informative sampling. Functions of the R package sae dealing with the basic unit level model are described in Section *Software. The use of these functions is illustrated through an application to the County Crop Areas data of Battese, Harter, and Fuller (1988). This application includes calculation of model diagnostics and drawing residual plots. Several important applications are also presented in Chapters 6 and 7.

New chapters 8, 9, and 10 cover the same material as the corresponding chapters in the first edition. Chapter 8 contains EBLUP theory for various extensions of the basic area level and unit level models, providing updates to the sections in the first edition, in particular a more detailed account of spatial and two-level models. Section *Spatial Models on spatial models is updated, and functions of the R package sae dealing with spatial area level models are described in Section *Software. An example illustrating the use of these functions is provided. Section *Two-fold Subarea Level Models presents theory for two-fold subarea level models, which are natural extensions of the basic area level models. Chapter 9 presents empirical Bayes (EB) estimation. The EB method (also called empirical best) is more generally applicable than the EBLUP method. New Section *EB Confidence Intervals gives an account of methods for constructing confidence intervals in the case of the basic area level model. EB estimation of general area parameters, in particular the complex poverty indicators studied by the World Bank, is the theme of Section *EB Estimation of General Finite Population Parameters. The EB method is compared to the World Bank method in simulation experiments (Section *EB Estimation of General Finite Population Parameters.6). R software for EB estimation of general area parameters is described in Section *Software, which includes an example on estimation of poverty indicators. Binary data and disease mapping from count data are studied in Sections Binary Data and Disease Mapping, respectively. An important addition is Section *Design-Weighted EB Estimation: Exponential Family Models dealing with design-weighted EB estimation under exponential family models. Previous sections on constrained EB estimation and empirical linear Bayes estimation are retained.

Finally, Chapter 10 presents a self-contained account of the Hierarchical Bayes (HB) approach based on specifying prior distributions on the model parameters. Basic Markov chain Monte Carlo (MCMC) methods for HB inference, including model determination, are presented in Section MCMC Methods. Several new developments are presented, including HB estimation of complex general area parameters, in particular poverty indicators (Section *HB Estimation of General Finite Population Parameters), two-part nested error models (Section *Two-Part Nested Error Model), missing binary data (Section *Missing Binary Data), and approximate HB inference (Section *Approximate HB Inference and Data Cloning). Other sections in Chapter 10 more or less cover the material in the previous edition with some updates. Chapters 8–10 include brief descriptions of applications with real data sets.

As in the first edition, we discuss the advantages and limitations of different SAE methods throughout the book. We also emphasize the need for both internal and external evaluations. To this end, we have provided various methods for model selection from the data, and comparison of estimates derived from models to reliable values obtained from external sources, such as previous census or administrative data.

Proofs of some basic results are provided, but proofs of results that are technically involved or lengthy are omitted, as in the first edition. We have provided fairly self-contained accounts of direct estimation (Chapter 2), EBLUP and EB estimation (Chapters 5 and 9), and HB estimation (Chapter 10). However, prior exposure to a standard text in mathematical statistics, such as the 2001 Brooks/Cole book Statistical Inference (second edition) by G. Casella and R. L. Berger, is essential. Also, a basic course in regression and mixed models, such as the 2001 Wiley book Generalized, Linear and Mixed Models by C. E. McCulloch and S. R. Searle, would be helpful in understanding model-based SAE. A basic course in survey sampling techniques, such as the 1977 Wiley book Sampling Techniques (third edition) by W. G. Cochran, is also useful but not essential.

This book is intended primarily as a research monograph, but it is also suitable for a graduate-level course on SAE, as in the case of the first edition. Practitioners interested in learning SAE methods may also find portions of this text useful, in particular Chapters 3, 6, and 7 and Sections Introduction, MCMC Methods, Basic Area Level Model and 10.5, as well as the examples and applications presented throughout the book.

We are thankful to Emily Berg, Yves Berger, Ansu Chatterjee, Gauri Datta, Laura Dumitrescu, Wayne Fuller, Malay Ghosh, David Haziza, Jiming Jiang, Partha Lahiri, Bal Nandram, Jean Opsomer, and Mikhail Sverchkov for reading portions of the book and providing helpful comments and suggestions, to Domingo Morales for providing a very helpful list of publications in SAE, and to Pedro Dulce for providing us with tailor-made software for making author and subject indices.

J. N. K. Rao and Isabel Molina

January, 2015

Preface to the First Edition

Sample surveys are widely used to provide estimates of totals, means, and other parameters not only for the total population of interest but also for subpopulations (or domains) such as geographic areas and socio-demographic groups. Direct estimates of a domain parameter are based only on the domain-specific sample data. In particular, direct estimates are generally “design-based” in the sense that they make use of “survey weights,” and the associated inferences (standard errors, confidence intervals, etc.) are based on the probability distribution induced by the sample design, with the population values held fixed. Standard sampling texts (e.g., the 1977 Wiley book by W.G. Cochran) provide extensive accounts of design-based direct estimation. Models that treat the population values as random may also be used to obtain model-dependent direct estimates. Such estimates in general do not depend on survey weights, and the associated inferences are based on the probability distribution induced by the assumed model (e.g., the 2001 Wiley book by R. Valliant, A.H. Dorfman, and R.M. Royall).
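As a minimal sketch of a design-based direct estimate, the following computes a weighted domain total and a weighted domain mean from the domain-specific sample only. The record layout, the function name, and the numbers are assumptions made for illustration; variance estimation under the sample design is not shown.

```python
def direct_domain_estimates(records, domain):
    """Direct estimates for one domain from its own sample data.

    records: iterable of (domain_label, survey_weight, y) tuples
             (a hypothetical layout, chosen for this example).
    Returns the weighted total of y and the weighted mean of y,
    or None when the domain has no sampled units.
    """
    sum_w = 0.0
    sum_wy = 0.0
    for label, w, y in records:
        if label == domain:          # use only domain-specific sample data
            sum_w += w
            sum_wy += w * y
    if sum_w == 0:
        return None                  # no direct estimate is possible
    return {"total": sum_wy, "mean": sum_wy / sum_w}


sample = [("A", 10.0, 2.0), ("A", 20.0, 3.0), ("B", 5.0, 8.0)]
estimates_a = direct_domain_estimates(sample, "A")
```

A domain with few or no sampled units yields an unstable estimate or none at all, which is the situation that motivates the indirect methods discussed in the book.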

Continue reading in the full edition!