83,99 €
A preeminent expert explores new and exciting methodologies in the ever-growing field of robust statistics
Robust statistics is used to develop data-analytic methods that are resistant to outlying observations while remaining capable of detecting those outliers, and it is extremely useful for solving an array of common problems, such as estimating location, scale, and regression parameters. Written by an internationally recognized expert in the field of robust statistics, this book addresses a range of well-established techniques while exploring, in depth, new and exciting methodologies. Local robustness and global robustness are discussed, and problems of non-identifiability and adaptive estimation are considered. Rather than attempt an exhaustive investigation of robustness, the author provides readers with a timely review of many of the most important problems in statistical inference involving robust estimation, along with a brief look at confidence intervals for location. Throughout, the author meticulously links research in maximum likelihood estimation with the more general M-estimation methodology. Specific applications, along with R and some MATLAB subroutines and accompanying data sets (available both in the text and online), are employed wherever appropriate.
Providing invaluable insights and guidance, Robustness Theory and Application is an important resource for all statisticians interested in the topic of robust statistics. The book encompasses both past and present research, making it a valuable supplemental text for graduate-level courses in robustness.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 350
Year of publication: 2018
COVER
FOREWORD
PREFACE
1 INTRODUCTION TO ASYMPTOTIC CONVERGENCE
1.1 INTRODUCTION
1.2 PROBABILITY SPACES AND DISTRIBUTION FUNCTIONS
1.3 LAWS OF LARGE NUMBERS
1.4 THE MODUS OPERANDI RELATED TO LOCATION ESTIMATION
1.5 EFFICIENCY OF LOCATION ESTIMATORS
1.6 ESTIMATION OF LOCATION AND SCALE
PROBLEMS
2 THE FUNCTIONAL APPROACH
2.1 ESTIMATION AND CONDITIONS A
2.2 CONSISTENCY
2.3 WEAK CONTINUITY AND WEAK CONVERGENCE
2.4 FRÉCHET DIFFERENTIABILITY
2.5 THE INFLUENCE FUNCTION
2.6 EFFICIENCY FOR MULTIVARIATE PARAMETERS
2.7 OTHER APPROACHES
PROBLEMS
3 MORE RESULTS ON DIFFERENTIABILITY
3.1 FURTHER RESULTS ON FRÉCHET DIFFERENTIABILITY
3.2 M‐ESTIMATORS: THEIR INTRODUCTION
3.3 REGRESSION M‐ESTIMATORS
3.4 STOCHASTIC FRÉCHET EXPANSIONS AND FURTHER CONSIDERATIONS
3.5 LOCALLY UNIFORM FRÉCHET EXPANSION
3.6 CONCLUDING REMARKS
4 MULTIPLE ROOTS
4.1 INTRODUCTION TO MULTIPLE ROOTS
4.2 ASYMPTOTICS FOR MULTIPLE ROOTS
4.3 CONSISTENCY IN THE FACE OF MULTIPLE ROOTS
5 DIFFERENTIABILITY AND BIAS REDUCTION
5.1 DIFFERENTIABILITY, BIAS REDUCTION, AND VARIANCE ESTIMATION
5.2 FURTHER RESULTS ON THE NEWTON ALGORITHM
PROBLEMS
6 MINIMUM DISTANCE ESTIMATION AND MIXTURE ESTIMATION
6.1 MINIMUM DISTANCE ESTIMATION AND REVISITING MIXTURE MODELING
6.2 THE L₂‐MINIMUM DISTANCE ESTIMATOR FOR MIXTURES
6.3 OTHER MINIMUM DISTANCE ESTIMATION APPLICATIONS
PROBLEMS
7 L‐ESTIMATES AND TRIMMED LIKELIHOOD ESTIMATES
7.1 A PREVIEW OF ESTIMATION USING ORDER STATISTICS
7.2 THE TRIMMED LIKELIHOOD ESTIMATOR
7.3 ADAPTIVE TRIMMED LIKELIHOOD AND IDENTIFICATION OF OUTLIERS
7.4 ADAPTIVE TRIMMED LIKELIHOOD IN REGRESSION
7.5 WHAT TO DO IF n IS LARGE?
PROBLEMS
8 TRIMMED LIKELIHOOD FOR MULTIVARIATE DATA
8.1 IDENTIFICATION OF MULTIVARIATE OUTLIERS
PROBLEMS
9 FURTHER DIRECTIONS AND CONCLUSION
9.1 A WAY FORWARD
PROBLEM
APPENDIX A: SPECIFIC PROOF OF THEOREM 2.1
APPENDIX B: SPECIFIC CALCULATIONS IN EXAMPLES 4.1 AND 4.2
APPENDIX C: CALCULATION OF MOMENTS IN EXAMPLE 4.2
BIBLIOGRAPHY
INDEX
END USER LICENSE AGREEMENT
Chapter 4
TABLE 4.1 Power Calculations of the Tests
TABLE 4.2 Global Empirical Size Calculations of the Test Using …
Chapter 5
TABLE 5.1 First‐Order Bias Term, with Associated Asymptotic Variances of Location and Scale, for Huber's Proposal 2 at the Standard Normal Distribution
Chapter 6
TABLE 6.1 Diameters at Different Concentrations
TABLE 6.2 Parameters and Means of Sample Estimates
TABLE 6.3 Root Mean Squared Error
TABLE 6.4 Root Mean Squared Error for a 1% Contaminated Model …
Chapter 7
TABLE 7.1 Modified Data on Wood‐Specific Gravity
TABLE 7.2 ATLA Analysis on Modified Data on Wood‐Specific Gravity Including a Constant
Chapter 1
FIGURE 1.1 Illustration of the SLLN in action by plotting the average of Bernoulli random variables with probability … as the number of tosses becomes large.
FIGURE 1.2 Illustration of the Tukey bisquare psi function.
FIGURE 1.3 Illustration of SLLN for the family of functions involving the Tukey bisquare; sample size …. Plots are for 10 independent samples.
FIGURE 1.4 Illustration of SLLN for the family of functions involving the Tukey bisquare; sample size …. Plots are for 10 independent samples.
FIGURE 1.5 Illustration of SLLN for the family of functions involving the Tukey bisquare; sample size …. Plots are for 10 independent samples.
FIGURE 1.6 Illustration of SLLN for the family of functions involving the Tukey bisquare; sample size …. Plots are for 10 independent samples.
FIGURE 1.7 Illustration of SLLN for the family of functions involving the derivative of the Tukey bisquare; sample size …. Plots are for 10 independent samples.
FIGURE 1.8 Histogram of the Tukey bisquare psi function estimates.
FIGURE 1.9 Estimated relative efficiency of the Tukey bisquare with tuning constant … relative to the sample mean, based on 10 000 samples of size …, with … Newton–Raphson iterations beginning from the median.
FIGURE 1.10 Plot of Hampel … redescender for location.
FIGURE 1.11 Plot of … redescender for scale.
FIGURE 1.12 Plot of Huber's …‐function for location.
FIGURE 1.13 Plot of Huber's …‐function for estimating scale.
FIGURE 1.14 Plot of Bachmaier's …‐function for estimating location.
FIGURE 1.15 Plot of Bachmaier's …‐function for estimating scale.
Chapter 2
FIGURE 2.1 … for an imaginary selection functional.
Chapter 4
FIGURE 4.1 Plot of the mooted selection functional of Clarke (1990) (CSF) and the selection functional of Gan and Jiang (1999) (GJSF) at the Cauchy model.
FIGURE 4.2 Plot of the mooted selection functional of Clarke (1990) (CSF) and the selection functional of Gan and Jiang (GJSF) for Example 4 of Gan and Jiang (1999).
Chapter 6
FIGURE 6.1 Example plot of the cumulative empirical distribution function generated from a cumulative normal distribution located at …. The smooth curve is … and … corresponds to the step function.
FIGURE 6.2 Illustration of boundedness of the ψ‐function for location of the L₂‐minimum distance estimator, in comparison to the unbounded function for the maximum likelihood estimator (MLE), at the normal location model.
FIGURE 6.3 Side‐by‐side plots of histograms at each of the four concentration levels, 0.26, 0.51, 0.77, and 1.03 mol/l respectively.
FIGURE 6.4 Plot of switching regressions.
FIGURE 6.5 Five‐state model.
FIGURE 6.6 Three‐state model.
FIGURE 6.7 Sensitivity curves based on stylized samples (n = 50) with … = 2 and … = 0.5.
FIGURE 6.8 Box and whisker plots of Alcoa representation indicator data.
FIGURE 6.9 Estimated fits over histograms of representation indicators for both the Mahalanobis distances and the residual ratios of data sets 1–3.
Chapter 7
FIGURE 7.1 Histogram of raw gold assay data and Q–Q plot of raw gold assay data, followed by a histogram of the full set of logged gold assay data, and then a histogram of logged gold assay data with a fitted line plot of the normal density for the data with the outlier removed.
Chapter 9
FIGURE 9.1 These plots are formed from 10 000 bivariate data supplied by Norm Campbell, 5000 points from each of two populations: a plot of the bivariate data with the least squares line fitted after the multivariate ATLA algorithm of Clarke and Schubert (2006) is applied; a plot of the potential outliers identified by the multivariate ATLA; and a plot of the least squares line with the observations retained by ATLA. The final plot shows the MM‐algorithm fitted regression line (dashed), applied with default parameters in version 3.2.2 of R.
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by Walter A. Shewhart and Samuel S. Wilks
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels
The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state‐of‐the‐art developments in the field and classical methods.
Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches.
This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.
A complete list of titles in this series can be found at http://www.wiley.com/go/wsps
BRENTON R. CLARKE
Murdoch University
This edition first published 2018
© 2018 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Brenton R. Clarke to be identified as the author of this work has been asserted in accordance with law.
Registered Offices: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office: 111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloging-in-Publication Data
Names: Clarke, Brenton R., author.
Title: Robustness theory and application / by Brenton R. Clarke.
Description: Hoboken, NJ : John Wiley & Sons, 2018. | Series: Wiley series in probability and statistics | Includes bibliographical references and index.
Identifiers: LCCN 2018007658 (print) | LCCN 2018010629 (ebook) | ISBN 9781118669501 (pdf) | ISBN 9781118669372 (epub) | ISBN 9781118669303 (cloth)
Subjects: LCSH: Robust statistics.
Classification: LCC QA276.A2 (ebook) | LCC QA276.A2 C53 2018 (print) | DDC 519.5–dc23
LC record available at https://lccn.loc.gov/2018007658
Cover Design: Wiley
Cover Image: © lvcandy/Getty Images
To my darling wife Erica and much loved sons Andrew and Stephen
It could be said that the genesis of this book came out of a unit on robust statistics taught by Noel Cressie in 1976 at Flinders University of South Australia. Noel's materials for the lectures were gathered from Princeton University, where he had just completed his PhD. Having been introduced to M‐, L‐, and R‐estimators, I shifted to the Australian National University in 1977 to work on the staff of the Statistics Department in the Faculties. There I enrolled part time in a PhD with Professor C. R. Heathcote (affectionately known as Chip by his colleagues and family), who happened to be researching the integrated squared error method of estimation and, more generally, the method of minimum distance. The common link between the areas of study, robust statistics and minimum distance estimation, was that of M‐estimation: some minimum distance estimation methods can be represented by M‐estimators. A typical model used in the formulation of robustness studies was the "epsilon‐contaminated normal distribution." In the spirit of John W. Tukey from Princeton University, the relative performance of an estimator, usually of location, was to be considered in such contaminated models. It occurred to me that one could also estimate the proportion of contamination, epsilon, in such models, and when I proposed this to Chip he became enthusiastic that I should work on these mixture models for estimation in my PhD. Chip was aware that the trend for PhDs was to have a motivating set of data, and to this end he introduced me to recently acquired earthquake data recordings which could be modeled with mixture modeling. A portion of a large data set was passed on to me by Professor R. S. Anderssen (known as Bob), also at the Australian National University. Bob also introduced me to the Fortran computing language. My brief was to compute minimum distance estimators on the earthquake data. In the meantime, Chip introduced me to Professor Frank Hampel's PhD thesis and several references on mixture modeling. After a year of trying to compute variance–covariance matrices for the minimum distance estimation methods, and for some reason failing to get positive definite matrices as was expected, I decided to come back to M‐estimation and study the theory more closely. An idea germinated that I could study the M‐estimator at a distribution other than the model parametric family and other than a symmetric contaminating distribution. This became the inspiration for my own PhD work.
I had the good fortune to then cross paths with Peter Hall. Chip, who had been burdened with the duties of Dean of the Faculty of Economics at ANU, took sabbatical at Berkeley for a year, and Peter became my supervisor. Peter was always so cheerful and encouraging when it came to research. He was publishing a book, Martingale Limit Theory and Its Application, with Chris Heyde, and he encouraged me to read books and papers on limit theorems. I thus became interested in the calculus associated with limit theorems and the asymptotic theory of M‐estimators. Chip returned to ANU in 1980 and kindly advised me on the presentation of my thesis and arranged for three quality referees, one of whom was Noel Cressie!
For some reason I wanted to go overseas and see the world. This was made possible with a postdoctoral research assistant position to study time series analysis at Royal Holloway College, University of London, in the period 1980–1982. While I worked on time series, I took my free time to put together my first major publication. Huber's (1981) monograph had come out. My paper was to illustrate that a large class of statistics that could be represented by statistical functionals, which were in fact M‐estimators, could inherit both weak continuity and Fréchet differentiability. These qualities in turn provide inherent robustness of the statistics. From first submission to actual publication in The Annals of Statistics took approximately 2.5 years. It was during this time of waiting that I traveled to Zürich after writing to Professor Hampel. He was keen to see my work published, as it supported with rigor notions which he had put forward in a heuristic manner, vis‐à‐vis the influence function. Subsequently, I spent almost a year at the Swiss Federal Institute of Technology (ETH), working as a research assistant and tutoring a class on the analysis of variance lectured by Professor Hampel.
The Conditions A and discussion that are given in Chapter 2 of this book are from that Annals of Statistics paper. To facilitate the theory of weak continuity and Fréchet differentiability, I initially had to make smoothness assumptions on the defining ψ‐functions for the M‐estimators. It was not until I traveled to the University of North Carolina at Chapel Hill, where I picked up the newly published book by Frank H. Clarke on "Optimization and Nonsmooth Analysis," that I realized how proofs of weak continuity and Fréchet differentiability for M‐estimators with nonsmooth psi‐functions, or psi‐functions which were bounded and continuous but had "sharp corners," could follow through. I subsequently wrote a paper from Murdoch University, where in 1984 I had taken up a newly appointed lecturing position in the then Mathematics Department. The paper was eventually published in 1986 in Probability Theory and Related Fields. This book brings together both these papers and a paper on what are called selection functionals.
My sojourn at Murdoch University has been one of teaching and research. I benefited from many years of teaching service courses and undergraduate mathematics and statistics units, having developed materials for a unit on "Linear Models" which later became a Wiley publication in 2008. I have also developed a unit on "Time Series Analysis" and have had two PhD students write theses in that general area. These forays, while time consuming, have helped me understand statistics a lot better. It has to be said that to teach robust statistics properly one needs to understand the mathematics that comes with it. Essentially, my experience in robust statistics has been one coming out of mathematics departments, or at least statistics groups heavily supported by mathematics departments. But from the mathematics comes understanding and, eventually, new ideas on how to analyze data, and a further appreciation of why some things work and others do not. This book is a reflection of that general summary.
In writing this book I have also alluded to or summarized many works that have been collaborative. An author with whom I have published much is Professor Tadeusz Bednarski from Poland. I met Professor Bednarski at a Robust Statistics Meeting in Oberwolfach, Germany, in 1984. He recognized the importance of Fréchet differentiability, and in particular the two works mentioned earlier, and we proceeded to produce a number of joint works on the asymptotics of estimators. He spoke on Fréchet differentiability at the Australian Statistics Conference 2010, held in Fremantle in Western Australia. However, with the tyranny of distance, and our paths diverging since then, it is clear that this book could not be written collaboratively. Nevertheless, I owe much to the joint research that we did, as is acknowledged in the book.
I have also benefited from collaborative works with many other authors. These works have helped in the presentation of new material in the book. In 1993 I published a paper with Robin Milne and Geoffrey Yeo in the Scandinavian Journal of Statistics. I thank both Robin and Geoff for making me think about the asymptotic theory when there are multiple solutions to one's estimating equations. There are subsequently new examples and results on asymptotic properties of roots and tests in Chapter 4 of this book that have been developed by the author. In 2004 the author published a paper with former honors student Chris Milne in the Australian & New Zealand Journal of Statistics on a small sample bias correction to Huber's Proposal 2 estimator of location and scale, and followed this with a paper in 2013 at the ISI meeting in Hong Kong. Summary results are included with permission in Section 5.1.2. In 2006 I collaborated with Andreas Futschik from the University of Vienna to study the properties of the Newton algorithm when dealing with either M‐estimation or density estimation, and a new Theorem 5.1 is borne out of that work. My interest in minimum distance estimation and its applications is summarized in Chapter 6. These include references to work with Chip Heathcote and also other collaborators such as Peter McKinnon and Geoff Riley. A new theorem on the unbiased nature of the vector estimator of the proportions, given that all other location and scale parameters in a mixture of normal distributions are known, is given in Theorem 10.1. In addition, plots in Figures 2.1, 6.6, 6.7, 6.8, and 6.9 are reproduced with acknowledgment of their source.
No book on robustness is complete without the study of L‐estimators, or estimation using linear combinations of order statistics. I have only attempted to introduce the ideas, which lead on to natural extensions to least trimmed squares and generalizations to trimmed likelihood and adaptive trimmed likelihood algorithms. I have found these useful for identifying outliers where there are outliers to be found, yet caution the reader to use Fréchet differentiable estimators for robust statistical inference. The outlier detection methods depart from the general use of Cook's distance in regression estimation, yet have the appealing feature that they work even when there are what are termed points of influence.
The book does not canvass robust methods in time series or robust survival analysis, though references are given. The book by Maronna et al. (2006) is a useful starting point for robust time series. Developments in robust survival analysis continue to accrue. The presentation of this book is not exhaustive, and many areas of endeavor in robust statistics are not countenanced in this book. The book is mainly a link between many areas of research that the author has been personally involved with in some way, and it attempts to weave together the essence of relevant theory and application.
The work would never have been possible without the introduction of Fréchet differentiability into the statistical literature by Professor C. R. Rao and Professor G. Kallianpur in their ground‐breaking paper in 1955. We have much to remember the French mathematician Maurice René Fréchet for, as well as Sir Ronald Aylmer Fisher, who helped to motivate the 1955 paper.
This book requires a strong mathematics background as a prerequisite in order to be read in its entirety. Students and researchers will then appreciate the generally easily read introductory Chapter 1. Chapters 2 and 3 require a mathematics background, but it is possible to avoid the difficult proofs requiring knowledge of the inverse function theorem in the proofs of weak continuity and Fréchet differentiability of M‐functionals, should you need to gloss over the mathematics. On the other hand, great understanding can be gleaned from paying attention to such proofs. There are references later in the book to other important theorems, such as the fixed point theorem and the implicit function theorem, though these are only referred to, and keen students may chase up their statements and proofs in related papers and mathematics texts. In this book, Chapter 4 is important to the understanding that there can be more than one root to one's estimating equations, and it gives new results in this direction. Chapters 5–9 include applications and vary from the simple application of computing robust estimators to asymptotic theory descriptions, which are a composite of exact calculation, such as in the theory of L₂ estimation of proportions, and descriptions of asymptotic normality results that can be further gleaned by studying the research papers cited. The attempt is to bring together works on estimation theory in robust estimation. I leave it to others to consider the potentially more difficult theory of testing, albeit robust confidence intervals based on asymptotics are a starting point for such.
This book has been written in what may be the last decade of my working career. Hopefully, others may benefit from the insights given by this compendium of knowledge, which covers much of the research into robustness that I have had a part in.
7 December 2017
BRENTON R. CLARKE
Perth, Western Australia
I wish to acknowledge my two PhD supervisors, Chip Heathcote and Professor Peter G. Hall. Both passed away in 2016. I remember them for their generous guidance in motivating my PhD studies in the period 1977–1980. Also I have to thank Professor Frank R. Hampel and his colleagues and students for helping me on my way during postdoctoral training as a Wissenschaftlicher Mitarbeiter at ETH in 1982–1983. Their influence is unmistakable. I owe much to the late Professor Alex Robertson for his help in bringing me to the then Mathematics Department, now called Mathematics and Statistics, at Murdoch University, and I thank all my mathematics and statistics colleagues past and present for their generosity in allowing me to teach and research while at Murdoch University.
To my collaborators mentioned in the Foreword I give my thanks. Special thanks go to Professor Tadeusz Bednarski for fostering international collaboration in mathematics and statistics pre‐ and post‐Communism (in Poland) and for showing that there are no international borders, religious or political, in the common language of mathematics. I also thank Professor Andrzej Kozek, who first hosted me at the University of Wroclaw in Poland and introduced me to Professor Bednarski.
Other researchers with whom I have had the pleasure of working include Thomas Davidson, Andreas Futschik, David Gamble, Robert Hammarstrand, Toby Lewis, Peter McKinnon, Chris Milne, Robin Milne, Geoffrey Yeo, and Geoff Riley, just to name some. More recent collaborations are with Christine Mueller and students. Robert Hammarstrand has also contributed by working under my direction to polish some of the R‐algorithms associated with this book, for which I take full responsibility.
I have to acknowledge the work with Daniel Schubert. He wrote and gained his PhD under my supervision in the area of trimmed likelihood but, after gaining a position in CSIRO, had his life cut short in a motorbike accident in 2007. I remember him for his eccentricities and for his enthusiasm for his newly found passion of robust statistics when he was a student. I include in the suite of associated R‐algorithms our contribution to the adaptive trimmed likelihood algorithm for multivariate data.
As I came nearer the publication due date, in July 2016 I had the privilege of visiting the Department of Statistics at the University of Glasgow, headed by Professor Adrian Bowman. In August 2016 I visited Professor Mette Langaas and colleagues in the Statistics Group of the Mathematics Department at the Norwegian University of Science and Technology (NTNU). Some of this book was inspired by these visits. The journey was also facilitated by a visit to the University of Surrey, where I was hosted by Dr. Janet Godolphin in the Department of Mathematics. Finally, I acknowledge Murdoch University for the sabbatical, taken nominally at the University of Western Australia for the remainder of second semester 2016, that was used in preparation of this book. I thank Berwin Turlach at the University of Western Australia for his administrative role in arranging this. In addition, I would like to thank Professor Luke Prendergast for encouragement and comments on a penultimate version of the book. Thanks also to Professors Alan Welsh and Andrew Wood for encouragement.
It goes without saying that I owe much to my wife and children in the formulation of this book and its predecessor, Linear Models: The Theory and Application of Analysis of Variance, which was published in 2008. They are duly acknowledged, and I dedicate both books to them.
BRENTON R. CLARKE
∈ – element of
∉ – not an element of
A ∩ B – intersection of sets A and B
A ∪ B – union of sets A and/or B
A ⊂ B – A is contained in B
𝒳 – observation space
R – separable metrizable space
ℰ – real line
ℰ⁺ – positive real line
ℰʳ – Euclidean r‐space
𝒢 – space of distribution functions
… – closed delta neighborhood of the set A
Θ – parameter space
E[X] – expected value of the random variable X
φ(z) – standard normal density function
Φ(z) – cumulative normal distribution
ℐ(θ) – Fisher information
| · | – modulus or absolute value
→ₚ – convergence in probability
→ᵈ – convergence in distribution
→ a.s. – convergence almost surely
⇒ – converges weakly; implies
∀ – for every
‖ · ‖ – for vectors and matrices, the Euclidean norm; for elements of a normed linear space, the norm
N(μ, σ²) – univariate normal random variable with mean μ and variance σ²
N(μ, Σ) – multivariate normal random variable with mean μ and covariance matrix Σ
… – x₁ + x₂ + ⋯ + xₙ
ATLA – adaptive trimmed likelihood algorithm
CLT – central limit theorem
EM – expectation maximization
LLN – law of large numbers
MAD – median absolute deviation
MADN – normalized median absolute deviation
MCVM – minimum Cramér–von Mises estimator
MLE – maximum likelihood estimator
SLLN – strong law of large numbers
TLE – trimmed likelihood estimator
WLLN – weak law of large numbers
This book is accompanied by a companion website:
www.wiley.com/go/clarke/robustnesstheoryandapplication
The BCS ancillary website contains:
several computing programs, some in R and some in MATLAB.
The suite of routines was developed by Brenton, in some cases in conjunction with his students and collaborators.
No responsibility lies with Brenton or any others mentioned for the use of the routines in endeavours outside the book.
BRENTON R. CLARKE
November 2017
The first and major proportion of this book is concerned with both asymptotic theory and empirical evidence for the convergence of estimators. The author has contributed here in more than just one article. Mostly, the relevant contributions have been in the field of M‐estimators, and it is the purpose of this book to make some ideas, which have necessarily been couched in deep theory of continuity and differentiability of functions, more easily accessible and understandable. We do this by illustration of convergence concepts, which begin with the strong law of large numbers (SLLN) and then eventually make use of the central limit theorem (CLT) in its most basic form. These two results are central to providing a straightforward discussion of M‐estimation theory and its applications. The aim is not to give the most general results on the theory of M‐estimation, nor on the related L‐estimation discussed in the latter half of the book, for these have been adequately displayed in other books, including Jurečková et al. (2012), Maronna et al. (2006), Hampel et al. (1986), Huber and Ronchetti (2009), and Staudte and Sheather (1990). Rather, motivation for the results of consistency and asymptotic normality of M‐estimators is explained in a way which highlights what to do when one has more than one root to the estimating equations. We should not shy away from this problem, since it tends to be a recurrent theme whenever models become quite complex, having multiple parameters and consequently multiple simultaneous, potentially nonlinear, equations to solve.
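To make the multiple-root phenomenon concrete, the following minimal R sketch scans a location M-estimating equation for sign changes; the Tukey bisquare with tuning constant cc = 4.685 and the contaminated sample are illustrative choices, not taken from the book.

# Sketch only: a redescending psi-function can give an estimating
# equation with more than one root when the data contain a distant cluster.
psi_bisquare <- function(u, cc = 4.685) {
  ifelse(abs(u) <= cc, u * (1 - (u / cc)^2)^2, 0)
}
set.seed(1)
x <- c(rnorm(20), 8, 9)   # 20 clean observations plus two gross outliers
s <- mad(x)               # a robust preliminary scale estimate
# Location M-estimating equation: g(theta) = sum psi((x_i - theta)/s) = 0
g <- function(theta) sum(psi_bisquare((x - theta) / s))
# Scan a grid for sign changes of g; each flags a candidate root.
grid <- seq(min(x) - 1, max(x) + 1, length.out = 2000)
vals <- sapply(grid, g)
grid[which(diff(sign(vals)) != 0)]

One root typically sits near the bulk of the data, but the redescending psi can also produce roots, or whole flat stretches where g vanishes, near the outlying cluster; this is precisely why root selection is taken up in later chapters.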
We begin with some basic terminology in order to set the scene for the study of convergence concepts, which we then apply to the study of estimating equations and also loss functions. So we assume that there is an "observation space," denoted here by 𝒳, which can be a subset of a separable metric space R. In published papers by the author, it was typically assumed that 𝒳 was itself a separable metric space, which did not allow for discrete data observed on, say, the nonnegative integers (such as Poisson distributed data). However, it is enough to consider 𝒳 ⊂ R now, since the arguments follow through easily enough. So the generality of the discussion includes data that are either continuous or discrete, and either univariate or multivariate, or, say, defined on the positive real line, as with lifetime data.
For instance, if the data are r‐dimensional continuous multivariate data, we have that 𝒳 = ℰʳ, Euclidean r‐space. Then we let ℬ be the smallest σ‐field containing the class of open sets on 𝒳 generated by the metric on R. These are called the r‐dimensional Borel sets, and ℬ is known as the Borel σ‐field. See Problem 1.1 for the definition of a σ‐field.
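For convenience, the standard definition (which the book defers to Problem 1.1) can be written in LaTeX as:

% Definition of a sigma-field (standard statement, given here for reference)
\mathcal{B} \text{ is a } \sigma\text{-field on } \mathcal{X} \text{ if }
\begin{cases}
\text{(i)} & \mathcal{X} \in \mathcal{B},\\
\text{(ii)} & B \in \mathcal{B} \implies \mathcal{X} \setminus B \in \mathcal{B},\\
\text{(iii)} & B_1, B_2, \ldots \in \mathcal{B} \implies \bigcup_{i=1}^{\infty} B_i \in \mathcal{B}.
\end{cases}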
A distribution on 𝒳 is a nonnegative and countably additive set function, G say, on ℬ, for which G(𝒳) = 1, and it is well known, say for 𝒳 = ℰ, that to G there corresponds a unique right continuous function F whose limits are zero and one at −∞ and +∞, defined by F(x) = G((−∞, x]).
We shall denote an abstract probability space to be (Ω, 𝒜, P). This formulation assumes 𝒜 to be a σ‐field of subsets of Ω, with P a probability measure on 𝒜. Ω is thought of as a sample space, and elements of Ω, denoted by ω, are the outcomes. Then a sequence of random variables on (Ω, 𝒜, P) is defined via

X(ω) = (X₁(ω), X₂(ω), …),

taking values in the infinite product space 𝒳^∞. The observed sample of size n is then written as (X₁(ω), …, Xₙ(ω)), while the nth random variable is given by Xₙ(ω). Both the full sequence and the sample of size n are then what are termed measurable maps with respect to 𝒜. They induce distributions on 𝒳^∞ and on 𝒳ⁿ, respectively. A useful reference on equivalent representations of infinite sequences of random variables and probability measures is that of Chung (2001) (now 3rd ed.; 1st ed., pp. 54–58).
We use the symbol 𝒢 to denote the space of distributions on 𝒳. Consider an arbitrary product set B = B₁ × B₂ × ⋯ × Bₙ with each Bᵢ ∈ ℬ. Denote by Gⁿ the product measure on 𝒳ⁿ that gives

Gⁿ(B) = G(B₁)G(B₂) ⋯ G(Bₙ).

Then we say that the sequence X₁, X₂, … is independent identically distributed (i.i.d.) if there exists a G ∈ 𝒢 such that, for every B in the form above, the induced distribution of (X₁, …, Xₙ) assigns probability G(B₁)G(B₂) ⋯ G(Bₙ) to B.
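A quick numerical check of this factorization follows; the normal model and the sets B₁ = (−∞, 0] and B₂ = (1, ∞) are arbitrary illustrative choices.

# Sketch: for i.i.d. coordinates, the joint probability of a product set
# factorizes into the product of the marginal probabilities.
set.seed(2)
m  <- 100000
x1 <- rnorm(m)                      # i.i.d. first coordinates
x2 <- rnorm(m)                      # i.i.d. second coordinates
joint   <- mean(x1 <= 0 & x2 > 1)   # empirical P(X1 in B1, X2 in B2)
product <- mean(x1 <= 0) * mean(x2 > 1)
c(joint = joint, product = product) # the two estimates nearly agree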
The law of large numbers (LLN) is a result that describes what will happen if we repeat an experiment a large number of times. Couched in simple terms, it says that the average of the results obtained from a large number of trials should be close to the expected value. How we measure closeness, and relate that to a limit as the number of trials tends to infinity, requires us to introduce two possible modes of convergence of random variables.
For a sequence of random variables {Tₙ} on (Ω, 𝒜, P), we can define two modes of convergence to a random variable T on that same probability space. The first and weaker mode of convergence, convergence in probability, is defined as follows: Let d be the metric distance on R. Then we say, provided the random variables are defined on the same probability space, that Tₙ converges in probability to T if for every ε > 0

P[d(Tₙ, T) > ε] → 0 as n → ∞.

We then write Tₙ →ₚ T. Should T be a constant, θ say, then the definition can be modified to say Tₙ →ₚ θ if for every ε > 0

P[d(Tₙ, θ) > ε] → 0 as n → ∞.
A stronger form of convergence of Tₙ, say, to the random variable T is convergence with probability one, or almost sure convergence:

P[{ω : limₙ→∞ Tₙ(ω) = T(ω)}] = 1, written Tₙ → T a.s.

It is a fact that this form of convergence implies convergence in probability. Hence, statements of almost sure convergence can be preferred to statements made only in probability.
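To see convergence in probability empirically, the exceedance probability can be estimated by simulation; a minimal R sketch, assuming standard normal data (so the limit is the constant 0) and taking ε = 0.1, both illustrative choices:

# Estimate P(|Xbar_n - 0| > eps) by Monte Carlo for increasing n;
# the probability shrinks toward zero, as the definition requires.
set.seed(3)
eps <- 0.1
for (n in c(10, 100, 1000, 10000)) {
  exceed <- replicate(2000, abs(mean(rnorm(n))) > eps)
  cat(sprintf("n = %5d  P(|Xbar - 0| > %.1f) is approx %.3f\n",
              n, eps, mean(exceed)))
}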
In order to relate the LLN, we need a formal concept of the expected value of a random variable. For a random variable X on (Ω, 𝒜, P), its expected value is written E[X], where the expectation is taken with respect to the induced distribution function F on the full space R that contains the observation space 𝒳. Thus, for observations on a subset of the real line, that is, where 𝒳 ⊂ ℰ, we have then, denoting the expected value by μ,

μ = E[X] = ∫ x dF(x).

This is the mean value of the random variable X. Similarly, we may define the variance, assuming it exists, via

var(X) = E[(X − μ)²] = ∫ (x − μ)² dF(x).

As is typically done, we denote the variance σ² = var(X). The variance measures the spread of the observations about the expected value. Since the variance involves squared deviations from the mean, it is not in the same units of measure as the original variable. Hence, what is more often used as a measure of spread is the standard deviation, which is the square root of the variance, that is, σ = √var(X). Nevertheless, the variance plays an important part in convergence concepts and indeed makes proofs, even of the LLN in its weaker form, much easier when it is known to exist, that is, when it is finite.
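These integral definitions can be checked numerically; a minimal R sketch for the standard normal distribution, where μ = 0, σ² = 1, and σ = 1 should be recovered up to integration error:

# Compute mean, variance, and sd by integrating against the normal density.
mu <- integrate(function(x) x * dnorm(x), -Inf, Inf)$value
v  <- integrate(function(x) (x - mu)^2 * dnorm(x), -Inf, Inf)$value
c(mean = mu, variance = v, sd = sqrt(v))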
The LLN, in either its weaker or stronger form, states that the sample average

X̄ₙ = (X₁ + X₂ + ⋯ + Xₙ)/n

converges to the expected value μ (assumed finite), so that

X̄ₙ →ₚ μ (WLLN)  or  X̄ₙ → μ a.s. (SLLN).

Here X₁, X₂, … is assumed i.i.d., and consequently each Xᵢ has expected value μ. It is not necessary that the variance be finite.
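In the spirit of Figure 1.1, a minimal R sketch plots the running average of Bernoulli tosses; the success probability p = 0.5 is an illustrative choice, not necessarily the one used in the book's figure:

# The running average of i.i.d. Bernoulli tosses settles at p, as the SLLN says.
set.seed(4)
p <- 0.5
tosses <- rbinom(5000, size = 1, prob = p)
running_avg <- cumsum(tosses) / seq_along(tosses)
plot(running_avg, type = "l", xlab = "number of tosses", ylab = "average",
     ylim = c(0, 1))
abline(h = p, lty = 2)  # the expected value the average converges to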
