Robust Correlation

Georgy L. Shevlyakov
Description

This book presents material on both the analysis of the classical concepts of correlation and the development of their robust versions, and discusses the related concepts of correlation matrices, partial correlation, canonical correlation, and rank correlations, with the corresponding robust and non-robust estimation procedures. Every chapter contains a set of examples with simulated and real-life data.

Key features:

* Makes modern and robust correlation methods readily available and understandable to practitioners, specialists, and consultants working in various fields.
* Focuses on implementation of methodology and application of robust correlation with R.
* Introduces the main approaches in robust statistics, such as Huber's minimax approach and Hampel's approach based on influence functions.
* Explores various robust estimates of the correlation coefficient, including the minimax variance and bias estimates as well as the most B- and V-robust estimates.
* Contains applications of robust correlation methods to exploratory data analysis, multivariate statistics, statistics of time series, and real-life data.
* Includes an accompanying website featuring computer code and datasets.
* Features exercises and examples throughout the text using both small and large data sets.

Theoretical and applied statisticians, specialists in multivariate statistics, robust statistics, robust time series analysis, data analysis, and signal processing will benefit from this book. Practitioners who use correlation-based methods in their work, as well as postgraduate students in statistics, will also find this book useful.

Page count: 426

Year of publication: 2016




Table of Contents

Cover

Wiley Series in Probability and Statistics

Title Page

Copyright

Dedication

Preface

Acknowledgements

About the companion website

Chapter 1: Introduction

1.1 Historical Remarks

1.2 Ontological Remarks

References

Chapter 2: Classical Measures of Correlation

2.1 Preliminaries

2.2 Pearson's Correlation Coefficient: Definitions and Interpretations

2.3 Nonparametric Measures of Correlation

2.4 Informational Measures of Correlation

2.5 Summary

References

Chapter 3: Robust Estimation of Location

3.1 Preliminaries

3.2 Huber's Minimax Approach

3.3 Hampel's Approach Based on Influence Functions

3.4 Robust Estimation of Location: A Sequel

3.5 Stable Estimation

3.6 Robustness Versus Gaussianity

3.7 Summary

References

Chapter 4: Robust Estimation of Scale

4.1 Preliminaries

4.2 M- and L-Estimates of Scale

4.3 Huber Minimax Variance Estimates of Scale

4.4 Highly Efficient Robust Estimates of Scale

4.5 Monte Carlo Experiment

4.6 Summary

References

Chapter 5: Robust Estimation of Correlation Coefficients

5.1 Preliminaries

5.2 Main Groups of Robust Estimates of the Correlation Coefficient

5.3 Asymptotic Properties of the Classical Estimates of the Correlation Coefficient

5.4 Asymptotic Properties of Nonparametric Estimates of Correlation

5.5 Bivariate Independent Component Distributions

5.6 Robust Estimates of the Correlation Coefficient Based on Principal Component Variances

5.7 Robust Minimax Bias and Variance Estimates of the Correlation Coefficient

5.8 Robust Correlation via Highly Efficient Robust Estimates of Scale

5.9 Robust M-Estimates of the Correlation Coefficient in Independent Component Distribution Models

5.10 Monte Carlo Performance Evaluation

5.11 Robust Stable Radical M-Estimate of the Correlation Coefficient of the Bivariate Normal Distribution

5.12 Summary

References

Chapter 6: Classical Measures of Multivariate Correlation

6.1 Preliminaries

6.2 Covariance Matrix and Correlation Matrix

6.3 Sample Mean Vector and Sample Covariance Matrix

6.4 Families of Multivariate Distributions

6.5 Asymptotic Behavior of Sample Covariance Matrix and Sample Correlation Matrix

6.6 First Uses of Covariance and Correlation Matrices

6.7 Working with the Covariance Matrix–Principal Component Analysis

6.8 Working with Correlations–Canonical Correlation Analysis

6.9 Conditionally Uncorrelated Components

6.10 Summary

References

Chapter 7: Robust Estimation of Scatter and Correlation Matrices

7.1 Preliminaries

7.2 Multivariate Location and Scatter Functionals

7.3 Influence Functions and Asymptotics

7.4 M-functionals for Location and Scatter

7.5 Breakdown Point

7.6 Use of Robust Scatter Matrices

7.7 Further Uses of Location and Scatter Functionals

7.8 Summary

References

Chapter 8: Nonparametric Measures of Multivariate Correlation

8.1 Preliminaries

8.2 Univariate Signs and Ranks

8.3 Marginal Signs and Ranks

8.4 Spatial Signs and Ranks

8.5 Affine Equivariant Signs and Ranks

8.6 Summary

References

Chapter 9: Applications to Exploratory Data Analysis: Detection of Outliers

9.1 Preliminaries

9.2 State of the Art

9.3 Problem Setting

9.4 A New Measure of Outlier Detection Performance

9.5 Robust Versions of the Tukey Boxplot with Their Application to Detection of Outliers

9.6 Robust Bivariate Boxplots and Their Performance Evaluation

9.7 Summary

References

Chapter 10: Applications to Time Series Analysis: Robust Spectrum Estimation

10.1 Preliminaries

10.2 Classical Estimation of a Power Spectrum

10.3 Robust Estimation of a Power Spectrum

10.4 Performance Evaluation

10.5 Summary

References

Chapter 11: Applications to Signal Processing: Robust Detection

11.1 Preliminaries

11.2 Robust Minimax Detection Based on a Distance Rule

11.3 Robust Detection of a Weak Signal with Redescending M-Estimates

11.4 A Unified Neyman–Pearson Detection of Weak Signals in a Fusion Model with Fading Channels and Non-Gaussian Noises

11.5 Summary

References

Chapter 12: Final Remarks

12.1 Points of Growth: Open Problems in Multivariate Statistics

12.2 Points of Growth: Open Problems in Applications

Wiley Series in Probability and Statistics

Index

WILEY SERIES IN PROBABILITY AND STATISTICS

End User License Agreement


List of Illustrations

Chapter 2: Classical Measures of Correlation

Figure 2.1 Data with positive correlation.

Figure 2.5 Linear dependent data correlation.

Figure 2.6 Linear dependent data correlation and determination coefficients.

Figure 2.9 Approximately nonlinear dependent data correlation and determination coefficients.

Figure 2.10 Ellipse of equal probability for the standard bivariate normal distribution with the major and minor diameters dependent on the correlation coefficient.

Figure 2.11 The sample correlation coefficient as the cosine of the angle between the variable vectors.

Figure 2.12 The quadrant correlation coefficient.

Chapter 3: Robust Estimation of Location

Figure 3.1

Figure 3.2 Tukey's sensitivity curve for the sample mean

Figure 3.3 Tukey's sensitivity curve for the sample median

Figure 3.4 Influence function for the population mean

Figure 3.5 Influence function for the population median

Figure 3.6 Hampel's score function

Figure 3.7 Huber's skipped mean score function

Figure 3.8 Mosteller-Tukey's biweight score function

Figure 3.9 Optimal score function for the class of nondegenerate distribution densities

Figure 3.10 Optimal score function for the class of distribution densities with a bounded variance

Figure 3.11 Optimal score function for the class of finite distribution densities

Figure 3.12 Optimal score function for the class of approximate finite distribution densities

Figure 3.13 Optimal score function for the class of nondegenerate distribution densities with a bounded variance in the intermediate between the classes and zone

Figure 3.14 Efficiency and stability of the estimates of location at the normal distribution with density

Figure 3.16 Efficiency and stability of the estimates of location at the Cauchy distribution with density

Figure 3.17 Standard Gaussian distribution density

Chapter 4: Robust Estimation of Scale

Figure 4.1 Score function for the standard deviation.

Figure 4.3 Score function for the median absolute deviation.

Figure 4.4 Score function for the minimax variance estimate of scale: the trimmed standard deviation.

Figure 4.5 The median absolute deviation score function at the standard normal density.

Figure 4.6 Efficiency of the estimate of scale at the standard normal.

Figure 4.7 Breakdown point of the estimate of scale.

Figure 4.8 Influence function of the estimate of scale.

Figure 4.9 Typical dependence of Monte Carlo accuracy on the number of trials.

Figure 4.10 Score function for the Huber -estimate of scale.

Figure 4.11 Standardized variance (axis ) versus average absolute bias (axis ) performance at the standard normal distribution: .

Figure 4.14 Average bias dependence on the contamination fraction at the contaminated normal distribution: .

Figure 4.12 Standardized variance (axis ) versus average absolute bias (axis ) performance at the standard normal distribution: .

Figure 4.13 Average bias dependence on the contamination fraction at the contaminated normal distribution: .

Chapter 5: Robust Estimation of Correlation Coefficients

Figure 5.1 Impact of outliers on the Pearson correlation coefficient

Figure 5.2 Asymptotic relative efficiencies of nonparametric correlation measures (axis ) versus the correlation coefficient (axis ) of the normal distribution

Chapter 9: Applications to Exploratory Data Analysis: Detection of Outliers

Figure 9.1 Tukey's univariate boxplot

Figure 9.2 -boxplot

Figure 9.3 Hypotheses testing under shift contamination

Figure 9.4 The relationship between the ROC curve and -mean

Figure 9.5 -bivariate boxplot

Figure 9.6 -bivariate boxplot realization: , , 4 suspicious observations

Figure 9.7 Legend

Figure 9.8 Shift contamination:

Figure 9.10 Shift contamination:

Figure 9.11 Scale contamination:

Figure 9.13 Scale contamination:

Figure 9.12

Figure 9.14 Ellipse deviation estimate

Figure 9.15 Ellipse shape deviations of the Tukey bagplot and -boxplot

Figure 9.16 The variances of location estimates: grey—-boxplot, black—Tukey's bagplot

Chapter 10: Applications to Time Series Analysis: Robust Spectrum Estimation

Figure 10.1 Median Fourier transform power spectrum estimate breakdown point property: the mixture of two sinusoids model with the and duration intervals.

Figure 10.3 Median Fourier transform power spectrum estimate breakdown point property: the median periodogram power spectrum estimate.

Figure 10.4 Power spectrum estimation with robust filter-cleaners: model with AO contamination (Spangl 2008).

Figure 10.7 Power spectra estimation: model with AO contamination.

Figure 10.8 Smoothed classical power spectrum estimation in the disorder model with contamination.

Figure 10.9 Smoothed robust power spectrum estimation in the disorder model with contamination.

Figure 10.6 Power spectra estimation by the Yule–Walker method: model with AO contamination.

Chapter 11: Applications to Signal Processing: Robust Detection

Figure 11.1 Error probability in the Gaussian noise: , .

Figure 11.3 Error probability in the generalized Gaussian noise close to uniform: asymptotics, , .

Figure 11.2 Error probability in the contaminated Gaussian noise: asymptotics, , .

Figure 11.4 Probability of missing in the Gaussian noise: , , .

Figure 11.6 Probability of missing in the Gaussian noise: , .

Figure 11.7 The parallel fusion model with the sensor nodes and fusion center.

Figure 11.8 ROC curves for detection in the Gaussian noise at SNR = 0, 10, 15, and 20 dB with =100.

Figure 11.9 ROC curves for detection in the Cauchy noise at GSNR = 0, 10, 20, and 30 dB with =100.

Figure 11.10 ROC curves for detection in the Laplace noise at SNR = 0, 10, and 20 dB with =100.

List of Tables

Chapter 2: Classical Measures of Correlation

Table 2.1 Pearson's correlation between a random observation and its sign

Table 2.2 Pearson's correlation between a random observation and its rank

Chapter 3: Robust Estimation of Location

Table 3.1 Efficiency and stability of M-estimates of location

Chapter 4: Robust Estimation of Scale

Table 4.1 Computation time in microseconds

Table 4.5 Monte Carlo means and standardized variances in the Cauchy distribution model

Table 4.2 Monte Carlo means and standardized variances in the scale-only standard normal distribution model

Table 4.3 Monte Carlo means and standardized variances in the location-scale standard normal distribution model

Table 4.4 Monte Carlo means and standardized variances in the Tukey gross error distribution model (, )

Chapter 5: Robust Estimation of Correlation Coefficients

Table 5.1 Normal distribution :

Table 5.8 Bivariate Cauchy -distribution :

Table 5.2 Normal distribution :

Table 5.3 Contaminated normal distribution : , , ,

Table 5.4 Contaminated normal : , , ,

Table 5.6 ICD Cauchy distribution :

Chapter 9: Applications to Exploratory Data Analysis: Detection of Outliers

Table 9.1 Boundaries of the power and false alarm rate

Table 9.2 -means for detection tests under scale contamination: ,

Table 9.3 -means for detection tests under shift contamination: ,

Table 9.4 -means for detection tests under shift contamination with the different values of :

Table 9.5 -means for boxplot tests applied to server data

Table 9.6 Types of contaminated Gaussian distribution densities

Chapter 11: Applications to Signal Processing: Robust Detection

Table 11.1 The factor in (11.13) for various noises

Table 11.2 Detection efficiency and stability for various detectors and noise distributions (the best detector performances for each noise distribution are boldfaced except the maximum likelihood and the minimum error sensitivity cases)

Table 11.3 The factor for various detectors and noise distributions (the best detector performances for each noise distribution are boldfaced except the maximum likelihood case)

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.

Robust Correlation

Theory and Applications

 

 

Georgy L. Shevlyakov

Peter the Great Saint-Petersburg Polytechnic University, Russia

 

 

Hannu Oja

University of Turku, Finland

 

 

This edition first published 2016 © 2016 by John Wiley and Sons Ltd

Registered office

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Names: Shevlyakov, Georgy L. | Oja, Hannu.

Title: Robust correlation : theory and applications / Georgy L. Shevlyakov, Peter The Great Saint-Petersburg Polytechnic University, Russia, Hannu Oja, University Of Turku, Finland.

Description: Chichester, West Sussex : John Wiley & Sons, Inc., 2016. | Series: Wiley series in probability and statistics | Includes bibliographical references and index.

Identifiers: LCCN 2016017308 (print) | LCCN 2016020693 (ebook) | ISBN 9781118493458 (cloth) | ISBN 9781119264538 (pdf) | ISBN 9781119264491 (epub)

Subjects: LCSH: Correlation (Statistics) | Mathematical statistics.

Classification: LCC QA278.2 .S4975 2016 (print) | LCC QA278.2 (ebook) | DDC 519.5/37–dc23

LC record available at https://lccn.loc.gov/2016017308

A catalogue record for this book is available from the British Library.

To our families

Preface

Robust statistics as a branch of mathematical statistics appeared due to the seminal works of John W. Tukey (1960), Peter J. Huber (1964), and Frank R. Hampel (1968). It has been intensively developed since the 1960s and is by now a well-established field. The term “robust” (Latin: strong, sturdy, tough, vigorous) as applied to statistical procedures was proposed by George E.P. Box (1953).

The principal reason for research in this field of statistics is of a general mathematical nature. Optimality (accuracy) and stability (reliability) are the mutually complementary characteristics of many mathematical procedures. It is well-known that the performance of optimal procedures is, as a rule, rather sensitive to “small” perturbations of prior assumptions. In mathematical statistics, the classical example of such an unstable optimal procedure is given by the least squares method: its performance may become disastrously poor under small deviations from normality.

Roughly speaking, robustness means stability of statistical inference under the departures from the accepted distribution models. Since the term “stability” is generally overloaded in mathematics, the term “robustness” may be regarded as its synonym.

Peter J. Huber and Frank R. Hampel contributed much to robust statistics: they proposed and developed two principal approaches to robustness, namely, the minimax approach and the approach based on influence functions, which were applied to almost all areas of statistics: robust estimation of location, scale, regression, and multivariate model parameters, as well as to robust hypothesis testing. It is remarkable that although robust statistics involves mathematically highly refined asymptotic tools, nevertheless robust methods show a satisfactory performance in small samples, being quite useful in applications.

The main topic of our book is robust correlation. Correlation analysis is widely used in multivariate statistics and data analysis: computing correlation and covariance matrices is both an initial and a basic step in most procedures of multivariate statistics, for example, in principal component analysis, factor and discriminant analysis, detection of multivariate outliers, etc.

Our work presents new results generally related to robust correlation and data analysis technologies, with definite accents both on theoretical aspects and on the practical needs of data processing: we have written the book to be accessible both to users of statistical methods and to professional statisticians. However, the mathematical background required is the basics of calculus, linear algebra, and mathematical statistics.

Chapter 1 introduces the book, providing historical aspects of the origin and development of the notion of “correlation” in science, as well as ontological remarks on the subject of statistics and data processing. Chapter 2 delivers a survey of the classical measures of correlation, aimed mostly at estimating linear dependence.

Chapter 3 presents Huber's and Hampel's principal approaches to robustness in mathematical statistics, with novel additions to them, namely a stable estimation approach and an essay on robustness versus Gaussianity, the latter of which could be helpful for students and their teachers. Except for a few paragraphs on the application of Huber's minimax approach to distribution classes of a non-neighborhood nature, Chapters 1 to 3 are accessible to a wide audience.

Chapters 4 to 8 comprise the core of the book, which contains most of the new theoretical and experimental (Monte Carlo) results. Chapter 4 treats the problems of robust estimation of a scale parameter, and the obtained results are used in Chapter 5 for the design of highly robust and efficient estimates of a correlation coefficient including robust minimax (in the Huber sense) estimates. Chapter 6 provides an overview of classical multivariate correlation measures and inference tools based on the covariance and correlation matrix. Chapter 7 deals with robust correlation measures and inference tools that are based on various robust covariance matrix functionals and estimates; in particular, robust versions of principal component and canonical correlation analysis are given. Chapter 8 comprises correlation measures and inference tools based on various concepts of univariate and multivariate signs and ranks.

Chapters 9 to 11 are devoted to the applications of the aforementioned robust estimates of correlation, as well as of location and scale, to different problems of statistical data and signal analysis, with a few examples of real-life data and signal processing. Chapter 9 is confined to applications to exploratory data analysis and its technologies, mostly treating the important problem of detection of outliers in the data. Chapter 10 outlines a few novel approaches to robust estimation of time series power spectra: although the obtained results are preliminary, they are profitable and deserve further thorough study. In Chapter 11, various problems of robust signal detection are posed and treated, in whose solution Huber's minimax and stable approaches to robust detection are successfully exploited.

Chapter 12 outlines several open problems in robust multivariate analysis and its applications.

From the aforementioned it follows that the book has two main blocks: Chapters 1 to 3 and 9 to 11 aim at the applied statistician and statistics user audience, while Chapters 4 to 8 focus on the theoretical aspects of robust correlation.

Most of the contents of the book, namely Chapters 1 to 5 and 9 to 11, have been written by the first author. The second author contributed Chapters 6 to 8 on general multivariate analysis.

Acknowledgements

John W. Tukey, Peter J. Huber, Frank R. Hampel, Elvezio M. Ronchetti, and Peter J. Rousseeuw have essentially influenced, directly or indirectly, our views on robustness and data analysis.

The first author is deeply grateful to his teachers and colleagues for their helpful and constructive discussions, namely, to Igor B. Chelpanov, Peter Filzmoser, Eugene P. Gilbo, Jana Jureckova, Abram M. Kagan, Vladimir Ya. Katkovnik, Yuriy S. Kharin, Kiseon Kim, Lev B. Klebanov, Stephan Morgenthaler, Yakov Yu. Nikitin, Boris T. Polyak, Alexander M. Shurygin, and Nikita O. Vilchevski.

Some results presented in Chapters 4, 5, and 9 to 11 by the first author are based on the Ph.D. and M.Sc. dissertations of his former students, including Kliton Andrea, JinTae Park, Pavel Smirnov, Galina Lavrentyeva, Nickolay Lyubomishchenko, and Nikita Vassilevskiy—we would like to thank them.

Research on multivariate analysis reported by the second author is to some degree based on the thesis works of several of his ex-students, including Jyrki Möttönen, Samuli Visuri, Esa Ollila, Sara Taskinen, Seija Sirkiä, and Klaus Nordhausen. We wish to thank them all. The second author is naturally also indebted to many colleagues and coauthors, and expresses his sincere thanks for valuable discussions and cooperation in this specific research area to Christopher Croux, Tom Hettmansperger, Annaliisa Kankainen, Visa Koivunen, Ron Randles, Bob Serfling, and Dave Tyler.

We are also grateful to Igor Bezdvornyh and Maksim Sovetnikov for their technical help in the preparation of the manuscript.

Finally, we wish to thank our wives, Elena and Ritva, for their patience, support, and understanding.

About the companion website

Don't forget to visit the companion website for this book:

www.wiley.com/go/Shevlyakov/Robust

There you will find valuable material designed to enhance your learning, including:

Datasets

R code

Scan this QR code to visit the companion website.

Chapter 1 Introduction

This book is mostly about correlation and association, and partially about regression, i.e., about those areas of science that study the dependencies between random variables which mathematically describe the relations between observed phenomena and their associated features. Evidently, these concepts and terms first appeared in the applied sciences, not in mathematics. Below we briefly overview the historical aspects of these concepts.

1.1 Historical Remarks

The word “correlation” is of late Latin origin, meaning “association”, “connection”, “correspondence”, “interdependence”, “relationship”, though not a relationship in the deterministic functional form conventional at that time.

The term “correlation” was introduced into science by the French naturalist Georges Cuvier (1769–1832), one of the major figures in the natural sciences of the early 19th century, who founded paleontology and comparative anatomy. Cuvier discovered and studied the relationships between the parts of animals, between the structure of animals and their mode of existence, between the species of animals and plants, and many others. This experience led him to establish the general principles of “the correlation of parts” and of “the functional correlation” (Rudwick 1997):

Today comparative anatomy has reached such a point of perfection that, after inspecting a single bone, one can often determine the class, and sometimes even the genus of the animal to which it belonged, above all if that bone belonged to the head or the limbs. … This is because the number, direction, and shape of the bones that compose each part of an animal's body are always in a necessary relation to all the other parts, in such a way that – up to a point – one can infer the whole from any one of them and vice versa.

From Cuvier to Galton, correlation had been understood as a qualitatively described relationship, not deterministic but of a statistical nature, though observed at that time within a rather narrow range of phenomena.

The notion of regression is connected with the great names of Laplace, Legendre, Gauss, and Galton (1885), who coined the term. Laplace (1799) was the first to propose a method for processing astronomical data, namely the least absolute values method. Legendre (1805) and Gauss (1809), independently of each other, introduced the least squares method.

Francis Galton (1822–1911), a British anthropologist, biologist, psychologist, and meteorologist, understood correlation as the interrelationship on average between any random variables (Galton 1888):

Two variable organs are said to be co-related when the variation of the one is accompanied on the average by more or less variation of the other, and in the same direction.… It is easy to see that co-relation must be the consequence of the variations of the two organs being partly due to common cause.… If they were in no respect due to common causes, the co-relation would be nil.

Correlation analysis (a term also coined by Galton) deals with estimating the strength of correlation by numerical indexes or coefficients.

Similarly to Cuvier, Galton introduced regression dependence by observing live nature, in particular by processing the heredity and sweet pea data (Galton 1894). Regression characterizes the correlation dependence between random variables functionally, on average. Studying the sizes of sweet pea seeds, he noticed that the offspring seeds did not reveal a tendency to reproduce the size of their parents, being closer to the population mean. Namely, the seeds were smaller than their parents in the case of large parent sizes, and vice versa. Galton called this dependence regression, for reverse changes had been observed: at first, he used the term “the law of reversion”. Further studies showed that on average the offspring regression to the population mean was proportional to the parent deviations from it, which allowed the observed dependence to be described by a linear function. A similar linear regression was described by Galton as a result of processing the heights of 930 adult children and their 205 parents (Galton 1894).

The term “regression” became popular, and it is now used for functional dependencies on average between any random variables. Using modern terminology, we may say that Galton considered the slope of the simple linear regression line as a measure of correlation (Galton 1888):

Let y = the deviation of the subject [in units of the probable error, Q], whichever of the two variables may be taken in that capacity; and let x1, x2, x3, etc., be the corresponding deviations of the relative, and let the mean of these be X. Then we find: (1) that X = r y for all values of y; (2) that r is the same, whichever of the two variables is taken for the subject; (3) that r is always less than 1; (4) that r measures the closeness of co-relation.

Now we briefly comment on the above-mentioned properties (1)–(4): the first is just the simple linear regression equation between the standardized variables x and y; the second means that the co-relation is symmetric with regard to the variables x and y; the third and fourth show that Galton had not yet recognized the idea of negative correlation: stating that r could not be greater than 1, he evidently understood r as a positive measure of “co-relation”. Originally r stood for the regression slope, and that is really so for standardized variables; Galton perceived the correlation coefficient as a scale-invariant regression slope.

Galton contributed much to science by studying the problems of heredity of qualitative and quantitative features, which he examined numerically on the basis of the concept of correlation. To this day, the data on demography, heredity, and sociology collected by Galton, with the corresponding numerical examples of computed correlations, are still in use.

Karl Pearson (1857–1936), a British mathematician, statistician, biologist, and philosopher, wrote out the explicit formulas for the population product-moment correlation coefficient (Pearson 1895)

$$ \rho = \rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E(X - EX)(Y - EY)}{\sqrt{E(X - EX)^2 \, E(Y - EY)^2}} \qquad (1.1) $$

and its sample version

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1.2) $$

(here $\bar{x}$ and $\bar{y}$ are the sample means of the observations $x_i$ and $y_i$ of the random variables $X$ and $Y$). However, Pearson did not definitely distinguish the population and sample versions of the correlation coefficient, as is commonly done at present.
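For readers who prefer code, the following minimal R sketch (with simulated data; the variable names are ours) computes the sample version (1.2) directly and checks it against R's built-in cor():

## Sample Pearson correlation (1.2) computed from its definition
## and checked against R's built-in cor(); illustrative data only.
set.seed(1)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100, sd = 0.8)

r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

all.equal(r_manual, cor(x, y))  # TRUE: both implement (1.2)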

Thus, on the one hand, the sample correlation coefficient $r$ is a statistical counterpart of the correlation coefficient $\rho$ of a bivariate distribution, where $\sigma_X^2$, $\sigma_Y^2$, and $\operatorname{cov}(X, Y)$ are the variances and the covariance of the random variables $X$ and $Y$, respectively.

On the other hand, it is an efficient maximum likelihood estimate of the correlation coefficient $\rho$ of the bivariate normal distribution (Kendall and Stuart 1963) with density

$$ N(x, y;\, \mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho (x - \mu_1)(y - \mu_2)}{\sigma_1 \sigma_2} + \frac{(y - \mu_2)^2}{\sigma_2^2} \right] \right\} \qquad (1.3) $$

where $\mu_1 = EX$, $\mu_2 = EY$, $\sigma_1^2 = \operatorname{var} X$, $\sigma_2^2 = \operatorname{var} Y$.
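As a quick numerical check of (1.3), one can simulate from a bivariate normal with known $\rho$ and compare the sample estimate (1.2) with the population value; this sketch uses mvrnorm() from the MASS package (an assumption of the sketch: MASS ships with standard R distributions):

## Simulate from the bivariate normal (1.3) with rho = 0.7 and verify
## that the sample correlation (1.2) recovers the population value.
library(MASS)
set.seed(2)
rho   <- 0.7
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
xy    <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = Sigma)
cor(xy[, 1], xy[, 2])  # close to 0.7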

Galton (1888) derived the bivariate normal distribution (1.3), and he was the first to use it to describe the scatter of the frequencies of children's and parents' statures. Pearson noted that “in 1888 Galton had completed the theory of bivariate normal correlation” (Pearson 1920).

Like Galton, Auguste Bravais (1846), a French naval officer and astronomer, came very near to the definition (1.1) when he called one parameter of the bivariate normal distribution “une correlation”, but he did not recognize it as a measure of the interrelationship between variables. However, “his work in Pearson's hands proved useful in framing formal approaches in those areas” (Stigler 1986).

Pearson's formulas (1.1) and (1.2) proved to be fruitful for studying dependencies: correlation analysis and most multivariate statistical analysis tools are based on pair-wise Pearson correlations; we may also add the correlation and spectral theories of stochastic processes, etc.

Since the time Pearson introduced the sample correlation coefficient (1.2), many other measures of correlation have been used, aimed at estimating the closeness of interrelationship (the coefficients of association, determination, contingency, etc.). Some of them were proposed by Karl Pearson (1920).

It would not be out of place to note the contributions of other British statisticians to correlation analysis.

Ronald Fisher (1890–1962) is one of the creators of mathematical statistics. In particular, he is the originator of the analysis of variance, and together with Karl Pearson he stands at the beginning of the theory of hypothesis testing. He introduced the notion of a sufficient statistic and proposed the maximum likelihood method (Fisher 1922). Fisher also paid much attention to correlation analysis: his tools for verifying the significance of correlation under the normal law are still in use today.

George Yule (1871–1951) is a prominent statistician of the first half of the 20th century. He contributed much to the statistical theories of regression, correlation (Yule's coefficient of contingency between random events), and spectral analysis.

Maurice Kendall (1907–1983) is one of the creators of nonparametric statistics, in particular of nonparametric correlation analysis (the Kendall τ rank correlation) (Kendall 1938). It is noteworthy that he is the coauthor of the classical course in mathematical statistics (Kendall and Stuart 1962, 1963, 1968).

In what follows, we present their contributions to correlation analysis in more detail.

1.2 Ontological Remarks

Our personal research experience in applied statistics and real-life data analysis is relatively broad and long. It concerns the problems of data processing in medicine (cardiology and ophthalmology), biology (genetics), economics and finance (financial mathematics), industry (mechanical engineering, energetics, and material science), and the analysis of semantic data and informatics (information retrieval from big data). Alongside, and due to, those problems, we have been working in theoretical statistics, mostly in robust and nonparametric statistics, as well as in multivariate statistics and time series analysis. Now we briefly outline our vision of the topic of this book to indicate its place in the general context of statistical data analysis with its philosophy and ideological environment.

The reader should only remember that any classification is a convention, and so are the ones that follow.

1.2.1 Forms of data representation

The customary forms of data representation are as follows (Shevlyakov and Vilchevski 2002, 2011):

as a sample $x_1, \ldots, x_n$ of real numbers, the most convenient form to deal with;

as a sample $\mathbf{x}_1, \ldots, \mathbf{x}_n$ of real-valued vectors of dimension $p$;

as an observed realization $x(t)$, $0 \le t \le T$, of a real-valued continuous process (function);

as a sample of “non-numerical nature” data representing qualitative variables;

as the semantic type of data (statements, texts, pictures, etc.).

The first three possibilities mostly occur in the natural and technical sciences, where measurement techniques are well developed, clearly defined, and largely standardized. In the social sciences, the last two forms are relatively common.

To summarize: in this book we deal mostly with the first three forms and, partially, with the fourth.

1.2.2 Types of data statistics

The experience of treating various statistical problems shows that practically all of them are solved with the use of only a few qualitatively different types of data statistics. Here we do not discuss how to use them in solving statistical problems; we only note that their solutions result in computing some of those statistics, and that final decision making essentially depends on their values (Mosteller and Tukey 1977; Tukey 1962).

These data statistics may be classified as follows:

measures of location (central tendency, mean values),

measures of scale (spread, dispersion, scatter),

measures of correlation (interdependence, association),

measures of extreme values,

measures of a data distribution shape,

measures of data spectrum.

To summarize: in this book we mainly focus on the measures of correlation, while also dealing, when needed, with the other types of data statistics.

1.2.3 Principal aims of statistical data analysis

These aims can be formulated as follows:

(A1) compact representation of data,

(A2) estimation of model parameters explaining and/or revealing data structure,

(A3) prediction.

A human mind cannot efficiently work with large volumes of information, since there exist natural psychological bounds on perception ability (Miller 1956). Thus it is necessary to provide a compact output of information for expert analysis: only in this case may we expect a satisfactory final decision. Note that data processing often begins and ends with the first item (A1).

The next step (A2) is to propose an explanatory underlying model for the observed data and phenomena. It may be a regression model, a distribution model, or any other, desirably a low-complexity one: an essentially multiparametric model is usually a “bad” model; nevertheless, we should recall the apt remark of George Box: “All models are wrong, but some of them are useful” (Box and Draper 1987). However, parametric models are the first to consider and examine.

Finally, the first two aims are only steps toward the last aim (A3): here we have to state that this aim remains a main challenge to statistics and to science as a whole.

To summarize: in this book we pursue aims (A1) and (A2).

1.2.4 Prior information about data distributions and related approaches to statistical data analysis

The need for stability in statistical inference directly leads to the use of robust statistical methods. It may be roughly stated that, with respect to the level of prior information about underlying data distributions, robust statistical methods occupy an intermediate place between classical parametric and nonparametric methods.

In parametric statistics, the shape of an underlying data distribution is assumed known up to the values of unknown parameters. In nonparametric statistics, it is supposed that the underlying data distribution belongs to some sufficiently “wide” class of distributions (continuous, symmetric, etc.). In robust statistics, at least within Huber's minimax approach (Huber 1964), we also consider distribution classes but with more detailed information about the underlying distribution, say, in the form of a neighborhood of the normal distribution. The latter peculiarity allows the efficiency of robust procedures to be raised as compared with nonparametric methods, while simultaneously retaining their high stability.
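The following small R sketch (illustrative data only; any of the robust estimates developed later in the book would serve as the alternative) shows the kind of instability that motivates this program: one gross error spoils the classical Pearson estimate, while the rank-based Spearman estimate is far less affected:

## Sensitivity of the classical Pearson estimate to a single gross error.
set.seed(3)
x <- rnorm(50)
y <- 0.8 * x + rnorm(50, sd = 0.6)
cor(x, y)                        # near the true value
x[1] <- 10; y[1] <- -10          # plant one outlier
cor(x, y)                        # badly biased
cor(x, y, method = "spearman")   # much less affected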

At present, there exist two main approaches in robustness:

Huber's minimax approach — quantitative robustness (Huber 1981; Huber and Ronchetti 2009).

Hampel's approach based on influence functions — qualitative robustness (Hampel 1968; Hampel et al. 1986).

In Chapter 3, we describe these approaches in detail. Now we classify the existing approaches in statistics with respect to the level of prior information about the underlying data distribution in the case of point parameter estimation:

A given data distribution with a random parameter — Bayesian statistics (Berger 1985; Bernardo and Smith 1994; Jaynes 2003).

A given data distribution with an unknown parameter — classical parametric statistics (Fisher 1922; Kendall and Stuart 1963).

A data distribution with an unknown parameter belongs to a distribution class, usually a neighborhood of a given distribution, e.g., normal — robust statistics (Hampel et al. 1986; Huber 1981; Kolmogorov 1931; Tukey 1960).

A data distribution with an unknown parameter belongs to some general distribution class — classical nonparametric statistics (Hettmansperger and McKean 1998; Kendall and Stuart 1963; Wasserman 2007).

A data distribution does not exist in the case of unique samples and frequency instability — the probability-free approaches to data analysis: fuzzy (Zadeh 1975), exploratory (Bock and Diday 2000; Tukey 1977), interval probability (Kuznetsov 1991; Walley 1990), logical-algebraic, geometrical (Billard and Diday 2003; Diday 1972).

Note that the upper and lower levels of this hierarchy, namely the Bayesian and the probability-free approaches, are being intensively developed at present.

To summarize: in this book we mainly use Huber's and Hampel's robust approaches to statistical data analysis.

References

Berger JO 1985 Statistical Decision Theory and Bayesian Analysis, Springer.

Bernardo JM and Smith AFM 1994 Bayesian Theory, Wiley.

Billard L and Diday E 2003 From the statistics of data to the statistics of knowledge: symbolic data analysis. J. Amer. Statist. Assoc. 98, 991–999.

Bock HH and Diday E (eds) 2000 Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, Springer.

Box GEP and Draper NR 1987 Empirical Model-Building and Response Surfaces, Wiley.

Bravais A 1846 Analyse mathématique sur les probabilités des erreurs de situation d'un point. Mémoires présentés par divers savants à l'Académie des Sciences de l'Institut de France. Sciences Mathématiques et Physiques 9, 255–332.

Diday E 1972 Nouvelles Méthodes et Nouveaux Concepts en Classification Automatique et Reconnaissance des Formes. Thèse de doctorat d'état, Univ. Paris IX.

Fisher RA 1922 On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A 222, 309–368.

Galton F 1885 Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute 15, 246–263.

Galton F 1888 Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal Society of London 45, 135–145.

Galton F 1894 Natural Inheritance, Macmillan, London.

Gauss CF 1809 Theoria Motus Corporum Celestium, Perthes, Hamburg; English translation: Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections, Dover, New York, 1963.

Hampel FR 1968 Contributions to the Theory of Robust Estimation. PhD thesis, University of California, Berkeley.

Hampel FR, Ronchetti E, Rousseeuw PJ, and Stahel WA 1986 Robust Statistics. The Approach Based on Influence Functions, Wiley.

Hettmansperger TP and McKean JW 1998 Robust Nonparametric Statistical Methods. Kendall's Library of Statistics, Edward Arnold, London.

Huber PJ 1964 Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.

Huber PJ 1981 Robust Statistics, Wiley.

Huber PJ and Ronchetti E 2009 Robust Statistics, 2nd edn, Wiley.

Jaynes ET 2003 Probability Theory: The Logic of Science, Cambridge University Press.

Kendall MG 1938 A new measure of rank correlation. Biometrika 30, 81–89.

Kendall MG and Stuart A 1962 The Advanced Theory of Statistics, vol. 1: Distribution Theory, Griffin, London.

Kendall MG and Stuart A 1963 The Advanced Theory of Statistics, vol. 2: Inference and Relationship, Griffin, London.

Kendall MG and Stuart A 1968 The Advanced Theory of Statistics, vol. 3: Design and Analysis, and Time Series, Griffin, London.

Kolmogorov AN 1931 On the method of median in the theory of errors. Math. Sbornik 38, 47–50.

Kuznetsov VP 1991 Interval Statistical Models, Radio i Svyaz, Moscow (in Russian).

Legendre AM 1805 Nouvelles méthodes pour la détermination des orbites des comètes, Didot, Paris.

Miller GA 1956 The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review 63, 81–97.

Mosteller F and Tukey JW 1977 Data Analysis and Regression, Addison–Wesley.

Pearson K 1895 Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A 186, 343–414.

Pearson K 1920 Notes on the history of correlations. Biometrika 13, 25–45.

Rudwick MJS 1997 Georges Cuvier, Fossil Bones, and Geological Catastrophes, University of Chicago Press.

Shevlyakov GL and Vilchevski NO 2002 Robustness in Data Analysis: Criteria and Methods, VSP, Utrecht.

Shevlyakov GL and Vilchevski NO 2011 Robustness in Data Analysis, De Gruyter, Boston.

Stigler SM 1986 The History of Statistics: The Measurement of Uncertainty before 1900, Belknap Press/Harvard University Press.

Tukey JW 1960 A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics (ed. Olkin I), pp. 448–485, Stanford Univ. Press.

Tukey JW 1962 The future of data analysis. Ann. Math. Statist. 33, 1–67.

Tukey JW 1977 Exploratory Data Analysis, Addison–Wesley.

Walley P 1990 Statistical Reasoning with Imprecise Probabilities, Chapman & Hall.

Wasserman L 2007 All of Nonparametric Statistics, Springer.

Zadeh LA 1975 Fuzzy logic and approximate reasoning. Synthese 30, 407–428.

Chapter 2 Classical Measures of Correlation

In this chapter we define several conventional measures of correlation, focusing mostly on Pearson's correlation coefficient and constructions closely related to it, and we list their principal properties and computational peculiarities.

2.1 Preliminaries

Here we comment on the requirements that should be imposed on the measures of correlation to distinguish them from the measures of location and scale (Renyi 1959; Schweizer and Wolff 1981).

Let $\rho(X, Y)$ be a measure of correlation between any random variables $X$ and $Y$. Here we consider both positive–negative correlation,

$$ -1 \le \rho(X, Y) \le 1, $$

and positive correlation,

$$ 0 \le \rho(X, Y) \le 1. $$

It is natural to impose the following requirements on $\rho(X, Y)$:

(R1) Symmetry: $\rho(X, Y) = \rho(Y, X)$.

(R2) Invariancy to linear transformations of random variables: $\rho(aX + b,\, cY + d) = \rho(X, Y)$ for any $a > 0$, $c > 0$ and any $b$, $d$.

(R3) Attainability of the limit values $0$, $+1$, and $-1$:

$\rho(X, Y) = 0$ for independent $X$ and $Y$;

$\rho(X, X) = 1$;

$\rho(X, -X) = -1$ for positive–negative correlation.

(R4) Invariancy to strictly monotonic transformations of random variables: $\rho(f(X), g(Y)) = \rho(X, Y)$ for strictly monotonic functions $f$ and $g$.

(R5)

  .

Requirement (R1) holds for almost all known measures of correlation, being a natural assumption for correlation analysis as compared to regression analysis, where it is not known which variables are dependent and which are not.

Requirement (R2) makes a measure of correlation independent of the chosen measures of location and scale, since each of them reflects qualitatively different data characteristics.

Requirements (R3a), (R3b), and (R3c), on the one hand, are merely technical: it is practically and theoretically convenient to deal with a bounded, scaleless measure of correlation; on the other hand, they refer to the correspondence of the limit values of $\rho(X, Y)$ to the limit cases of association between the random variables $X$ and $Y$: the relation $\rho(X, Y) = 0$ may mean independence of $X$ and $Y$, whereas $|\rho(X, Y)| = 1$ indicates functional dependence between $X$ and $Y$.

The first three requirements hold for almost all known measures of correlation. This is not so with the relation $\rho(X, Y) = 0$, which does not guarantee independence of $X$ and $Y$ for several measures of correlation, for example, for Pearson's product-moment correlation, the Spearman rank correlation, and a few others.

However, this property holds for the maximal correlation coefficient defined as

$$ \rho^*(X, Y) = \sup_{f, g} \rho(f(X), g(Y)), $$

where $\rho$ is Pearson's product-moment correlation coefficient (1.1) and $f$, $g$ are Borel-measurable functions such that $\rho(f(X), g(Y))$ makes sense (Gebelein 1941). The independence of random variables $X$ and $Y$ also follows from the null value of Sarmanov's correlation coefficient (also called the maximal correlation coefficient) (Sarmanov 1958): in the case of a continuous symmetric bivariate distribution of $X$ and $Y$, its value is the reciprocal of the minimal eigenvalue of some integral operator. Apparently, Gebelein's and Sarmanov's correlation coefficients are rather complicated to use.

Recently, in Székely et al. (2007), a distance correlation has been proposed: its equality to zero implies independence but, like Gebelein's and Sarmanov's correlation coefficients, it is much more complicated to compute than the classical measures of correlation.
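As a hedged sketch (it assumes the CRAN package energy is installed; dcor() is not part of base R), the distance correlation detects a nonlinear dependence that Pearson's coefficient misses:

## Distance correlation of Székely et al. (2007) via the "energy" package.
library(energy)
set.seed(4)
x <- rnorm(200)
y <- x^2              # dependent, yet uncorrelated in Pearson's sense
cor(x, y)             # near 0
dcor(x, y)            # clearly positive: the dependence is detected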

Requirements (R4) and (R5) refer to the rank measures of correlation, for example, to the Spearman and Kendall τ rank correlation coefficients.
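A quick R illustration of (R4) for a rank measure (simulated data; the variable names are ours): Spearman's coefficient, being rank based, is unchanged by strictly increasing transformations, while Pearson's is not:

## (R4) for Spearman's rank correlation: strictly increasing transforms
## preserve ranks, hence the coefficient; Pearson's value changes.
set.seed(7)
x <- rnorm(40)
y <- x + rnorm(40)
cor(exp(x), y^3, method = "spearman") == cor(x, y, method = "spearman")  # TRUE
cor(exp(x), y^3) == cor(x, y)                                            # FALSE in general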

Now we list the well-known seven postulates of Renyi (1959), formulated for a measure of dependence $\delta(X, Y)$ defined in the segment $[0, 1]$:

(P1) $\delta(X, Y)$ is defined for any pair of random variables $X$ and $Y$, neither of them being constant with probability 1.

(P2) $\delta(X, Y) = \delta(Y, X)$.

(P3) $0 \le \delta(X, Y) \le 1$.

(P4) $\delta(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

(P5) $\delta(X, Y) = 1$ if there is a strict dependence between $X$ and $Y$, that is, either $X = g(Y)$ or $Y = f(X)$, where $f$ and $g$ are Borel-measurable functions.

(P6) If the Borel-measurable functions $f$ and $g$ map the real axis in a one-to-one way onto itself, then $\delta(f(X), g(Y)) = \delta(X, Y)$.

(P7) If the joint distribution of $X$ and $Y$ is normal, then $\delta(X, Y) = |\rho(X, Y)|$, where $\rho(X, Y)$ is Pearson's product-moment correlation coefficient.

This set of postulates is more restrictive than the proposed set (R1)–(R5), mostly because of the chosen range $[0, 1]$ and the last postulate (P7), which yields the absolute value of Pearson's correlation $|\rho|$. Later we return to this set when considering informational measures of correlation. Moreover, in what follows, we generally focus on the conventional tools of correlation analysis based on Pearson's correlation coefficient and measures closely related to it, implicitly using Renyi's postulates.

2.2 Pearson's Correlation Coefficient: Definitions and Interpretations

Below we present a series of different conceptual and computational definitions of the population and sample Pearson's correlation coefficients $\rho$ and $r$, respectively. Each definition indicates a different way of thinking about this measure within different statistical contexts, using algebraic, geometric, and trigonometric settings (Rogers and Nicewander 1988).

2.2.1 Introductory remarks

The definitions (1.1) and (1.2) of Pearson's $\rho$ and $r$, traditional for introductory statistics textbooks, can evidently be rewritten as

$$ \rho(X, Y) = E\left[ \left( \frac{X - EX}{\sigma_X} \right) \left( \frac{Y - EY}{\sigma_Y} \right) \right] $$

and

$$ r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right), \qquad (2.1) $$

where $s_x^2 = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $s_y^2 = n^{-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$ are the mean squared errors.

Equation (2.1) for the sample correlation coefficient can thus be regarded as the sample covariance of the standardized variables, namely $(x_i - \bar{x})/s_x$ and $(y_i - \bar{y})/s_y$.
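In R, this reading of (2.1) is one line (a minimal sketch with simulated data):

## The sample correlation as the sample covariance of standardized data.
set.seed(8)
x <- rnorm(60)
y <- 0.5 * x + rnorm(60)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
all.equal(cov(zx, zy), cor(x, y))  # TRUE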

Pearson's correlation coefficient possesses properties (R1) and (R2) with the bounds $-1 \le \rho \le 1$: the cases $\rho = 1$ and $\rho = -1$ correspond to linear dependence between the variables.
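A minimal R check of the bounds and of the invariance (R2), with arbitrary simulated data:

## Exact linear dependence attains the bounds; positive affine transforms
## leave the coefficient unchanged (R2).
set.seed(5)
x <- rnorm(30)
cor(x,  2 * x + 3)    #  1: increasing linear dependence
cor(x, -5 * x + 1)    # -1: decreasing linear dependence
cor(10 * x - 2, x)    #  1: invariance under positive scaling and shift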

Thus, Pearson's correlation coefficient is a measure of the linear interrelationship between random variables. Furthermore, the relations $\rho = 0$ and $r = 0$ do not imply independence of the random variables. The typical shapes of correlated data clusters are exhibited in Figs 2.1 to 2.5.

Figure 2.1 Data with positive correlation.

Figure 2.2 Data with negative correlation.

Figure 2.3 Data with approximately zero correlation.

Figure 2.4 Approximately nonlinear dependent data correlation.

Figure 2.5 Linear dependent data correlation.

2.2.2 Correlation via regression

The problem of estimation of the correlation coefficient is directly related to the linear regression problem of fitting the straight line of the conditional expectation (Kendall and Stuart 1963)

$$ E(Y \mid X = x) = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X} (x - \mu_X). \qquad (2.2) $$
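This relation can be checked directly in R (a minimal sketch with simulated data): the fitted least-squares slope equals $r\, s_y / s_x$, so the sample correlation is the regression slope for standardized data.

## Least-squares slope b and sample correlation r: b = r * s_y / s_x.
set.seed(6)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
b <- coef(lm(y ~ x))[2]                          # fitted slope
all.equal(unname(b), cor(x, y) * sd(y) / sd(x))  # TRUE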