116,99 €
Covers the latest developments in direction dependence research
Direction Dependence in Statistical Modeling: Methods of Analysis incorporates the latest research for the statistical analysis of hypotheses that are compatible with the causal direction of dependence of variable relations. Having particular application in the fields of neuroscience, clinical psychology, developmental psychology, educational psychology, and epidemiology, direction dependence methods have attracted growing attention due to their potential to help decide which of two competing statistical models is more likely to reflect the correct causal flow. The book covers several topics in depth, including:
* A demonstration of the importance of methods for the analysis of direction dependence hypotheses
* A presentation of the development of methods for direction dependence analysis together with recent novel, unpublished software implementations
* A review of methods of direction dependence following the copula-based tradition of Sungur and Kim
* A presentation of extensions of direction dependence methods to the domain of categorical data
* An overview of algorithms for causal structure learning
The book's fourteen chapters include a discussion of the use of custom dialogs and macros in SPSS to make direction dependence analysis accessible to empirical researchers.
Number of pages: 752
Year of publication: 2020
Direction Dependence in Statistical Modeling
Copyright
Dedication
About the Editors
Notes on Contributors
Acknowledgments
Preface
References
Part I: Fundamental Concepts of Direction Dependence
1 From Correlation to Direction Dependence Analysis 1888–2018
1.1 Introduction
1.2 Correlation as a Symmetrical Concept of X and Y
1.3 Correlation as an Asymmetrical Concept of X and Y
1.4 Outlook and Conclusions
References
2 Direction Dependence Analysis
2.1 Some Origins of Direction Dependence Research
2.2 Causation and Asymmetry of Dependence
2.3 Foundations of Direction Dependence
2.4 Direction Dependence in Mediation
2.5 Direction Dependence in Moderation
2.6 Some Applications and Software Implementations
2.7 Conclusions and Future Directions
References
3 The Use of Copulas for Directional Dependence Modeling
3.1 Introduction and Definitions
3.2 Directional Dependence Between Two Numerical Variables
3.3 Directional Association Between Two Categorical Variables
3.4 Concluding Remarks and Future Directions
References
Part II: Direction Dependence in Continuous Variables
4 Asymmetry Properties of the Partial Correlation Coefficient
4.1 Asymmetry Properties of the Partial Correlation Coefficient
4.2 Direction Dependence Measures when Errors Are Non‐Normal
4.3 Statistical Inference on Direction Dependence
4.4 Monte‐Carlo Simulations
4.5 Data Example
4.6 Discussion
References
5 Recent Advances in Semi‐Parametric Methods for Causal Discovery
5.1 Introduction
5.2 Linear Non‐Gaussian Methods
5.3 Nonlinear Bivariate Methods
5.4 Conclusion
References
6 Assumption Checking for Directional Causality Analyses
6.1 Epistemic Causality
6.2 Assessment of Functional Form: Loess Regression
6.3 Influential and Outlying Observations
6.4 Directional Dependence Based on All Available Data
6.5 Directional Dependence Based on Latent Difference Scores
6.6 Direction Dependence Based on State‐Trait Models
6.7 Discussion
References
7 Complete Dependence
7.1 Basic Properties
7.2 Measure of Complete Dependence
7.3 Example Calculation
7.4 Future Works and Open Problems
References
Part III: Direction Dependence in Categorical Variables
8 Locating Direction Dependence Using Log‐Linear Modeling, Configural Frequency Analysis, and Prediction Analysis
8.1 Specifying Directional Hypotheses in Categorical Variables
8.2 Types of Directional Hypotheses
8.3 Analyzing Event‐Based Directional Hypotheses
8.4 Data Example
8.5 Reversing Direction of Effect
8.6 Discussion
References
9 Recent Developments on Asymmetric Association Measures for Contingency Tables
9.1 Introduction
9.2 Measures on Two‐Way Contingency Tables
9.3 Asymmetric Measures of Three‐Way Contingency Tables
9.4 Simulation of Three‐Way Contingency Tables
9.5 Real Data of Three‐Way Contingency Tables
References
10 Analysis of Asymmetric Dependence for Three‐Way Contingency Tables Using the Subcopula Approach
10.1 Introduction
10.2 Review on Subcopula Based Asymmetric Association Measure for Ordinal Two‐Way Contingency Table
10.3 Measure of Asymmetric Association for Ordinal Three‐Way Contingency Tables via Subcopula Regression
10.4 Numerical Examples
10.5 Conclusion
Appendix
References
Part IV: Applications and Software
11 Distribution‐Based Causal Inference
11.1 Introduction
11.2 Direction of Dependence in Linear Regression
11.3 Previous Epidemiologic Applications of Distribution‐Based Causal Inference
11.4 A Running Example: Re‐Visiting the Case of Sleep Problems and Depression
11.5 Evaluating the Assumptions in Practical Work
11.6 Distribution‐Based Causality Estimates for the Running Example
11.7 Conducting Sensitivity Analyses
11.8 Simulation‐Based Analysis of Statistical Power
11.9 Triangulating Causal Inferences
11.10 Conclusion
References
12 Determining Causality in Relation to Early Risk Factors for ADHD
12.1 Method
12.2 Results
12.3 Discussion
Acknowledgments
References
13 Direction of Effect Between Intimate Partner Violence and Mood Lability
13.1 Introduction
13.2 Methods
13.3 Results
13.4 Discussion
References
14 On the Causal Relation of Academic Achievement and Intrinsic Motivation
14.1 Direction of Dependence in Linear Regression
14.2 The Causal Relation of Intrinsic Motivation and Academic Achievement
14.3 Direction Dependence Analysis Using SPSS
14.4 Conclusions
References
Author Index
Subject Index
End User License Agreement
Chapter 2
Table 2.1
Summary of assumptions of direction dependence analysis (DDA) and con...
Chapter 3
Table 3.1
The values of directional dependence measures for the data sets A–D ....
Chapter 4
Table 4.1
Median bias and median percent bias of three direction dependence meas...
Table 4.2
95% CI coverage rates of the four direction dependence measures as a f...
Table 4.3
Bivariate Pearson correlations and univariate descriptive measures.
Table 4.4
Results of the linear regression model predicting perceived energy 12 ...
Table 4.5
Results of resampling‐based direction dependence tests.
Chapter 6
Table 6.1
Direction analysis of pre‐college composite scores (wave 0).
Table 6.2
Cross‐tabulations of influential observations for target and alternate...
Table 6.3
Frequencies of influential observations under target and alternate mod...
Table 6.4
Direction dependence analysis of factor scores.
Table 6.5
Directional analysis of latent difference.
Table 6.6
Frequencies of influential observations under target and alternate mod...
Table 6.7
Direction dependence of latent trait factor scores.
Table 6.8
Frequencies of influential observations under target and alternate mod...
Table 6.9
Frequencies of influential observations under target and alternate mod...
Table 6.10
Directional analysis of state residual scores for pre‐college (wave 0...
Table 6.11
Directional analysis of residual scores for wave 1.
Table 6.12
Directional analysis of residual scores for wave 3.
Table 6.13
Directional analysis of residual scores for wave 5.
Table 6.14
Directional analysis of residual scores for wave 7.
Chapter 8
Table 8.1
Truth table for the two statements p and q and five links.
Table 8.2
Direction of effect for x1 → y2.
Table 8.3
Hit cells for the implications a1b3 → c2d1 and a2b2 → c1d2.
Table 8.4
Hit cells for the implication X1, 1 ∧ X2, 1 ∨ X1, 2 ∧ X2, 1 → Y2. ...
Table 8.5
Hit cells for the implication X1 → Y1, 2 ∧ Y2, 1 ∨ Y1, 2 ∧ Y2, 2. ...
Table 8.6
Truth table for the implication X1, 1 ∧ X2, 1 ∨ X1, 2 ∧ X2, 1 → Y2 ...
Table 8.7
Correlation matrix of the design matrix for the hypothesis X1, 1 ∧ X2, ...
Table 8.8
[PCD, OMS] × [DEP] cross‐classification.
Table 8.9
Goodness of fit of two models for the analysis of the [PCD, OMS] × [DE...
Table 8.10
CFA of the [PCD, OMS] × [DEP] cross‐classification.
Table 8.11
Goodness of fit of base model and reverse‐direction model for the an...
Table 8.12
CFA of the [PCD, OMS] × [DEP] cross‐classification under two directio...
Chapter 9
Table 9.1
Three‐way contingency table of X, Y, and Z with P = {pijk} ...
Table 9.2
(a) Joint p.m.f of Y and Z; (b) joint p.m.f of X and Z; (c) joint p.m....
Table 9.3
Three‐way contingency tables of dichotomous variables.
Table 9.4
Supports of U, V, W and the joint p.m.f. of C.
Table 9.5
Black Olive preference (P) by location (L) and urbanization (U).
Table 9.6
Worker satisfaction for organizational aspects (O) and satisfaction fo...
Chapter 10
Table 10.1
Job satisfaction data (Beh & Lombardo, 2014, p. 478).
Table 10.2
Analysis of asymmetric association in job satisfaction data.
Table 10.3
Hierarchical analysis for two types of association in job satisfactio...
Chapter 11
Table 11.1
Estimates of causal direction for depression and sleep variables of t...
Table 11.2
Role of the variables on generating the data of the four simulation s...
Table 11.3
Estimated biometric sources of variance for depression and sleep vari...
Chapter 12
Table 12.1
Sample description.
Table 12.2
Breastfeeding duration by ADHD group status.
Table 12.3
Covariate evaluation and selection.
Table 12.4
Regression models (n = 829).
Table 12.5
DDA results for covariate‐adjusted models of the form breastfeeding →...
Table 12.6
DDA results for covariate‐adjusted models of the form parent‐rated AD ...
Table 12.7
DDA results for covariate‐adjusted models of the form breastfeeding →...
Table 12.8
DDA results for covariate‐adjusted models of the form parent‐rated AD ...
Table 12.9
DDA results for covariate‐adjusted models of the form breastfeeding →...
Table 12.10
DDA results for covariate‐adjusted models of the form teacher‐rated ...
Table 12.11
DDA results for covariate‐adjusted models of the form breastfeeding ...
Table 12.12
DDA results for covariate‐adjusted models of the form teacher‐rated ...
Chapter 13
Table 13.1
Autoregression parameter estimates for IPV, LEAVE, and MOOD (standard...
Chapter 14
Table 14.1
Summary of DDA components and model‐specific DDA patterns.
Table 14.2
Bivariate Pearson correlation coefficients and descriptive measures o...
Table 14.3
Results of the competing regression models.
Table 14.4
Summary of DDA decisions.
Chapter 2
Figure 2.1 Patterns of direction of dependence of three explanatory causal m...
Figure 2.2 Six alternative mediation models together with the corresponding ...
Chapter 3
Figure 3.1 Construction of a copula and its use.
Figure 3.2 Plot of the copula regression functions for β1 = 10, β2 = 40 ...
Figure 3.3 Plot of the copula regression functions for β1 = 10, β2 = 40 ...
Figure 3.4 Plot of the copula regression functions for β1 = 10, β2 = 40 ...
Figure 3.5 Plot of the difference between the copula regression functions fo...
Figure 3.6 Plots of directional dependence measures and together with th...
Figure 3.7 Distributional cycle. The key random variables and their roles on...
Figure 3.8 Dependence cycle. The key random variables and their roles on und...
Figure 3.9 Distributional circle showing the pairwise correlations for three...
Figure 3.10 The plots of a data set where order statistics of X and Y are bo...
Figure 3.11 Surface and contour plots of odds ratio and conditional ratios f...
Figure 3.12 The behavior of DX → Y as a function of p+1 ...
Chapter 4
Figure 4.1 Empirical power of detecting the true model as a function of βyx ...
Figure 4.2 Empirical power of 95% BCa CIs of four direction dependence measu...
Figure 4.3 Univariate distributions (main diagonal) and bivariate scatterplo...
Chapter 5
Figure 5.1 An example causal graph.
Figure 5.2 Three candidate causal models.
Figure 5.3 Three candidate causal models in the presence of hidden common ca...
Figure 5.4 The skeleton of a causal graph with an undirected edge between x1
Chapter 6
Figure 6.1 Confirmatory factor model for pre‐college assessment of reasons f...
Figure 6.2 Latent difference score model (unstandardized co‐efficients).
Figure 6.3 Example state trait model (standardized co‐efficients).
Figure 6.4 Loess regression of social motives on enhancement motives, pre‐co...
Figure 6.5 Loess regression of enhancement motives on social motives – pre‐c...
Chapter 9
Figure 9.1 The complete dependence measure μ(Y, Z ∣ X) ...
Chapter 10
Figure 10.1 The estimated subcopula regression based association measure (...
Figure 10.2 The estimated subcopula regression based association measure (...
Figure 10.3 The estimated subcopula regression based association measure (...
Chapter 11
Figure 11.1 Illustration of skewness‐based causal signal. Histogram of a Log...
Figure 11.2 Illustrating the SATSA data. Whereas the histograms (a and b) sh...
Figure 11.3 Illustrating lurking nonlinearities. The first panel shows a sca...
Figure 11.4 Illustration of the assumed model in DirectLiNGAM (a; one variab...
Figure 11.5 Results for simulation‐based sensitivity analyses of DirectLiNGA...
Figure 11.6 Results for the power analysis of DirectLiNGAM in terms of skewn...
Figure 11.7 Path diagram for Direction of Causation (DoC) models in the runn...
Chapter 12
Figure 12.1 Distributions for Thurstone factor scores (mother's age at child...
Chapter 13
Figure 13.1 Granger causality model in which IPV causes MOOD lability (stand...
Figure 13.2 Granger causality model in which IPV causes MOOD lability, under...
Chapter 14
Figure 14.1 Simplified competing models with standardized focal variables x ...
Figure 14.2 Conceptual diagram of model (c) with an unmeasured confounder to...
Figure 14.3 SPSS main dialogue box to perform DDA.
Figure 14.4 DDA options dialogue box.
Figure 14.5 SPSS output of DDA variable distribution tests.
Figure 14.6 Univariate distributions (main diagonal) and scatterplot (upper ...
Figure 14.7 SPSS output of residual distribution tests.
Figure 14.8 DDA function dialogue box.
Figure 14.9 SPSS outputs of DDA independence tests.
Edited by
Wolfgang Wiedermann
Daeyoung Kim
Engin A. Sungur
Alexander von Eye
This edition first published 2021
© 2021 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Wolfgang Wiedermann, Daeyoung Kim, Engin A. Sungur, and Alexander von Eye to be identified as the authors of the editorial material in this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Wiedermann, Wolfgang, 1981‐ editor. | Kim, Daeyoung, editor. |
Sungur, Engin, editor. | Eye, Alexander von, editor.
Title: Direction dependence in statistical modeling : methods of analysis /
edited by Wolfgang Wiedermann, Daeyoung Kim, Engin Sungur, Alexander von
Eye.
Description: Hoboken, NJ : Wiley, 2021. | Includes bibliographical
references and index.
Identifiers: LCCN 2020015364 (print) | LCCN 2020015365 (ebook) | ISBN
9781119523079 (cloth) | ISBN 9781119523130 (adobe pdf) | ISBN
9781119523147 (epub)
Subjects: LCSH: Dependence (Statistics)
Classification: LCC QA273.18 .D57 2020 (print) | LCC QA273.18 (ebook) |
DDC 519.5–dc23
LC record available at https://lccn.loc.gov/2020015364
LC ebook record available at https://lccn.loc.gov/2020015365
Cover design by Wiley
Cover image: © zhengshun tang/Getty Images
To Anna and Linus
—W.W
To my wife Shu‐Min and my son Minjun
—D.K
To my wife Lamia Sungur
—E.S
To Donata, the origin of direction
—A.vE
Wolfgang Wiedermann
Wolfgang Wiedermann is Associate Professor at the University of Missouri, Columbia. He received his PhD in Quantitative Psychology from the University of Klagenfurt, Austria in 2012. His primary research interests include the development of methods for causal inference, methods to determine the causal direction of dependence in observational data, and methods for person‐oriented research settings. He has edited books on advances in statistical methods for causal inference (with von Eye, Wiley) and new developments in statistical methods for dependent data analysis in the social and behavioral sciences (with Stemmler and von Eye). His work appears in leading quantitative methods journals, including Psychological Methods, Multivariate Behavioral Research, Behavior Research Methods, and the British Journal of Mathematical and Statistical Psychology. He currently serves as an associate editor for Behaviormetrika and the Journal for Person‐Oriented Research.
Daeyoung Kim
Daeyoung Kim is Associate Professor of Mathematics and Statistics at the University of Massachusetts, Amherst. He received his PhD from the Pennsylvania State University in Statistics in 2008. His original research interests were in likelihood inference in finite mixture modeling including empirical identifiability and multimodality, development of geometric and computational methods to delineate multidimensional inference functions, and likelihood inference in incompletely observed categorical data, followed by a focus on the analysis of asymmetric association in multivariate data using (sub)copula regression. He also has active collaborations with colleagues in food sciences at the University of Massachusetts, Amherst, focusing on the use of statistical models to analyze data for colon cancer, obesity and diabetes.
Engin A. Sungur
Engin A. Sungur has a BA in City and Regional Planning (Middle East Technical University, METU, Turkey), an MS in Applied Statistics (METU), an MS in Statistics (Carnegie‐Mellon University, CMU), and a PhD in Statistics (CMU). He taught at Carnegie‐Mellon University, the University of Pittsburgh, Middle East Technical University, and the University of Iowa. Currently, he is a Morse‐Alumni distinguished professor of statistics at the University of Minnesota, Morris. He has been teaching statistics for more than 38 years, 29 of them at the University of Minnesota, Morris. His research areas are dependence modeling with emphasis on directional dependence, modern multivariate statistics, extreme value theory, and statistical education.
Alexander von Eye
Alexander von Eye, PhD, is Professor Emeritus of Psychology at Michigan State University. He received his PhD in Psychology, with minors in Education and Psychiatry, from the University of Trier, Germany, in 1976. He is known for his work on statistical modeling, categorical data analysis, methods of analysis of direction dependence hypotheses, person‐oriented research, and human development. He authored, among others, texts on Configural Frequency Analysis, the analysis of rater agreement (with Mun), and on log‐linear modeling (with Mun; Wiley), and he edited, among others, two books on latent variables analysis (the first with Clogg, the second with Pugesek and Tomer), and one on Statistics and Causality (with Wiedermann; Wiley). His over 400 articles appeared in the premier journals of the field, including, for instance, Psychological Methods, Multivariate Behavioral Research, Child Development, the Journal of Person‐Oriented Research, the American Statistician, and the Journal of Applied Statistics.
Patrick Blöbaum
Patrick Blöbaum studied Cognitive Computer Science and Intelligent Systems at Bielefeld University (Germany) from 2009 to 2014 and received his PhD in Engineering (Machine Learning) from Osaka University (Japan) in 2019 with a research focus on causality. In addition to his PhD studies, he worked as an assistant researcher and machine learning engineer in Japan. In 2019, Patrick joined the newly founded causality team at the Amazon research and development center in Tübingen, Germany, which focuses on the development and application of novel causality algorithms.
G. Anne Bogat
G. Anne Bogat, PhD is a professor of clinical psychology at Michigan State University. Her research centers on intimate partner violence (IPV), including a focus on daily experiences of IPV among college students; how IPV during pregnancy affects women, children, and the mother–child relationship; and how bonding between mothers and infants is affected by pregnancy and postpartum IPV. In addition, she has written about and employed person‐oriented methods in her research.
Yadolah Dodge
Yadolah Dodge was born in Abadan, Iran, and is a Swiss citizen. Along with a full‐time position as Professor and Chair of Statistics at the University of Neuchâtel, Switzerland, his dedication to photography, painting, and film‐making continued, resulting in three long documentaries: Turicum: This is Zurich (2014), Dear Son (2018), and Moving Heart (2019). He is author, co‐author, and editor of over 20 books published by Oxford University Press, Springer, John Wiley, Dunod, and North‐Holland, as well as several papers.
Regina García‐Velázquez
Dr. Regina García‐Velázquez has a background in Psychology and received her master's degree in Methodology of Behavioural and Health Sciences from the Autonomous University of Madrid. She received her PhD from the University of Helsinki. She is a post‐doctoral researcher at the University of Helsinki and teaches courses on Psychometrics. She is interested in measurement issues applied to psychopathology, particularly on classification, validity, and statistical modelling. In her current research, she focuses on internalizing disorders.
Jade E. Kobayashi
Jade E. Kobayashi, MA is a clinical psychology graduate student in the doctoral program at Michigan State University. Her research interests include adult romantic attachment, interpersonal conflict and intimate partner violence (IPV), and intensive longitudinal and dyadic analytic methods.
Alytia A. Levendosky
Alytia A. Levendosky, PhD is a Professor in the Department of Psychology at Michigan State University. Her research over the past 25 years has focused on the effects of intimate partner violence (IPV) on mothers and children. Currently, her primary research interests are in the role of IPV as a stressor during the perinatal period. Her work has helped elucidate how IPV during pregnancy affects the very beginnings of motherhood through women's developing representations/schemas about their unborn child which later affect their parenting behaviors during early childhood.
Xintong Li
Xintong Li is Senior Research Analyst at the Assessment Resource Center at the University of Missouri, and he is an experienced researcher specialized in quantitative methods and educational research. He received his PhD in statistics, measurement and evaluation in education at the University of Missouri. His major research interests include causal inference with non‐experimental data, educator effectiveness, and motivation in education. He is skilled and experienced in advanced statistical modeling, programming, large‐scale simulations, and large database management. He has multiple publications on methodological foundations and applications of direction dependence principles published in, e.g. Multivariate Behavioral Research, Behavior Research Methods, and Prevention Science.
Joel T. Nigg
Joel T. Nigg, PhD is a clinical psychologist and professor of Psychiatry and Behavioral Neuroscience and Director of the Center for ADHD Research at Oregon Health & Science University. His research on ADHD and related conditions has been funded by NIH continuously for over 20 years. His work focuses on refining the phenotype related to cognition and emotion and examining environmental and genetic etiology.
Tom Rosenström
Dr. Tom Rosenström has obtained his education in psychology (MA, PhD) and applied mathematics (MSc) at the University of Helsinki, Finland. He conducts mental health research, applying and developing mathematical and statistical models within the field. In addition, he has worked on theoretical biology at the University of Bristol and on behavior genetics at the Norwegian Institute of Public Health, and is currently employed by the Helsinki University Hospital, where he also conducts clinical patient work.
Valentin Rousson
Valentin Rousson was born in 1967 in Neuchâtel, Switzerland, where he got a PhD in Statistics in 1998. He then spent some time at the Australian National University in Canberra as a postdoc, and at the University of Zurich, sharing his time between statistical consulting and research. In 2007, he was named Associate Professor in Biostatistics at the University of Lausanne, where he is currently working and teaching.
Shohei Shimizu
Shohei Shimizu is a Professor at the Faculty of Data Science, Shiga University, Japan and leads the Causal Inference Team, RIKEN Center for Advanced Intelligence Project. He received a PhD in Engineering from Osaka University in 2006. His research interests include statistical methodologies for learning data generating processes such as structural equation modeling and independent component analysis and their application to causal inference. He received the Hayashi Chikio Award (Excellence Award) from the Behaviormetric Society in 2016. He has been a coordinating editor of Behaviormetrika since 2016 and an associate editor of Neurocomputing since 2019.
Diane D. Stadler
Diane Stadler has a PhD in Human Nutrition and is a registered dietitian with expertise in maternal and infant nutrition and providing care for children with metabolic disorders and developmental disabilities. She directs the Graduate Programs in Human Nutrition at Oregon Health & Science University in Portland, Oregon and is a leader in OHSU's nutrition education initiatives and research mentoring programs. She also oversees OHSU's clinical nutrition specialist training program and research initiatives in Lao People's Democratic Republic to support health care providers in addressing the country's high rates of childhood malnutrition.
Santi Tasena
Since 2011, Santi Tasena has been working at Chiang Mai University, Thailand. Being in love with mathematics, he enjoys discussing any topic related to mathematics. His research interests include mathematical analysis and related fields. His work includes heat kernel analysis on metric spaces, (sub)copulas and measures of dependence, and construction of aggregation and related functions. He received grants from the Commission on Higher Education, Thailand, the Centre of Excellence in Mathematics (CHE), Thailand, the Data Science Research Center, and the Center of Excellence in Mathematics and Applied Mathematics, Chiang Mai University, Thailand.
Tonghui Wang
Tonghui Wang is currently a full professor of statistics in the Department of Mathematical Sciences, New Mexico State University. He received his PhD degree from the University of Windsor, Canada in May 1993. His research interests are multivariate linear models under skew normal settings; copulas and their associated measures with applications; and big data analysis and statistical learning with applications.
Zheng Wei
Zheng Wei is currently an assistant professor of statistics in the Department of Mathematics and Statistics at the University of Maine. He served as a visiting assistant professor at the Department of Mathematics and Statistics, University of Massachusetts Amherst from May 2015 to August 2017. He has conducted research in Bayesian statistical methods for data science, big data and analytics, and copula theory and its applications. He completed his PhD at New Mexico State University in May 2015.
Phillip K. Wood
Phil K. Wood is a professor of Quantitative Psychology at the University of Missouri. He specializes in structural equation modeling, growth curve modeling and factor analysis, with particular emphasis on techniques for the analysis of longitudinally intensive data such as dynamic factor models. His substantive areas of interest include the cognitive outcomes of higher education and longitudinal inter‐individual differences in behaviors during young adulthood such as problematic alcohol use, tobacco and other drug usage and risky sexual behaviors.
Xiaonan Zhu
Xiaonan Zhu is an assistant professor in the Department of Mathematics at the University of North Alabama. Before joining UNA in Fall 2019, he obtained his PhD and MS in Mathematical Statistics from the Department of Mathematical Sciences at New Mexico State University in 2019 and 2014, respectively. His research interests include sampling distributions of skew normal distributions, distribution of quadratic forms under closed skew normal settings, construction of copulas, (local) dependence of random vectors, and measures of dependence through (sub‐)copulas.
Acknowledgments
There are numerous people to thank for their part in the preparation of this volume. First and foremost, we offer our deepest thanks to our contributing authors with whom we share a dedication to the development and application of statistical methods in the context of direction of dependence and causality. This volume would not have been possible without their excellent work.
We are also grateful to Wiley publishers for their interest in the topic and their support. This applies in particular to Sari Friedman, Kathleen Santoloci, Mindy Okura‐Marszycki, Elisha Benjamin, Sechin Nithya, Amudhapriya Sivamurthy, and Ezhilan Vikraman who have supported and guided us from the very first contact to the completion of this book. Thank you all!
Most important, we are grateful for the love and support of our respective families. The first editor wants to emphasize that he is grateful to be allowed to experience a causal mechanism that does not require any empirical evaluation – the dependence between W's happiness and the existence of Anna and Linus. No statistical modeling is needed to show that {Anna, Linus} → (W = happy) holds in an unconfounded manner. The second editor would like to express gratitude and sincere thanks to his wife, Shu‐Min, and his son, Minjun, for their tremendous support and love.
Preface
Questions concerning causation are omnipresent in the empirical sciences. In non‐experimental research, however, it is often hard to determine the status of variables as cause and effect. Temporal order alone is of limited use, unless one observes antecedents and the beginning of a chain of events. That is, even when a putative explanatory variable (x) is measured earlier in time than the (putative) outcome (y), one cannot rule out that an outcome, measured at an earlier point in time, may have caused x. Similarly, temporality alone does not prevent causal effect estimates from being biased unless one is able to adjust for all relevant (potentially time‐varying) confounders (Bellemare, Masaki, & Pepinsky, 2017). Cross‐sectional research has often been looked down upon because it is deemed of little use for the analysis of hypotheses that are compatible with (possibly competing) theories of causality. Based on cross‐sectional data alone, for example, one is not able to distinguish whether a relation between x and y is observed because of an underlying causal model of the form x → y (i.e. x causes y), the reverse‐causal model y → x (y causes x), or whether the observed relation is spurious due to (total or partial) confounding, x ← u → y.
Limitations of longitudinal and cross‐sectional observational research are (partly) rooted in the limitations of the statistical methods that are routinely applied to analyze dependence structures. In both research designs, covariance‐based methods (such as correlational, linear regression, and structural equation modeling techniques) are de rigueur. Although these methods can be useful in the estimation of the magnitude of causal effects (provided that certain unconfoundedness conditions are fulfilled, see, e.g. Pearl, 2009), they do not help to empirically distinguish between cause and effect. For example, in the standardized case, linear regression parameters for the model x → y are identical to the ones that are estimated for the reverse regression, y → x (von Eye & DeShon, 2012). These symmetry properties of the linear regression model have been known since its early origins (Galton, 1886). In fact, the observation that regression is inherently symmetric was one of the reasons why Francis Galton (the “founding father” of linear regression) changed his characterization of the phenomenon that previously suppressed hereditary traits can re‐appear from a phenomenon of reversion to a phenomenon of regression (Gorroochurn, 2016). In other words, symmetry properties influenced how linear regression was conceptualized as a statistical tool. Similarly, symmetry properties of conventional representations of the Pearson product‐moment correlation (for an overview of various facets of the Pearson correlation see, for example, Rodgers and Nicewander (1988), Rovine and von Eye (1997), Falk and Well (1997), and Nelsen (1998)) certainly contributed to the widespread and well‐known mantra that correlation does not imply causation and to the belief that statistical means cannot be used to establish the causal direction of dependence.
Fortunately, this state of affairs has changed recently. It did take statisticians until the beginning of the new millennium to get a handle on the issue of direction dependence. But in 2000, Dodge and Rousson derived, within the framework of the linear regression model, the relation between cause and effect variables, for the (not so) particular case in which the cause variable is asymmetrically distributed. Specifically, these authors showed that variable information beyond means, variances, and covariances (e.g. skewness and co‐skewness) can be used to empirically determine which of two variables is more likely to be the cause and which is more likely to be the effect. Focusing on asymmetry properties of the linear regression and the Pearson correlation, the work by Dodge and Rousson (2000) initiated a new topic and line of statistical research, that of the development and application of methods for the analysis of direction dependence and causal hypotheses. Dodge and Rousson (2000) focused on asymmetry that emerges from marginal variable distributions. Asymmetry properties based on error distributions have later been proposed by Wiedermann, Hagmann, and von Eye (2015), Wiedermann and von Eye (2015b), and Wiedermann and Hagmann (2016). Extensions to measurement error models were recently discussed in Wiedermann, Merkle, and von Eye (2018). The second seminal paper in this new line of research was published in 2005 by Engin A. Sungur (see Sungur (2005a); a discussion of copulas in the regression context is given by Sungur (2005b)). While Dodge and Rousson's (2000) initial work focused on determining the direction of dependence through studying the marginal behavior of distributions, Sungur (2005a) proposed to study the behavior of joint variable distributions by making use of copulas.
This copula‐based direction dependence approach constitutes a second line of research that allows researchers to analyze cause‐effect properties of variables while accounting for potential differences in marginal distributions. Copula‐based directional dependence analysis has experienced rapid development. Various extensions have been proposed by, e.g. Kim and Kim (2014, 2016), Wei and Kim (2017, 2018), and Kim and Hwang (2019) – more recent applications of the approach are given by Lee and Kim (2019) and Kim, Lee and Xiao (2019). The third seminal paper in the development of methods to distinguish between cause and effect variables was published by Shimizu and colleagues in 2006 proposing the linear non‐Gaussian acyclic model (LiNGAM) – a causal machine learning algorithm for non‐normal variables that is closely related to independent component analysis (Hyvärinen, Karhunen, & Oja, 2001). LiNGAM rapidly developed in the area of machine learning research and has been extended to nonlinear variable relations (Zhang & Hyvärinen, 2016), models with hidden common causes (Hoyer, Shimizu, Kerminen, & Palviainen, 2008; Shimizu & Bollen, 2014), and mixed (continuous and categorical) data (Yamayoshi, Tsuchida, & Yadohisa, 2020), to name a few. For an overview of recent advances in causal machine learning, see Guyon, Statnikov, and Batu (2019).
The present book is concerned with novel statistical approaches to the analysis of the causal direction of dependence of variables in both exploratory (i.e. learning the causal structures from observational data without background knowledge) and confirmatory (i.e. testing a priori existing competing causal theories) research scenarios, and presents original work in four modules. In the first module, Fundamental Concepts of Direction Dependence, Dodge and Rousson (Chapter 1) introduce the well‐known Pearson correlation coefficient as an asymmetric concept of two variables which (as discussed above) served as a starting point for several lines of direction dependence research. Further, the authors provide a reminder that working with non‐normality of variables (as a key requirement to derive asymmetry properties in the linear case) bears challenges in practice (e.g. distinguishing between non‐normality as a characteristic of the construct under study versus non‐normality due to outliers and suboptimal measurement). In Chapter 2, Wiedermann, Li, and von Eye then continue the discussion of asymmetry properties of the linear regression model and introduce three asymmetry concepts (summarized in a framework termed Direction Dependence Analysis (DDA), cf. Wiedermann and von Eye, 2015a; Wiedermann & Li, 2018) that can be used to detect potential confounding and distinguish between the two causally competing models x → y and y → x. Applications of DDA in the context of mediation and moderation models are discussed. Chapter 3, by Engin A. Sungur, is devoted to the use of copulas in direction dependence modeling. This chapter introduces definitions and fundamental principles to model directional dependence of variables using asymmetric copulas and regression, and describes various copula‐based directional dependence measures to perform model selection in both continuous and categorical data settings.
The second module is devoted to Direction Dependence in Continuous Variables. Chapter 4, by Wolfgang Wiedermann, discusses asymmetry properties of the partial correlation coefficient in the research tradition of Dodge and Rousson (2000). Asymmetric facets of the partial correlation coefficient are presented which enable one to test causally competing models while adjusting for relevant background variables. Parameter recovery and accuracy of model selection are evaluated using Monte‐Carlo simulation experiments. Chapter 5, by Shimizu and Blöbaum, gives an overview of recent advances in the development of algorithms for unsupervised causal learning. The authors start by introducing the standard LiNGAM and present extensions to structural vector autoregressive models for the analysis of time series data, models with hidden common causes, and methods for causal learning under nonlinearity of variable relations. In Chapter 6, Phillip K. Wood takes a regression diagnostic perspective and discusses the importance of evaluating the assumptions of the statistical models that are used to learn the causal structure of observational data. The author uses data from a longitudinal study on motives for alcohol consumption (cf. Sher & Rutledge, 2007) and compares the use of manifest variable composites, factor scores within a state‐trait model, and latent difference factor scores in the evaluation of directional dependence hypotheses. The last chapter of this module (Chapter 7), by Santi Tasena, reviews definitions and basic properties of measures of complete dependence. The author gives examples of calculating complete dependence measures in the case of the multivariate Gaussian distribution and presents open problems and potential future directions.
In the third module, methods of direction dependence are extended to the categorical variable domain. Chapter 8, by von Eye and Wiedermann, introduces an event‐based perspective in the analysis of hypotheses compatible with direction dependence. The authors introduce two‐valued statement calculus to derive composite causality statements and use a design matrix approach to evaluate event‐based direction dependence hypotheses. Three methods are compared with respect to their capability to test direction of dependence in categorical data: log‐linear modeling, configural frequency analysis, and prediction analysis. Chapter 9, contributed by Zhu, Wei, and Wang, is devoted to a copula‐based approach to measure associations in contingency tables. The authors start by reviewing some recently developed measures for the analysis of asymmetric associations in two‐way or three‐way contingency tables. Then, they propose two new measures of complete dependence on three‐way contingency tables and present corresponding nonparametric estimators. Chapter 10, by Kim and Wei, investigates a subcopula‐based asymmetric association measure for the analysis of dependence structures in three‐way ordinal contingency tables. Their asymmetric measure utilizes sub‐copula regressions obtained under the hypothesized dependence relations.
The fourth module is then devoted to Applications and Software. In Chapter 11, Rosenström and Regina García‐Velázquez make use of LiNGAM in the context of psychiatric epidemiology. Specifically, the authors use distribution‐based indicators to test the causal direction of the association between sleeping problems and depressive symptoms using data from the Swedish Adoption/Twin Study on Aging (Pedersen, 2005). In addition, the authors provide application guidelines for epidemiologists, present a novel Monte‐Carlo‐based sensitivity analysis approach to evaluate the robustness of LiNGAM results, and integrate distribution‐based causality approaches in the process of causal triangulation in etiologic epidemiology. Chapter 12, by Nigg, Stadler, von Eye, and Wiedermann, provides an application of direction dependence analysis in the context of determining risk factors of attention‐deficit/hyperactivity disorder (ADHD). Specifically, direction dependence methods for linear models are used to evaluate the causal structure of the association between breastfeeding duration and ADHD. The authors use one of the largest well‐characterized samples currently available and demonstrate that DDA results can be affected by rater effects when measuring ADHD. Further, an attempt is presented to account for potential ceiling/floor effects that can artificially increase the magnitude of non‐normality of variables. In Chapter 13, Bogat, Levendosky, Kobayashi, and von Eye then take a longitudinal data perspective in the discussion of causal effect directionality. The authors use daily diary data to assess longitudinal dynamics of the causal structure of intimate partner violence and mood lability in young adult couples. Granger causality models (a causal prediction approach in which one tests whether the inclusion of past information of one variable (e.g. $x_{t-1}$) is useful in predicting another variable $y_t$ above and beyond the information that is contained in $y_{t-1}$; Granger, 1969) are applied to test whether intimate partner violence is more likely to cause mood lability or vice versa. In the final chapter, by Li and Wiedermann (Chapter 14), a software implementation of direction dependence methods is presented. The authors introduce SPSS Custom Dialogs to perform DDA and use data from the High School Longitudinal Study of 2009 (Ingels et al., 2011) for illustrative purposes. Specifically, the authors present a step‐by‐step tutorial to evaluate the causal direction of effect of academic achievement and intrinsic motivation in 9th grade Asian students.
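The Granger‐type comparison described above for Chapter 13 can be illustrated with a minimal sketch. It is not the analysis used in that chapter: the data are simulated, only a single lag is considered, and the helper names `rss` and `granger_f` are hypothetical. The idea is to fit an autoregressive model of the outcome with and without the lagged candidate cause and to compare the residual sums of squares through an F statistic.

```python
import numpy as np

def rss(design, target):
    """Residual sum of squares of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(cause, effect):
    """F statistic for adding cause[t-1] to an AR(1) model of effect[t]."""
    target = effect[1:]
    ones = np.ones_like(target)
    restricted = np.column_stack([ones, effect[:-1]])           # y[t-1] only
    full = np.column_stack([ones, effect[:-1], cause[:-1]])     # plus x[t-1]
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - full.shape[1]
    return (rss_r - rss_f) / (rss_f / dof)   # one added regressor

# Simulated series (assumption): x drives y with a lag of one, not vice versa.
rng = np.random.default_rng(3)
n = 2000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.normal()

f_x_to_y = granger_f(x, y)   # does past x help to predict y?
f_y_to_x = granger_f(y, x)   # does past y help to predict x?
print(f_x_to_y, f_y_to_x)
```

In this simulated setting the statistic for x → y should clearly exceed the one for y → x. With real data, the lag order, stationarity, and possible confounders would all need separate attention.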
Within the last two decades, tremendous progress has been made in the area of direction dependence modeling. We believe that this volume makes a timely and important contribution to the ongoing development of methods of direction dependence and we hope that this contribution will advance the statistical tools empirical sciences can use to better explain causal phenomena.
Wolfgang Wiedermann, University of Missouri, Columbia
Daeyoung Kim, University of Massachusetts, Amherst
Engin A. Sungur, University of Minnesota, Morris
Alexander von Eye, Michigan State University, East Lansing
References
Bellemare, M. F., Masaki, T., & Pepinsky, T. B. (2017). Lagged explanatory variables and the estimation of causal effect. Journal of Politics, 79, 949–963. doi:10.2139/ssrn.2568724
Dodge, Y., & Rousson, V. (2000). Direction dependence in a regression line. Communications in Statistics ‐ Theory and Methods, 29(9–10), 1957–1972. doi:10.1080/03610920008832589
Falk, R., & Well, A. D. (1997). Many faces of the correlation coefficient. Journal of Statistics Education, 5. Retrieved from http://www.amstat.org/publications/jse/v5n3/falk.html
Galton, F. (1886). Family likeness in stature. Proceedings of the Royal Society of London, 40(242–245), 42–73. doi:10.1098/rspl.1886.0009
Gorroochurn, P. (2016). On Galton's change from “reversion” to “regression”. The American Statistician, 70(3), 227–231. doi:10.1080/00031305.2015.1087876
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross‐spectral methods. Econometrica, 37(3), 424–438. doi:10.2307/1912791
Guyon, I., Statnikov, A., & Batu, B. B. (Eds.) (2019). Cause effect pairs in machine learning. doi:10.1007/978‐3‐030‐21810‐2
Hoyer, P. O., Shimizu, S., Kerminen, A. J., & Palviainen, M. (2008). Estimation of causal effects using linear non‐Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2), 362–378. doi:10.1016/j.ijar.2008.02.006
Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis. New York, NY: Wiley & Sons.
Ingels, S. J., Pratt, D. J., Herget, D. R., Burns, L. J., Dever, J. A., Ottem, R., … LoGerfo, L. (2011). High School Longitudinal Study of 2009 (HSLS: 09): Base‐year data file documentation. Washington, DC: U.S. Dept. of Education, Institute of Education Sciences, National Center for Education Statistics.
Kim, D., & Kim, J.‐M. (2014). Analysis of directional dependence using asymmetric copula‐based regression models. Journal of Statistical Computation and Simulation, 84(9), 1990–2010. doi:10.1080/00949655.2013.779696
Kim, S., & Kim, D. (2016). Directional dependence analysis using skew‐normal copula‐based regression. In W. Wiedermann & A. von Eye (Eds.), Statistics and causality: Methods for applied empirical research (pp. 131–152). Hoboken, NJ: Wiley and Sons.
Kim, J.‐M., & Hwang, S. Y. (2019). The copula directional dependence by stochastic volatility models. Communications in Statistics ‐ Simulation and Computation, 48(4), 1153–1175. doi:10.1080/03610918.2017.1406512
Lee, N., & Kim, J.‐M. (2019). Copula directional dependence for inference and statistical analysis of whole‐brain connectivity from fMRI data. Brain and Behavior, 9(1), e01191. doi:10.1002/brb3.1191
Nelsen, R. B. (1998). Correlation, regression lines, and moments of inertia. American Statistician, 52, 343–345.
Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York, NY: Cambridge University Press.
Pedersen, N. L. (2005). Swedish Adoption/Twin Study on Aging (SATSA), 1984, 1987, 1990, 1993, 2004, 2007, and 2010 [Data set]. doi:10.3886/ICPSR03843.v2
Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. American Statistician, 42, 59–66.
Rovine, M. J., & von Eye, A. (1997). A 14th way to look at a correlation coefficient: Correlation as the proportion of matches. American Statistician, 51, 42–46.
Sher, K. J., & Rutledge, P. C. (2007). Heavy drinking across the transition to college: Predicting first‐semester heavy drinking from precollege variables. Addictive Behaviors, 32, 819–835.
Shimizu, S., & Bollen, K. A. (2014). Bayesian estimation of causal direction in acyclic structural equation models with individual‐specific confounder variables and non‐Gaussian distributions. Journal of Machine Learning Research, 15, 2629–2652.
Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non‐Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research, 7, 2003–2030.
Sungur, E. A. (2005a). A note on directional dependence in regression setting. Communications in Statistics ‐ Theory and Methods, 34(9–10), 1957–1965. doi:10.1080/03610920500201228
Sungur, E. A. (2005b). Some observations on copula regression functions. Communications in Statistics ‐ Theory and Methods, 34(9–10), 1967–1978. doi:10.1080/03610920500201244
von Eye, A., & DeShon, R. P. (2012). Directional dependence in developmental research. International Journal of Behavioral Development, 36(4), 303–312. doi:10.1177/0165025412439968
Wei, Z., & Kim, D. (2017). Subcopula‐based measure of asymmetric association for contingency tables. Statistics in Medicine, 36, 3875–3894. doi:10.1002/sim.7399
Wei, Z., & Kim, D. (2018). On multivariate asymmetric dependence using multivariate skew‐normal copula‐based regression. International Journal of Approximate Reasoning, 92, 376–391. doi:10.1016/j.ijar.2017.10.016
Wiedermann, W., & Hagmann, M. (2016). Asymmetric properties of the Pearson correlation coefficient: Correlation as the negative association between linear regression residuals. Communications in Statistics ‐ Theory and Methods, 45(21), 6263–6283. doi:10.1080/03610926.2014.960582
Wiedermann, W., & Li, X. (2018). Direction dependence analysis: A framework to test the direction of effects in linear models with an implementation in SPSS. Behavior Research Methods, 50(4), 1581–1601. doi:10.3758/s13428‐018‐1031‐x
Wiedermann, W., & von Eye, A. (2015a). Direction‐dependence analysis: A confirmatory approach for testing directional theories. International Journal of Behavioral Development, 39(6), 570–580. doi:10.1177/0165025415582056
Wiedermann, W., & von Eye, A. (2015b). Direction of effects in multiple linear regression models. Multivariate Behavioral Research, 50, 23–40.
Wiedermann, W., Hagmann, M., & von Eye, A. (2015). Significance tests to determine the direction of effects in linear regression models. British Journal of Mathematical and Statistical Psychology, 68, 116–141.
Wiedermann, W., Merkle, E. C., & von Eye, A. (2018). Direction of dependence in measurement error models. British Journal of Mathematical and Statistical Psychology, 71, 117–145.
Yamayoshi, M., Tsuchida, J., & Yadohisa, H. (2020). An estimation of causal structure based on latent LiNGAM for mixed data. Behaviormetrika, 47(1), 105–121. doi:10.1007/s41237‐019‐00095‐3
Zhang, K., & Hyvärinen, A. (2016). Nonlinear functional causal models for distinguishing cause from effect. In W. Wiedermann & A. von Eye (Eds.), Wiley series in probability and statistics (pp. 185–201). doi:10.1002/9781118947074.ch8
1 From Correlation to Direction Dependence Analysis 1888–2018
Yadolah Dodge¹ and Valentin Rousson²
¹ Institute of Statistics, University of Neuchâtel, Neuchâtel, Switzerland
² Division of Biostatistics, Center for Primary Care and Public Health (Unisanté), University of Lausanne, Lausanne, Switzerland
1.1 Introduction
The Pearson product‐moment correlation coefficient is one of the most popular statistical measures to summarize an association between two (continuous) variables X and Y. As suggested by Rodgers and Nicewander (1988), it should actually be renamed the “Galton–Pearson” correlation coefficient since both men played a significant role in the development and promotion of this coefficient in statistics. The concept of correlation was introduced by Francis Galton in 1888 (Blyth, 1994; Galton, 1888), although it was already presented in 1885 in relation to regression (Galton, 1885; Rodgers & Nicewander, 1988), while Karl Pearson (1895) provided the mathematical formula. See e.g. Stigler (1989) for a detailed historical account. Although Pearson (1930), quoted in Aldrich (1995), wrote that “up to 1889 men of sciences had thought only in terms of causation,” it was clear from the very beginning that “correlation does not imply causation.” For example, Aldrich (1995) mentioned that Francis Galton (1888) was well aware that “the correlation between two variables measures the extent to which they are governed by common causes.” Thus, establishing a correlation between X and Y does not imply that one variable is the cause and the other is the (direct or indirect) consequence, just that the two variables are associated, due perhaps to the existence of a third variable Z which would be a common cause of both X and Y. In fact, even if one could rule out completely the possibility of the existence of such a variable Z, there would be no way to conclude from a correlation which of X and Y is the cause and which is the consequence, since the formula provided by Karl Pearson is perfectly (and beautifully) symmetric in X and Y.
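1.2 Correlation as a Symmetrical Concept of X and Y
Given a sample of n observations $(X_i, Y_i)$ (i = 1, …, n) from a bivariate variable (X, Y), the (Pearson product‐moment) correlation (coefficient) can be calculated as:
$$r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \tag{1.1}$$
where $\bar{X}$ and $\bar{Y}$ denote the sample means of X and Y. This is also the covariance between X and Y divided by the product of their standard deviations. Obviously, one has $r_{XY} = r_{YX}$. As mentioned in Section 1.1, correlation is intimately related to regression. Let us consider the regression equation with Y as the response variable and X as the predictor:
$$Y_i = a_{YX} + b_{YX} X_i + \varepsilon_i \tag{1.2}$$
as well as the regression equation with X as the response variable and Y as the predictor:
$$X_i = a_{XY} + b_{XY} Y_i + \varepsilon'_i \tag{1.3}$$
If the goal is to get residuals $\varepsilon_i$ and $\varepsilon'_i$ with zero mean and with the smallest possible variances, the regression coefficients are obtained via the least squares criterion, which for the slopes are given by:
$$b_{YX} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
and by:
$$b_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
Thus, the correlation is also the geometrical mean of the slopes in Eqs. (1.2) and (1.3):
$$r_{XY} = \operatorname{sign}(b_{YX}) \sqrt{b_{YX} \, b_{XY}}$$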
Again, this is a symmetrical formula in X and Y. Many other ways to calculate or to interpret a correlation have been provided in the statistical literature. In particular, Rodgers and Nicewander (1988) identified 13 ways to look at the correlation, whereas a 14th way has been added to the list by Rovine and von Eye (1997), and even more by Falk and Well (1997). However, all these formulas, when involving two continuous variables, are symmetrical in X and Y.
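The geometric‐mean relation between the two least squares slopes and the correlation can be verified numerically. The following is a minimal sketch on simulated data; the function name `ols_slope`, the variable names, and the data‐generating values are assumptions made for illustration only. A negative slope is used on purpose to show that the sign of the correlation is carried by the slopes.

```python
import numpy as np

def ols_slope(pred, resp):
    """Least squares slope of resp regressed on pred (with intercept)."""
    dp = pred - pred.mean()
    dr = resp - resp.mean()
    return float(np.sum(dp * dr) / np.sum(dp ** 2))

# Simulated data (assumption): skewed predictor, negative linear relation.
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, size=1000)
y = 1.0 - 0.8 * x + rng.normal(size=1000)

b_yx = ols_slope(x, y)   # slope with Y as response, X as predictor
b_xy = ols_slope(y, x)   # slope with X as response, Y as predictor
r_xy = float(np.corrcoef(x, y)[0, 1])

# Signed geometric mean of the two slopes.
r_from_slopes = float(np.sign(b_yx) * np.sqrt(b_yx * b_xy))
print(r_xy, r_from_slopes)
```

Both slopes share the sign of the covariance, so their product is never negative and the square root is always defined. Swapping the roles of `x` and `y` leaves `r_from_slopes` unchanged, which is the symmetry the text refers to.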
1.3 Correlation as an Asymmetrical Concept of X and Y
When one considers a linear regression model (1.2) or (1.3), one usually assumes residuals which are normally distributed and with the same variance (homoscedasticity), yielding independence between the predictor and the residual variable (the residual distribution is the same whatever the value of the predictor), which is also what is assumed in what follows. In that case, models (1.2) and (1.3) cannot hold simultaneously unless the distribution of (X, Y) is bivariate normal. In particular, if one considers that both X and Y are non‐normal, at most one of (1.2) and (1.3) may hold. It is in such a context of non‐normal X and Y that Dodge and Rousson (2000, 2001) introduced further formulas to interpret a correlation. Under model (1.2), and using basic properties of cumulants (see e.g. Kendall & Stuart, 1963), which differ from zero in case of non‐normal variables, they noted that:
$$\rho_{XY}^{m} = \frac{\text{cumulant}_m(Y)}{\text{cumulant}_m(X)}$$
where $\rho_{XY}$ denotes the (population) correlation between X and Y and $\text{cumulant}_m(V)$ denotes the mth (standardized) cumulant of a random variable V, where m ≥ 3, yielding the skewness coefficient for m = 3. One has thus for example:
$$\rho_{XY}^{3} = \frac{\text{skewness}(Y)}{\text{skewness}(X)}$$
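The cumulant relation under model (1.2) can be checked by simulation. If the error is normal and independent of X, its cumulants of order 3 and higher vanish, so the standardized third cumulant (skewness) of Y equals the cube of the correlation times the skewness of X. The sketch below assumes an exponential predictor and unit coefficients; these choices, like the helper name `skewness`, are illustrative and not taken from the text.

```python
import numpy as np

def skewness(v):
    """Standardized third central moment (third standardized cumulant)."""
    d = v - v.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

rng = np.random.default_rng(2)
n = 200_000
x = rng.exponential(size=n)      # skewed cause; population skewness is 2
eps = rng.normal(size=n)         # normal error, independent of x
y = 1.0 + 1.0 * x + eps          # model (1.2) with X as the true predictor

rho = float(np.corrcoef(x, y)[0, 1])
ratio_yx = skewness(y) / skewness(x)   # close to rho**3 when x -> y holds
ratio_xy = skewness(x) / skewness(y)   # exceeds 1 in absolute value here
print(rho ** 3, ratio_yx, ratio_xy)
```

Because a correlation cannot exceed 1 in absolute value, only the ratio with the response in the numerator can equal its cube. In this setup the response is therefore less skewed than the predictor, which is what gives the relation its directional reading. With real data, sampling error and outliers can blur this comparison.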
