Geostatistical Functional Data Analysis
Explore the intersection between geostatistics and functional data analysis with this insightful new reference
Geostatistical Functional Data Analysis presents a unified approach to modelling functional data when spatial and spatio-temporal correlations are present. The editors link the broad research areas of geostatistics and functional data analysis to give the reader a new area, geostatistical functional data analysis, that will bring new insights and open questions to researchers from both fields. The book provides a complete and up-to-date account of methods for spatially correlated functional data and also covers the most innovative developments in several open avenues of the field.
Containing contributions from leading experts in the field, this practical guide provides readers with the necessary tools to employ and adapt classic statistical techniques to handle spatial regression. The book also includes:
Aimed at mathematicians, statisticians, postgraduate students, and researchers involved in the analysis of functional and spatial data, Geostatistical Functional Data Analysis will also prove to be a powerful addition to the libraries of geoscientists, environmental scientists, and economists seeking insightful new knowledge and questions at the interface of geostatistics and functional data analysis.
Number of pages: 658
Year of publication: 2021
Cover
Title Page
Copyright
List of Contributors
Foreword
1 Introduction to Geostatistical Functional Data Analysis
1.1 Spatial Statistics
1.2 Spatial Geostatistics
1.3 Spatiotemporal Geostatistics
1.4 Functional Data Analysis in Brief
References
Part I: Mathematical and Statistical Foundations
2 Mathematical Foundations of Functional Kriging in Hilbert Spaces and Riemannian Manifolds
2.1 Introduction
2.2 Definitions and Assumptions
2.3 Kriging Prediction in Hilbert Space: A Trace Approach
2.4 An Operatorial Viewpoint to Kriging
2.5 Kriging for Manifold-Valued Random Fields
2.6 Conclusion and Further Research
References
3 Universal, Residual, and External Drift Functional Kriging
3.1 Introduction
3.2 Universal Kriging for Functional Data (UKFD)
3.3 Residual Kriging for Functional Data (ResKFD)
3.4 Functional Kriging with External Drift (FKED)
3.5 Accounting for Spatial Dependence in Drift Estimation
3.6 Uncertainty Evaluation
3.7 Implementation Details in R
3.8 Conclusions
References
4 Extending Functional Kriging When Data Are Multivariate Curves: Some Technical Considerations and Operational Solutions
4.1 Introduction
4.2 Principal Component Analysis for Curves
4.3 Functional Kriging in a Nutshell
4.4 An Example with the Precipitation Observations
4.5 Functional Principal Component Kriging
4.6 Multivariate Kriging with Functional Data
4.7 Discussion
4.A Appendices
References
5 Geostatistical Analysis in Bayes Spaces: Probability Densities and Compositional Data
5.1 Introduction and Motivations
5.2 Bayes Hilbert Spaces: Natural Spaces for Functional Compositions
5.3 A Motivating Case Study: Particle-Size Data in Heterogeneous Aquifers – Data Description
5.4 Kriging Stationary Functional Compositions
5.5 Analyzing Nonstationary Fields of FCs
5.6 Conclusions and Perspectives
References
6 Spatial Functional Data Analysis for Probability Density Functions: Compositional Functional Data vs. Distributional Data Approach
6.1 FDA and SDA When Data Are Densities
6.2 Measures of Spatial Association for Georeferenced Density Functions
6.3 Real Data Analysis
6.4 Conclusion
Acknowledgments
References
Notes
Part II: Statistical Techniques for Spatially Correlated Functional Data
7 Clustering Spatial Functional Data
7.1 Introduction
7.2 Model-Based Clustering for Spatial Functional Data
7.3 Descendant Hierarchical Classification (HC) Based on Centrality Methods
7.4 Application
7.5 Conclusion
References
8 Nonparametric Statistical Analysis of Spatially Distributed Functional Data
8.1 Introduction
8.2 Large Sample Properties
8.3 Prediction
8.4 Numerical Results
8.5 Conclusion
8 Appendix
References
9 A Nonparametric Algorithm for Spatially Dependent Functional Data: Bagging Voronoi for Clustering, Dimensional Reduction, and Regression
9.1 Introduction
9.2 The Motivating Application
9.3 The Bagging Voronoi Strategy
9.4 Bagging Voronoi Clustering (BVClu)
9.5 Bagging Voronoi Dimensional Reduction (BVDim)
9.6 Bagging Voronoi Regression (BVReg)
9.7 Conclusions and Discussion
References
Note
10 Nonparametric Inference for Spatiotemporal Data Based on Local Null Hypothesis Testing for Functional Data
10.1 Introduction
10.2 Methodology
10.3 Data Analysis
10.4 Conclusion and Future Works
References
11 Modeling Spatially Dependent Functional Data by Spatial Regression with Differential Regularization
11.1 Introduction
11.2 Spatial Regression with Differential Regularization for Geostatistical Functional Data
11.3 Simulation Studies
11.4 An Illustrative Example: Study of the Waste Production in Venice Province
11.5 Model Extensions
References
Notes
12 Quasi-maximum Likelihood Estimators for Functional Linear Spatial Autoregressive Models
12.1 Introduction
12.2 Model
12.3 Results and Assumptions
12.4 Numerical Experiments
12.5 Conclusion
12.A Appendix
References
13 Spatial Prediction and Optimal Sampling for Multivariate Functional Random Fields
13.1 Background
13.2 Functional Kriging
13.3 Functional Cokriging
13.4 Optimal Sampling Designs for Spatial Prediction of Functional Data
13.5 Real Data Analysis
13.6 Discussion and Conclusions
References
Part III: Spatio–Temporal Functional Data
14 Spatio–temporal Functional Data Analysis
14.1 Introduction
14.2 Randomness Test
14.3 Change-Point Test
14.4 Separability Tests
14.5 Trend Tests
14.6 Spatio–Temporal Extremes
References
15 A Comparison of Spatiotemporal and Functional Kriging Approaches
15.1 Introduction
15.2 Preliminaries
15.3 Kriging
15.4 A Simulation Study
15.5 Application: Spatial Prediction of Temperature Curves in the Maritime Provinces of Canada
15.6 Concluding Remarks
References
16 From Spatiotemporal Smoothing to Functional Spatial Regression: a Penalized Approach
16.1 Introduction
16.2 Smoothing Spatial Data via Penalized Regression
16.3 Penalized Smooth Mixed Models
16.4 P-spline Smooth ANOVA Models for Spatial and Spatiotemporal data
16.5 P-spline Functional Spatial Regression
16.6 Application to Air Pollution Data
Acknowledgments
References
Index
End User License Agreement
Chapter 3
Table 3.1 Performance indexes over the 10 validation sites.
Chapter 8
Table 8.1 Simulation results for […] according to the models […] and […], the cases...
Table 8.2 Simulation results for […] according to the models […] and […] with cases...
Chapter 9
Table 9.1 DUSAF data: soil use categories and corresponding explanations
Chapter 12
Table 12.1 Estimation of parameters with […], […].
Table 12.2 Estimation of parameters with […], […].
Table 12.3 Estimation of parameters associated with scenario 2 with […].
Table 12.4 Estimation of parameters associated with scenario 2 with […].
Table 12.5 Estimation of parameters associated with scenario 2 with […].
Table 12.6 Estimation of parameters associated with scenario 2 with […].
Table 12.7 Estimated parameters for FLM and functional spatial autoregressiv...
Chapter 13
Table 13.1 Nested variogram components of the linear model of coregionalizat...
Chapter 14
Table 14.1 Randomness test results applied to the Russian weather data.
Table 14.2 Randomness test results applied to a subset of 14 Russian weather...
Table 14.3 p-Values for each change-point test applied to the Russian weather dat...
Table 14.4 p-Values for each change-point test applied to a subset of 14 Russian ...
Table 14.5 p-Values for norm-based separability test ([…]) applied to the Russian w...
Table 14.6 p-Values for norm-based separability test ([…]) applied to a subset of 1...
Chapter 15
Table 15.1 The 24 different types (cases) of simulated Gaussian processes an...
Table 15.2 Prediction performance in terms of MSPEs for the simulated cases ...
Table 15.3 Prediction performance in terms of MSPEs for the simulated cases ...
Table 15.4 Prediction performance of different Sp.T. kriging models for the ...
Chapter 2
Figure 2.1 Spatially dependent curves simulated from the fields […] (a) and […],...
Figure 2.2 Empirical trace-variograms in […] (a) and […] (b).
Figure 2.3 Canada's Maritime Provinces Temperatures dataset, year 1980. (a) ...
Figure 2.4 Estimated trace-semivariogram from the residuals (a) and estimate...
Figure 2.5 Universal kriging maps for the Summer Solstice ([…] June; a) and th...
Figure 2.6 Visual representation of the tangent space in […] on a sphere and o...
Figure 2.7 (a) Empirical semivariogram (symbols) and fitted exponential mode...
Figure 2.8 Kriging of the (temperature, precipitation) covariance matrix fie...
Figure 2.9 (a) Empirical prediction error as a function of the sample margin...
Chapter 3
Figure 3.1 Locations of the 24 […] monitoring sites (light gray triangles) and...
Figure 3.2 […] raw data (in log scale) observed at the 24 monitoring sites....
Figure 3.3 Trace-variogram cloud and estimated trace-variogram.
Figure 3.4 Estimated functional coefficients assuming independent observatio...
Figure 3.5 Raw data (dots), smoothed data (dashed line), predicted drift (li...
Figure 3.6 Original […] data (black dots), FKED predicted curve (dark gray lin...
Chapter 4
Figure 4.1 On a sampled domain […], two functional variables are observed some...
Figure 4.2 Map of France and climate dataset. On each point, annual curves (...
Figure 4.3 An example of computation of the spatial covariance. Experimental...
Figure 4.4 Predictions of precipitation curves (dashed line: mean precipitat...
Figure 4.5 LMC fitting of the variogram model on the four PCs of precipitati...
Figure 4.6 Boxplot of the errors when decreasing the number of principal com...
Figure 4.7 Empirical estimate of a […] matrix of correlation operators (normal...
Figure 4.8 First factors of the MFPCA of temperature and precipitation profi...
Figure 4.9 Example of PCA 2D-mapping of observations […] (a) accounting for ...
Figure 4.10 Empirical variograms and fitting of a coregionalization model on...
Figure 4.11 Predictions of temperature and precipitation curves (dashed line...
Figure 4.12 Correlation (absolute values) between PCs of a MFPCA realized fr...
Chapter 5
Figure 5.1 Example of perturbation and powering in […], compared to the typica...
Figure 5.2 Raw particle-size data at the Lauswiesen site. (a) Collection of ...
Figure 5.3 (a) Vertical distribution of smoothed densities; (b) raw particle...
Figure 5.4 Vertical distribution of ordinary kriging predictions results: (a...
Figure 5.5 Kriged field and conditional realizations. (a) Kriging estimation...
Figure 5.6 Field data: (a) smoothed PSDs; (b) soil types at the field site (...
Figure 5.7 (a) Estimated trace-semivariogram of the residuals; (b) estimated...
Figure 5.8 Class-kriging of PSDs: (a) results at boreholes B5, F4, and F6 an...
Chapter 6
Figure 6.1 ACS-5y 2015, Texas data: first five counties (of 254) of the inpu...
Figure 6.2 ACS-5y 2015, Texas data: counties with a significant local Moran'...
Figure 6.3 ACS-5y 2015, Texas data: Moran's plot for residual functions.
Figure 6.4 ACS-5y 2015, Texas data: cluster of counties with a significant l...
Figure 6.5 ACS-5y 2015, Texas data: AGE variable, first two harmonics after ...
Figure 6.6 ACS-5y 2015, Texas data: AGE variable. On top, the two maps with ...
Figure 6.7 ACS-5y 2015, Texas data: INCOME variable, first two harmonics aft...
Figure 6.8 ACS-5y 2015, Texas data: INCOME variable. On the top, the two map...
Figure 6.9 ACS-5y 2015, Texas data: INCOME variable transformed using a Box–...
Figure 6.10 ACS-5y 2015, Texas data: INCOME variable transformed using Box–C...
Chapter 7
Figure 7.1 Algorithm of the descendant HC.
Figure 7.2 Location of 106 monitoring stations (in the same number of cities...
Figure 7.3 Ozone concentration curves (obtained after smoothing the data by ...
Figure 7.4 Value of BIC criterion according to the number of clusters.
Figure 7.5 Locations of the stations are colored according to the cluster (a...
Figure 7.6 Average curves by cluster, respectively, for two clusters (a) and...
Figure 7.7 The classification results of the descendant HC.
Figure 7.8 Locations of the stations colored according to the cluster (a), m...
Figure 7.9 The curves of the different groups by the descendant HC.
Chapter 8
Figure 8.1 Some simulated curves of Case […] (a) and Case […] (b). In Case 1, ...
Figure 8.2 A simulated field considering Model […], Case […] and […] with (a) an i...
Figure 8.3 A simulated field considering Model […], Case […], and […] with (a) an ...
Figure 8.4 A simulated field considering Model […], Case […], and […] with (a) an ...
Figure 8.5 Boxplots of […], […] and […], respectively, over the […] replications of ...
Chapter 9
Figure 9.1 Map of the region around Milan (metropolitan area) covered by the...
Figure 9.2 The total Erlang data as a function of time. Continuous vertical ...
Figure 9.3 Average power spectrum […] obtained via sitewise smoothing of the E...
Figure 9.4 Results of BVClu on the Telecom data. Average normalized entropy
Figure 9.5 Results of BVClu on the Telecom data, with […] and […]. (a) Map of th...
Figure 9.6 Results of BVDim on the Telecom data. Euclidean distance from the...
Figure 9.7 Results of BVDim on the Telecom data: the first six elements of t...
Figure 9.8 Results of BVDim on the Telecom data: maps of the estimated surfa...
Figure 9.9 Four of the […] selected DUSAF covariates superimposed to the metro...
Figure 9.10 Results of BVReg on the Telecom data. Mean cross-validation erro...
Figure 9.11 Results of the BVReg lasso regression on the first four estimate...
Chapter 10
Figure 10.1 (a) Map of Canada with locations of the 35 weather stations. Smo...
Figure 10.2 FANOVA test on curves (a, b) and first derivatives (c, d) of Can...
Figure 10.3 Pairwise comparisons between curves. Diagonal panels: all temper...
Figure 10.4 Pairwise comparisons between first derivatives. Diagonal panels:...
Chapter 11
Figure 11.1 Spatial domain of the Venice waste data, with a line highlightin...
Figure 11.2 Temporal evolution of the yearly per capita production (kilogram...
Figure 11.3 Per capita production (kilogram per resident) of municipal waste...
Figure 11.4 Simplified boundary of the Venice province (a) and detail of the...
Figure 11.5 Triangulation of the Venice province.
Figure 11.6 Example of linear finite element basis function.
Figure 11.7 Simulation without covariates: test function (first row), sample...
Figure 11.8 Simulation with covariates: test function (first row), added con...
Figure 11.9 (a) Simulation without covariates: boxplots of the RMSE, over 50...
Figure 11.10 Estimated spatiotemporal field for the Venice waste data (yearl...
Figure 11.11 Temporal evolution of the estimated spatiotemporal field for th...
Chapter 12
Figure 12.1 Estimated parameter function […] with the different criteria and
Figure 12.2 Estimated parameter function […] with the different criteria and
Figure 12.3 Estimated parameter function […] with the different criteria in Sc...
Figure 12.4 Locations and areas of the 106 stations (a) and corresponding oz...
Figure 12.5 The three first eigenfunctions (a) and the proportion of explain...
Figure 12.6 Estimated parameter functions.
Figure 12.7 Ozone concentration (solid curves) at 4 stations selected random...
Chapter 13
Figure 13.1 (a) México city. Air quality network RAMA (stations shown in lig...
Figure 13.2 Empirical and theoretical variograms fitted according to the lin...
Figure 13.3 (a) Optimal location for one additional station. (b) Cross-valid...
Chapter 14
Figure 14.1 Locations of the 220 Russian weather stations, with 14 stations ...
Figure 14.2 Daily temperature maxima for five weather stations during 2000....
Figure 14.3 Five consecutive years (2006–2010) of typhoon data. The dots rep...
Figure 14.4 Typhoons (a) and hurricanes (b) data in 2005 with expectile curv...
Figure 14.5 Gray lines represent ionosonde measurements obtained at observat...
Figure 14.6 Number of available stations in the mid-latitude northern hemisp...
Figure 14.7 A map of the neighborhood structures for different locations usi...
Figure 14.8 Probability of a heat wave with amplitude more than two standard...
Chapter 15
Figure 15.1 Examples of simulated data for: (a) case 3 ([…], […]), (b) case 7 (
Figure 15.2 Prediction performance (minimum MSPE over the three trace-semiva...
Figure 15.3 Box plots for cases 1–9 of the differences in (minimum) MSPE bet...
Figure 15.4 Prediction performance (minimum MSPE over the three trace-semiva...
Figure 15.5 The locations of the 36 weather stations in the Canadian Maritim...
Figure 15.6 (a) The empirical trace-semivariogram and the best fitted stable...
Figure 15.7 The empirical Sp.T. semivariogram (a) and the best-fitted Sp.T. ...
Figure 15.8 (A) Functional cross-validation residuals (gray lines) resulting...
Figure 15.9 Predicted temperatures at locations Bertrand (a) and Moncton (b)...
Chapter 16
Figure 16.1 Portion of the […]-spline basis (tensor product of nine cubic spli...
Figure 16.2 Simulated functions: (a) and (b) are the nonlinear main effects ...
Figure 16.3 […] of fitted smooth model for […]: scenario 1 (a–c), scenario 2 (d–...
Figure 16.4 Medians of daily ozone curves (from 2002 to 2015) observed at 55...
Figure 16.5 Smoothed spatial and temporal main effects for the ANOVA model. ...
Figure 16.6 Smoothed spatiotemporal interaction for ANOVA model at four sele...
Figure 16.7 Smoothed spatiotemporal fit for ANOVA model at four selected loc...
Figure 16.8 Regression splines fitted from the ozone raw data by using a cub...
Figure 16.9 Predicted curve from the regression splines of the ozone raw dat...
Figure 16.10 Predicted curves (gray) from the regression splines of the ozon...
Figure 16.11 Smoothed spatiotemporal fit for ANOVA model at four selected lo...
Established by Walter A. Shewhart and Samuel S. Wilks
The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state-of-the-art developments in the field and classical methods.
Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches. This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.
A complete list of titles in this series can be found at http://www.wiley.com/go/wsps
Edited by
Jorge Mateu
University Jaume I of Castellon, Castellon, Spain
Ramón Giraldo
National University of Colombia, Bogota, Colombia
This edition first published 2022
© 2022 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Jorge Mateu and Ramón Giraldo to be identified as the authors of the editorial material in this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Mateu, Jorge, editor. | Giraldo, Ramón, editor.
Title: Geostatistical functional data analysis / edited by Jorge Mateu, Ramón Giraldo.
Description: Hoboken, NJ : Wiley, 2022. | Series: Wiley series in probability and statistics | Includes bibliographical references and index.
Identifiers: LCCN 2021015788 (print) | LCCN 2021015789 (ebook) | ISBN 9781119387848 (hardback) | ISBN 9781119387909 (adobe pdf) | ISBN 9781119387886 (epub)
Subjects: LCSH: Geology–Statistical methods. | Kriging. | Spatial analysis (Statistics) | Functional analysis.
Classification: LCC QE33.2.S82 G434 2022 (print) | LCC QE33.2.S82 (ebook) | DDC 551.072/7–dc23
LC record available at https://lccn.loc.gov/2021015788
LC ebook record available at https://lccn.loc.gov/2021015789
Cover Design: Wiley
Cover Image: © Googee/Shutterstock
Ana M. Aguilera, University of Granada, Department of Statistics and Operational Research, Spain
Mohamed-Salem Ahmed, University of Lille, France
Mara S. Bernardi, Politecnico di Milano, MOX - Department of Mathematics, Italy
Gregory Bopp, Pennsylvania State University, Department of Statistics, USA
Martha Bohorquez, National University of Colombia, Department of Statistics, Colombia
Laurence Broze, University of Lille, France
María del Carmen Aguilera Morillo, Universitat Politècnica de València, Department of Statistics and Operational Research and Quality, Spain
Sophie Dabo-Niang, University of Lille, France
Maria Durban, Universidad Carlos III, Department of Statistics, Spain
John Ensley, Pennsylvania State University, Department of Statistics, USA
Maria Franco-Villoria, Università di Modena e Reggio Emilia, Department of Economics “Marco Biagi”, Italy
Zied Gharbi, University of Lille, France
Ramón Giraldo, National University of Colombia, Department of Statistics, Colombia
Alberto Guadagnini, Politecnico di Milano, Department of Civil and Environmental Engineering, Italy, and The University of Arizona, Department of Hydrology and Atmospheric Sciences, USA
Rosaria Ignaccolo, Università degli Studi di Torino, Dipartimento di Economia e Statistica “Cognetti de Martiis”, Italy
Antonio Irpino, University of Campania “Luigi Vanvitelli”, Department of Mathematics and Physics, Italy
Piotr Kokoszka, Colorado State University, Department of Statistics, USA
Sara Sjöstedt de Luna, Umeå University, Department of Mathematics and Mathematical Statistics, Sweden
Dae-Jin Lee, BCAM–Basque Center for Applied Mathematics, Spain
Claude Manté, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, Aix-Marseille Université, France
Jorge Mateu, University Jaume I of Castellon, Department of Mathematics, Spain
Alessandra Menafoglio, Politecnico di Milano, MOX - Department of Mathematics, Italy
Pascal Monestiez, INRAE - Unité BioSP, France
David Nerini, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, Aix-Marseille Université, France
Federica Passamonti, Politecnico di Milano, MOX - Department of Mathematics, Italy
Davide Pigoli, King's College London, UK
Alessia Pini, Università Cattolica del Sacro Cuore, Department of Statistical Sciences, Italy
Cristian Preda, Institute of Statistics and Applied Mathematics of the Romanian Academy, Romania
Matthew Reimherr, Pennsylvania State University, Department of Statistics, USA
Elvira Romano, University of Campania “Luigi Vanvitelli”, Department of Mathematics and Physics, Italy
Laura M. Sangalli, Politecnico di Milano, MOX - Department of Mathematics, Italy
Piercesare Secchi, Politecnico di Milano, MOX - Department of Mathematics, Italy, and CADS - Center for Analysis Decisions and Society, Human Technopole, Italy
Johan Strandberg, Umeå University, Department of Statistics, Sweden
Camille Ternynck, University of Lille, France
Baba Thiam, University of Lille, France
Vincent Vandewalle, University of Lille, France
Simone Vantini, Politecnico di Milano, MOX - Department of Mathematics, Italy
Valeria Vitelli, University of Oslo, Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, Norway
Anne-Françoise Yao, Université Clermont-Auvergne, France
Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces, or anything else varying over a continuum. In its most general form, under an FDA framework each sample element is a function. The continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc. In the 20 years since the first books and papers on this topic, this field of statistics has received the attention and encouragement of researchers in statistics and many applied disciplines and has become an important and dynamic area of modern statistics. Topics that have been covered include descriptive techniques, statistical inference, multivariate and nonparametric methods, regression, generalized linear models, time series, and spatial statistics.
Modern technology has made it possible to obtain large spatial and spatiotemporal data sets, and poses the challenge of statistical modeling of such data. The combination of spatial statistics with FDA has emerged as a key approach. This book presents new theories and methods to define, describe, characterize, and model functional data indexed in spatial or spatio-temporal domains. The main focus is on functional data obtained under a geostatistical framework, where the domain is fixed and continuous. Specific topics considered include kriging, clustering, regression, and optimal sampling, moving on in the last part of the book to spatiotemporal data. Some chapters also consider the treatment of functional data on lattices.
When we wrote our original book on the subject in the 1990s, James Ramsay and I hoped that we would encourage FDA as a way of thinking, not simply a collection of techniques. It has therefore been very pleasing to see the development of the field since then, and the abundance of research activity in the area has confirmed our hopes. I would urge readers and researchers to raise their sights above any specific methods, important though they obviously are, and to ask how considering data as functions changes and broadens our statistical horizons. Particularly in the new era of data science, this concerns both what data can be collected and how they can be analyzed. I am sure this book will make a valuable contribution in helping them to do so.
November 2020
Sir Bernard Silverman
University of Oxford
University of Nottingham
Oxford and Nottingham
Jorge Mateu¹ and Ramón Giraldo²
¹ Department of Mathematics, University Jaume I of Castellon, Spain
² Department of Statistics, National University of Colombia, Bogota, Colombia
Spatial statistics has developed rapidly during the last 30 years, with notable progress in both theory and practical studies. Some early applications were in mining, forestry, and hydrology. It is fair to remark that the increasing availability of computing power and capable software has stimulated the ability to solve increasingly complex problems. These problems had a common element: they were all of a spatial nature. Some theory was available, for example the random function theory developed by Yaglom and others in the 1960s, but it was largely insufficient to provide generic solutions for the whole class of problems, and hence the applications required new theory. Far-reaching theories were subsequently developed: image reconstruction, Markov random fields, point process statistics, geostatistics, and random sets, to mention just a few. As a next stage, these theories were applied successfully to new disciplinary problems, leading to modifications and extensions of mathematical and statistical procedures. We therefore notice a general scientific process in the field of spatial statistics: well-defined problems with a common character were suddenly on the agenda, and data availability and intensive discussion with practical and disciplinary researchers resulted in new theoretical developments. It is often difficult to say which came first and which followed, but we see different theoretical models developed for different applications.
Spatial statistics has hence emerged as an important new field of science. One of the peculiarities is its power for visualization. A common cold-water fear of many statisticians and mathematicians to analyze images, to communicate their results by maps, and to have to trust information in pictures was overcome. It has led to interesting theories and better and more objective procedures for dealing with spatial variation. Following Wittgenstein, we could state that we needed some geniuses to tackle the obvious. Now, many results of a spatial statistical analysis could be communicated smoothly toward the nonstatistical audience, like a disciplinary scientist, a policy-maker, or an interested student. They, in turn, were able to judge whether a problem was solved, whether a policy measure was relevant or was inspired by the beautiful pictures expressing deep thoughts on relevant issues.
The role in policy-making may be once more stressed. It is known that many policy-makers are inclined to make a decision on the basis of a well developed, well organized, and well understandable figure. They find it (rightly so!) rather boring to use long lists of statistical data. But as political decisions affect us all, it puts another responsibility on the back of statisticians: to make statistically sound maps. It is often hard to say what that should be, but at the very least, we should be able to generate pictures, maps, and graphs that rely on good data and that show important aspects for decision-making.
In this way, spatial statistics has become a refreshing wind in statistics. We do not need to do well much longer on difficult equations, long lists of data, and tables with simulated controlled scenarios. But, to be clear on the back of all these nice pictures a sound science with sometimes difficult and tedious derivations and deep thoughts are still required to make serious progress.
Spatial statistics recognizes and exploits the spatial locations of data when designing for, collecting, managing, analyzing, and displaying such data. Spatial data are typically dependent, for which there are classes of spatial models available that allow process prediction and parameter estimation. Spatially arranged measurements and spatial patterns occur in a surprisingly wide variety of scientific disciplines. The origins of human life link studies of the evolution of galaxies, the structure of biological cells, and settlement patterns in archaeology. Ecologists study the interactions among plants and animals. Foresters and agriculturalists need to investigate plant competition and account for soil variations in their experiments. The estimation of rainfall and of ore and petroleum reserves is of prime economic importance. Rocks, metals, and tissue and blood cells are all studied at a microscopic level. Geology, soil science, image processing, epidemiology, crop science, ecology, forestry, astronomy, atmospheric science, or simply any discipline that works with data collected from different spatial locations, need to develop models that indicate when there is dependence between measurements at different locations. Spatiotemporal variability is a relatively new area within Spatial Statistics, which explains the scarcity of space-time statistical tools 20 years ago. There has been a growing realization in the last decade that knowing where data were observed could help enormously in answering the substantive questions that precipitated their collection. One of the most powerful tools for spatial data analysis is the map. For example, in military applications, the battlespace is mapped for command and control. The sensors are both in situ and remote, and they generate spatially distributed data of many different kinds. Producing a statistically optimal map, together with measures of map uncertainty, which is always up to date, is a complicated task. 
Once these types of statistical problems are solved, a geographic information system, or GIS, is well suited to forming the decision-making maps.
Spatial statistics can be considered a natural generalization of signal processing to higher dimensions. In traditional signal processing, one has a signal dependent on a scalar variable $t$, which may belong to a discrete set or which may be continuous. Spatial statistics is concerned with cases in which $t$ is a multidimensional index of dimension $d$. In most practical examples $d = 2$, though much of the basic theory and methodology is the same whatever the dimension. Although the models and methods of spatial statistics have not developed as rapidly as those for one-dimensional signal processing, there have nevertheless been substantial new developments in recent years. Standard and modern references on spatial statistics include the books of [1–4] among others.
Following Cressie [5], spatial data can be thought of as resulting from observations on the stochastic process $\{Z(\mathbf{s}) : \mathbf{s} \in D\}$, where $D$ is possibly a random set in $\mathbb{R}^d$. If we believe that the roots of statistical science are in data, we can classify spatial areas according to the type of observations encountered. Thus, (i) if $D$ is a fixed subset of $\mathbb{R}^d$ and $Z(\mathbf{s})$ is a random vector at location $\mathbf{s} \in D$, we are dealing with geostatistical data; (ii) if $D$ is a fixed (regular or irregular) collection of countably many points of $\mathbb{R}^d$ and $Z(\mathbf{s})$ is a random vector at location $\mathbf{s} \in D$, we are dealing with lattice data; (iii) if $D$ is a point process in $\mathbb{R}^d$ and $Z(\mathbf{s})$ is a random vector at location $\mathbf{s} \in D$, we are dealing with point patterns; (iv) if $D$ is a point process in $\mathbb{R}^d$ and $Z(\mathbf{s})$ is itself a random set, we are dealing with spatial objects. Geostatistical-type problems are distinguished most clearly from lattice- and point-pattern-type problems by the ability of the spatial index $\mathbf{s}$ to vary continuously over a subset of $\mathbb{R}^d$. A space-time process can be denoted by $\{Z(\mathbf{s}; t) : \mathbf{s} \in D(t),\ t \in T\}$, where each of $Z$, $D$, and $T$ is possibly random.
Spatial statistics is one of the major methodologies of environmental statistics. Its applications include producing spatially smoothed or interpolated representations of air pollution fields, calculating regional average means or regional average trends based on data at a finite number of monitoring stations, and performing regression analyses with spatially correlated errors to assess the agreement between observed data and the predictions of some numerical model. The notion of proximity in space is implicitly or explicitly present in the environmental sciences. Proximity is a relative notion, relative to the spatial scale of the scientific investigation. When a spatial dimension is present in an environmental study, the statistician's job is to create a statistical framework within which one carries out defensible inferences on processes and parameters of interest. These modeling and inference strategies are not always easy to carry out, but are never impossible. If statistics is to continue to be the broker of variability, it must address difficult questions such as those found in the environmental sciences; otherwise, it will become marginalized as a discipline. Problems in the environmental sciences are inherently spatial (and temporal), observational in nature, and have experimental units that are highly variable.
In the last decade, spatial statistics has undergone enormous development in the area of statistical modeling. It started slowly, building from models that were purely descriptive of spatial dependence. Then, it became apparent that the process of interest was usually hidden by measurement error and that the principal goal should be inference on the hidden process from the noisy data. It has only been in the last few years that the full potential for hierarchical spatial statistical modeling has been glimpsed. There is an enormous amount of flexibility in hierarchical statistical models, such as the opportunity to account for nonlinearities. Their attractive feature is that at each level of the hierarchy, the model specification is simple, yet globally, the model can be quite complex. This approach could be summarized as "model locally, analyze globally."
Applications of spatial statistics cover many areas. Much of the original impetus for the subject was driven by geostatistics. It was in this context that the technique of kriging, optimal least squares interpolation over a random spatial field, was originally developed. In recent years, the applications of spatial statistics have increased enormously, with particularly fruitful applications in the environmental and ecological sciences. A typical problem is the sampling of a pollution field, such as ozone in the atmosphere or toxic chemicals in rivers and lakes. Another example is the use of meteorological measurements in studies of global climate change. In these fields, as in geostatistics, the objective may be to interpolate spatially between measurements, but there are also other objectives which may be quite different. Spatial statistics has also found applications in such diverse fields as sociology (for example, social network theory) and financial economics.
The usual approach in geostatistics is based on an assumption that the spatial random field is stationary and isotropic. In the original geophysical applications which motivated the development of the field, this assumption was often justified by the fact that with sparse data, there was no reasonable alternative. A further point is that many geostatistical applications involved only one measurement at each site (or equivalently, only one replication of the random field) so there was no way of determining the complete spatial covariance function without some kind of stationarity assumption. In modern environmental applications, however, there are very often enough monitoring stations to go beyond such assumptions, and with multiple observations per site, it is also possible to estimate the covariance between any pair of sites without assuming stationarity across the field. Another consideration is that very often, simple topography makes a stationary assumption implausible. Therefore, there are by now many reasons to go beyond a stationary model. In spite of this obvious need for nonstationary models, however, there is not, as yet, a wide variety of approaches to the problem.
Environmental issues have brought atmospheric science to the center of science and technology, where it now plays a key role in shaping national and international policy. Weather prediction plays a significant role in the planning of human affairs. Further, a broader appreciation of the role of weather and climate impacts on the environment of the planet has now led to nearly universal concern regarding potential climate change, its causes, impacts, and possible remedies. A large variety of statistical methods are used routinely in the atmospheric sciences. For example, techniques of multivariate time series are especially common. These include multivariate autoregressive, moving average models and Kalman filtering. Statistical methods for spatial data are also standard. A major tool in the analysis of space-time data is empirical orthogonal functions (EOF). Virtually all atmospheric and oceanographic processes (e.g. wind, temperature, sea surface temperature, moisture) involve variability over space and time. One need only examine the governing partial differential equations for wind processes, or their selected spatial-temporal averages, to see that mathematical and statistical descriptions of these dynamical processes depend on complicated temporal and spatial relationships. Furthermore, observations of geophysical processes typically include measurement errors and are often temporally and spatially incomplete, which may obscure the signal of interest.
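The remark about multiple observations per site can be illustrated with a minimal sketch. It assumes a hypothetical network of four monitoring sites with 500 repeated observations each; the simulated "true" covariance and all variable names are illustrative assumptions, not taken from the text. With replicates available, the ordinary sample covariance between sites can be computed directly, without imposing any stationarity structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 monitoring sites, 500 repeated observations per site.
n_sites, n_reps = 4, 500

# An arbitrary (nonstationary) covariance, used only to simulate data.
A = rng.normal(size=(n_sites, n_sites))
true_cov = A @ A.T + n_sites * np.eye(n_sites)

# Rows are replicates (e.g. days), columns are sites.
obs = rng.multivariate_normal(np.zeros(n_sites), true_cov, size=n_reps)

# Sample covariance between every pair of sites; no stationarity is assumed.
sample_cov = np.cov(obs, rowvar=False)
```

With a single replicate per site this estimate is unavailable, which is why a stationarity assumption was needed in the classical setting.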
In studies involving spatial data, it is seldom the case that data for only a single process are collected. Typically, there is a great expense associated with establishing spatial monitoring networks or other mechanisms of spatial data collection (e.g. satellites) and so measurements are usually made on two or more variables. Thus, statistical techniques for multivariate spatial data are critical for effective modeling of spatial processes.
Lately, there has been a rich and growing literature on space–time modeling. Fundamentally, it is clear that in the absence of a temporal component, second-order geostatistical models can be used to represent spatial variability. These are descriptive in the sense that, although they model spatial correlation, there is no causative interpretation associated with them. Thus, for space-time modeling, the geostatistical paradigm assumes a descriptive structure for both space and time (i.e. covariance structures are directly specified). For example, one can extend the geostatistical kriging methodology for spatial processes by assuming that time is just another spatial dimension. Alternatively, one can treat time slices of a spatial field as variables and apply a multivariate or cokriging approach. Although these approaches have been successful in many applications, there are fundamental differences between space and time, and it is not likely that realistic covariance structures can be specified that accurately capture the complicated dynamical processes as found in geophysical applications.
In the absence of a spatial component, there is a large class of time series models that could be used to represent the temporal variability. These are dynamic in the sense that they exploit the fact that time flows in only one direction, and so the state of the process at the current time is related to what happened at previous times. Thus, one might consider the space–time process as a collection of spatially correlated time series in continuous space, or on a spatial lattice. Although these approaches include dynamical structures, without a descriptive spatial component one lacks the ability to perform spatial prediction at locations without observations. If both temporal and spatial components are present, it is natural to combine the temporally dynamic state-space approach and the spatially descriptive approach. These models are referred to as space–time dynamic models.
Spatial interpolation is an essential feature of many GIS. It is a procedure for estimating values of a variable at unsampled locations. A map with isolines is usually the visual output of such a process and plays a crucial role in decision-making. Based on Tobler's law of geography, which stipulates that observations close together in space are more likely to be similar than those farther apart, the development of models attempting to represent the way close observations are related can sometimes be very problematic. The approaches can be divergent and may therefore lead to very different results. As a consequence, an understanding of the initial assumptions and methods used is the key to the spatial interpolation process.
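As a minimal sketch of how an interpolator can encode the idea that nearby observations are more alike, the following uses inverse distance weighting. This is only one simple deterministic method, chosen here for illustration; the function name `idw_predict`, the `power` parameter, and the toy data are assumptions and do not come from the text.

```python
import numpy as np

def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted prediction at a single target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    dists = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    # If the target coincides with a sampled location, return the observed value.
    exact = dists == 0.0
    if exact.any():
        return float(values[exact][0])
    # Closer observations receive larger weights.
    weights = 1.0 / dists**power
    return float(np.sum(weights * values) / np.sum(weights))

# Toy data: four observations at the corners of the unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 20.0, 30.0, 40.0]

center = idw_predict(coords, values, (0.5, 0.5))       # equidistant from all corners
near_origin = idw_predict(coords, values, (0.1, 0.1))  # dominated by the value at (0, 0)
```

Changing `power` changes the map noticeably, which is one concrete instance of why fixed, hidden input parameters in GIS tools can be problematic.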
Surprisingly, when spatial interpolation tools are integrated within GIS, they are often implemented in such a way that users have no real choice in selecting the best possible methods, and if they do have a choice, required input parameters are sometimes fixed, without any possible way of modifying them. One reason for the frequent blind use of spatial interpolation methods, and spatial statistics in general, probably has its origins in teaching. Despite the large variety of its applications, the discipline has been confined to those fields where it has seen its major developments. The progress made in spatial statistics is therefore usually presented only in journals dedicated to statistics, mining, and petroleum engineering. As a consequence, GIS users who have a different technical background often do not have an in-depth knowledge of such spatial interpolation techniques. Furthermore, since the conventional tests used in basic statistics usually generate some kind of categorical answer, the prerequisite experience and statistical knowledge necessary for the proper use of spatial interpolation techniques are often discouraging to this type of user. Nevertheless, during the last few years, the diversity of the applications of these methods has encouraged the publication of new books and new case studies and has stimulated a number of conferences on the subject.
This section has been partly taken and summarized from [6] and is intended to provide a brief overview of spatial geostatistics. The reader is referred to [6] for further and more complete details.
Geostatistics can be defined as the study of regionalized phenomena, that is, phenomena that stretch across space and which have a certain spatial organization or structure. However, geostatistics is not applied to the regionalized phenomenon as such, which is a physical reality, but to a mathematical description of that reality, that is, a numerical function called regionalized variable or regionalization, defined in a geographical space, which is supposed to correctly represent and measure that phenomenon.
In order to delve deeper into the concept of regionalized variable, let us imagine we are interested in a feature of a given phenomenon that spans across space and that several measurements are taken in a domain at a given moment in time. If the measurements are taken on objects or similar units, the objects sampled can be considered a subset of a larger collection of objects, as many more measurements could have been taken, but were not for many possible reasons. If the observations were made at certain points in the domain, infinitely many measurements could be taken.
When $\mathbf{s}$ spans across the domain under study, $D$, the set $\{z(\mathbf{s}), \mathbf{s} \in D\}$ is called a regionalized variable or regionalization, the set being a collection of values of the regionalized variable, and each value of that collection being a regionalized value.
It is true that a deterministic approach can be employed to describe or model a regionalized phenomenon and obtain an accurate assessment of the values of the regionalization on the basis of a limited number of observations. However, this requires in-depth knowledge of the origin of the phenomenon and the physical or mathematical laws that govern the evolution of the regionalized variable. Furthermore, many of the regionalized phenomena that are usually studied are so complex that a deterministic approach can only partially portray them. That is why the deterministic approach is discarded and the probabilistic approach, which permits modeling both the knowledge of and also the uncertainty surrounding the regionalized random phenomenon, is adopted.
From a probabilistic perspective, the regionalized value can be seen as the result of a random mechanism, resulting in a random variable (r.v.). If the regionalized values at all the points in the domain are considered, they can be seen as a realization of an infinitely large set of r.v.s, one at each point in the domain, which is known as a spatial random function (synonyms: stochastic process, random field).
When $\mathbf{s}$ spans across the domain under study, $D$, we have a family of r.v.s, $\{Z(\mathbf{s}), \mathbf{s} \in D\}$, which constitutes a spatial random field (r.f.).
This methodological decision is one of the cornerstones of geostatistics: the regionalized variable is interpreted as a realization of a spatial r.f. At this point, we must state that the regionalized variable is often highly locally irregular (which makes it impossible to represent using a deterministic mathematical function) and has a certain spatial organization or structure. The probabilistic approach, or probabilistic geostatistics, which interprets the regionalized variable as a realization of a r.f., can take into account all the aspects of regionalization mentioned above, because, as stated on page 55 of [7]:
At each location $\mathbf{s}_i$, $Z(\mathbf{s}_i)$ is a r.v. (hence, the erratic aspect).
For any given set of points $\mathbf{s}_1, \ldots, \mathbf{s}_k$, the r.v.s $Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_k)$ are linked by a network of spatial correlations responsible for the similarity of the values they take (hence the structured aspect).
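Let $Z(\mathbf{s})$ be a r.f. and let us consider the set of points $\mathbf{s}_1, \ldots, \mathbf{s}_k$. Then, the r.f. is characterized by its $k$-dimensional distribution function. The set of $k$-dimensional distribution functions for all values of $k$ and all possible choices of $\mathbf{s}_1, \ldots, \mathbf{s}_k$ in the domain is called the spatial law of probability.
For a given r.f., $Z(\mathbf{s})$, the $k$-dimensional distribution function is defined as
$$F_{\mathbf{s}_1, \ldots, \mathbf{s}_k}(z_1, \ldots, z_k) = P\big(Z(\mathbf{s}_1) \le z_1, \ldots, Z(\mathbf{s}_k) \le z_k\big).$$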
In linear geostatistics, it is enough to know the first two moments of the distribution of $Z(\mathbf{s})$. What is more, in most practical applications, the available information does not allow higher-order moments to be inferred.
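The expectation, expected value or first-order moment of a r.f. $Z(\mathbf{s})$ is defined as a nonrandom function of $\mathbf{s}$ that coincides at each point $\mathbf{s}$ with the expectation of the r.v. at that point, $\mu(\mathbf{s}) = E[Z(\mathbf{s})]$, where $\mathbf{s} \in D$, $D \subseteq \mathbb{R}^d$. It is also called the drift of the r.f., especially when it varies with location.
The variance of a r.f. $Z(\mathbf{s})$ is defined as a nonrandom function of $\mathbf{s}$ that coincides at each point $\mathbf{s}$ with the variance of the r.v. at that point, i.e. $\operatorname{Var}[Z(\mathbf{s})] = E\big[(Z(\mathbf{s}) - \mu(\mathbf{s}))^2\big]$, where $\mathbf{s} \in D$, $D \subseteq \mathbb{R}^d$.
The covariance function of a r.f. $Z(\mathbf{s})$ is defined as a nonrandom function of $\mathbf{s}_i$ and $\mathbf{s}_j$, such that for any pair of values $\mathbf{s}_i, \mathbf{s}_j \in D$, it coincides with the covariance between the r.v.s at those two points:
$$C(\mathbf{s}_i, \mathbf{s}_j) = E\big[(Z(\mathbf{s}_i) - \mu(\mathbf{s}_i))(Z(\mathbf{s}_j) - \mu(\mathbf{s}_j))\big].$$
The variogram of the r.f. $Z(\mathbf{s})$ is defined as the variance of the first differences of the r.f.:
$$2\gamma(\mathbf{s}_i, \mathbf{s}_j) = \operatorname{Var}\big[Z(\mathbf{s}_i) - Z(\mathbf{s}_j)\big].$$
The function $\gamma$ is called the semivariogram.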
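To make the definition of the semivariogram as half the variance of first differences concrete, the sketch below computes an empirical semivariogram by averaging squared differences over pairs of locations grouped into distance bins. This is the classical method-of-moments estimator, brought in here as an assumption (it is not stated in the text above); it further assumes that the differences have zero mean and that the semivariogram depends only on the distance between locations. The function name, bin edges, and toy data are hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Half the mean squared difference of values, per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # All unordered pairs of distinct locations.
    i, j = np.triu_indices(len(values), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (dists > bin_edges[b]) & (dists <= bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sq_diffs[in_bin].mean()
    return gamma

# Toy transect: five observations at unit spacing along a line.
coords = [(float(x), 0.0) for x in range(5)]
values = [1.0, 3.0, 2.0, 5.0, 4.0]

# Two bins, capturing pairs at distance 1 and at distance 2 respectively.
gamma = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5])
```

Bins that contain no pairs are left as `nan` rather than zero, so that missing information is not mistaken for perfect similarity.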
$Z(\mathbf{s})$ is a Gaussian r.f. if for all $k$ and any given set of points $\mathbf{s}_1, \ldots, \mathbf{s}_k$, the joint distribution of $\big(Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_k)\big)$ is a multivariate Gaussian distribution. A multivariate Gaussian distribution is characterized by a mean vector and a variance–covariance matrix, such that the first two moments of a Gaussian r.f. completely determine its probability structure. The Gaussianity of the r.f. is a common assumption in geostatistics.
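Because a Gaussian r.f. is fully determined by its mean and covariance, a realization at a finite set of locations can be drawn from the mean vector and the variance-covariance matrix alone. The sketch below does this with a Cholesky factor. The exponential covariance model, the parameter values, and the one-dimensional transect are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical locations on a 1-D transect and an assumed exponential covariance.
locations = np.linspace(0.0, 10.0, 50)
sill, range_param, mean = 2.0, 3.0, 5.0

# Covariance matrix: entry (i, j) depends on the distance between locations i and j.
lags = np.abs(locations[:, None] - locations[None, :])
cov = sill * np.exp(-lags / range_param)

# A small diagonal jitter guards against numerical loss of positive definiteness.
chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(locations)))

# One realization: constant mean plus correlated Gaussian noise.
realization = mean + chol @ rng.standard_normal(len(locations))
```

Each new draw of the standard normal vector gives another realization of the same r.f., which is exactly the kind of repetition that is unavailable with real data.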
Interpreting a regionalized variable in probabilistic terms as a particular realization of a given r.f. makes operational sense when it is possible to infer part or all of the law of probability which defines that r.f. In this sense, stationarity, which indicates a certain degree of homogeneity in the regionalization across space, is a desirable quality.
Indeed, it would be impossible to infer the probability law of a r.f. if there was only one realization of the r.f. In order to make inferences consistently, many realizations are necessary. However, in reality there is only one. The solution to this problem is to adopt the hypothesis of stationarity or spatial homogeneity. The idea behind the hypothesis of stationarity is to substitute repetitions of the (inaccessible) realizations of the r.f. with repetitions in space, that is, the values observed at different locations in the domain under study have the same characteristics and can be considered as realizations of the same r.f. in mathematical terms. However, these realizations are not independent, and an additional hypothesis, ergodicity, is normally assumed; see pages 19–22 of [8] for details. The hypothesis of stationarity means that the spatial law of probability of the r.f. or part of it, is translation invariant. That is, the probabilistic properties of a set of observations do not depend on the specific locations where they have been measured, but only on their separations.
Therefore, in mathematical and probabilistic terms, the hypothesis of stationarity refers to the regular behavior in space of the moments of the r.f., or the function itself and, as we will see later, there are different degrees of stationarity. This hypothesis will allow us to act as if all the variables that make up the r.f. had the same probability distribution (or the same moments; we can even relax this assumption) and, as a consequence, to be able to make inferences.
Using the assumed level of spatial homogeneity of the r.f. that (supposedly) generates the observed realization as a basis, we have the following cases: stationary random function in the strict sense, second-order stationary random function, and intrinsically stationary random function or random function of stationary increments. Let us briefly introduce these concepts.
The r.f. $Z(\mathbf{s})$ is said to be stationary in the strict sense, or strictly stationary, if the families of r.v.s $\{Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_k)\}$ and $\{Z(\mathbf{s}_1 + \mathbf{h}), \ldots, Z(\mathbf{s}_k + \mathbf{h})\}$ have the same joint distribution function for all $k$, and for any given spatial points $\mathbf{s}_1, \ldots, \mathbf{s}_k$ and any translation vector $\mathbf{h}$.
In other words, the joint distribution function of $Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_k)$ is unaffected by a translation by an arbitrary vector $\mathbf{h}$. As a result, density functions with dimension lower than $k$ do not depend on location either. Generally speaking, this is a very restrictive condition, which is why this hypothesis is normally relaxed to the so-called “assumption of second-order stationarity,” which limits the stationarity hypothesis to the first two moments of the r.f. (recall that in linear geostatistics, we are only interested in the first two moments of the r.f.).
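The r.f. $Z(\mathbf{s})$ is said to be second-order stationary, weakly stationary or stationary in the broad sense, if it has finite second-order moments (that is, the covariance exists) and verifies that
(i) The expectation exists and is constant, and therefore does not depend on the location $\mathbf{s}$:
$$E[Z(\mathbf{s})] = \mu. \tag{1.4}$$
(ii) The covariance exists for every pair of r.v.s, $Z(\mathbf{s})$ and $Z(\mathbf{s} + \mathbf{h})$, and only depends on the vector $\mathbf{h}$ that joins the locations $\mathbf{s}$ and $\mathbf{s} + \mathbf{h}$:
$$C(\mathbf{h}) = E\big[(Z(\mathbf{s} + \mathbf{h}) - \mu)(Z(\mathbf{s}) - \mu)\big]. \tag{1.5}$$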
As the covariance function of a second-order stationary r.f. is only a function of $\mathbf{h}$, the variance of the r.f. exists and is finite and constant:
$$\operatorname{Var}[Z(\mathbf{s})] = C(\mathbf{0}) = \sigma^2. \tag{1.6}$$
In light of Eqs. (1.4) and (1.6), the second-order stationarity hypotheses can be interpreted as if the regionalized variable takes values that fluctuate around a constant value (the mean), and the variation of these fluctuations is the same everywhere in the domain.
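The two consequences of second-order stationarity discussed here (covariances that depend only on the separation between locations, and a variance that is the same everywhere) can be checked numerically. The sketch below assumes an exponential covariogram on a one-dimensional domain; the model, the parameter values, and the function names are illustrative assumptions only.

```python
import numpy as np

def exp_covariogram(h, sill=2.0, range_param=3.0):
    """Assumed exponential covariogram C(h) = sill * exp(-|h| / range)."""
    return sill * np.exp(-np.abs(h) / range_param)

def cov_matrix(locations):
    """Covariance matrix built only from pairwise separations."""
    locations = np.asarray(locations, dtype=float)
    return exp_covariogram(locations[:, None] - locations[None, :])

locations = np.array([0.0, 1.0, 2.5, 4.0])
shift = 7.3  # an arbitrary translation of the whole configuration

cov_original = cov_matrix(locations)
cov_shifted = cov_matrix(locations + shift)

# Translating all locations leaves every covariance unchanged.
same_covariance = np.allclose(cov_original, cov_shifted)

# The variance at every location equals C(0), the value of the covariogram at lag zero.
constant_variance = np.allclose(np.diag(cov_original), exp_covariogram(0.0))
```

Both flags evaluate to `True` for any choice of `shift`, since the matrix is a function of the differences between locations alone.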
In some cases, in order to model the spatial dependence of second-order stationary r.f.s, the correlogram, or correlation function, is used instead of the covariogram, and is defined as
$$\rho(\mathbf{h}) = \frac{C(\mathbf{h})}{C(\mathbf{0})}.$$
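Taking the correlogram to be the covariogram divided by its value at lag zero, $\rho(\mathbf{h}) = C(\mathbf{h})/C(\mathbf{0})$, a minimal sketch of the computation follows. The exponential covariogram and its parameters are again assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def exp_covariogram(h, sill=2.0, range_param=3.0):
    """Assumed exponential covariogram C(h) = sill * exp(-|h| / range)."""
    return sill * np.exp(-np.abs(h) / range_param)

def correlogram(h, covariogram):
    """Correlation function: covariogram normalized by the variance C(0)."""
    return covariogram(h) / covariogram(0.0)

lags = np.array([0.0, 1.0, 3.0, 9.0])
rho = correlogram(lags, exp_covariogram)
```

For this model the sill cancels, so `rho` equals `exp(-lags / 3.0)`: it starts at 1 at lag zero and decays toward 0, which makes correlograms easier to compare across variables measured on different scales.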
