Applying machine learning and optimization technologies to water management problems

The rapid development of machine learning brings new possibilities for hydroinformatics research and practice with its ability to handle big data sets, identify patterns and anomalies in data, and provide more accurate forecasts. Advanced Hydroinformatics: Machine Learning and Optimization for Water Resources presents both original research and practical examples that demonstrate how machine learning can advance data analytics, accuracy of modeling and forecasting, and knowledge discovery for better water management.

Volume Highlights Include:
* Overview of the application of artificial intelligence and machine learning techniques in hydroinformatics
* Advances in modeling hydrological systems
* Different data analysis methods and models for forecasting water resources
* New areas of knowledge discovery and optimization based on using machine learning techniques
* Case studies from North America, South America, the Caribbean, Europe, and Asia

The American Geophysical Union promotes discovery in Earth and space science for the benefit of humanity. Its publications disseminate scientific knowledge and provide resources for researchers, students, and professionals.
Page count: 659
Year of publication: 2023
COVER
TABLE OF CONTENTS
TITLE PAGE
COPYRIGHT
LIST OF CONTRIBUTORS
PREFACE
1 HYDROINFORMATICS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN WATER‐RELATED PROBLEMS
1.1. Introduction
1.2. Key Principles of ML/Hydroinformatics
1.3. Model Building and Input Variable Selection in Machine Learning for Water‐related Problems
1.4. Advanced Techniques in Machine Learning for Water Resources Applications
1.5. Future Directions and Challenges
References
PART I: MODELING HYDROLOGICAL SYSTEMS
2 IMPROVING MODEL IDENTIFIABILITY BY DRIVING CALIBRATION WITH STOCHASTIC INPUTS
2.1. Introduction
2.2. Methodology
2.3. Insights to the Stochastic Simulation Procedure
2.4. Proof of Concept
2.5. Concluding Remarks
Acknowledgments
References
3 A TWO‐STAGE SURROGATE‐BASED PARAMETER CALIBRATION FRAMEWORK FOR A COMPLEX DISTRIBUTED HYDROLOGICAL MODEL
3.1. Introduction
3.2. Method
3.3. Study Area and Data
3.4. Experimental Setup
3.5. Results
3.6. Discussion
3.7. Conclusions
Acknowledgments
References
4 FUZZY COMMITTEES OF CONCEPTUAL DISTRIBUTED MODEL
4.1. Introduction
4.2. Case Study
4.3. Methodology
4.4. Results and Discussion
Acknowledgments
References
5 REGRESSION‐BASED MACHINE LEARNING APPROACHES FOR DAILY STREAMFLOW MODELING
5.1. Introduction
5.2. Materials and Methods
5.3. Results
5.4. Conclusion and Future Works
Acknowledgments
References
6 USE OF NEAR‐REAL‐TIME SATELLITE PRECIPITATION DATA AND MACHINE LEARNING TO IMPROVE EXTREME RUNOFF MODELING
6.1. Introduction
6.2. Study Area and Data Set
6.3. Methodology
6.4. Results
6.5. Discussion
6.6. Conclusions
References
PART II: FORECASTING WATER RESOURCES
7 FORECASTING WATER LEVELS USING MACHINE (DEEP) LEARNING TO COMPLEMENT NUMERICAL MODELING IN THE SOUTHERN EVERGLADES, USA
7.1. Introduction
7.2. Methods
7.3. Workflow
7.4. Experimental Setup
7.5. Comparison of Measured Rainfall and NWP Rainfall Forecasts
7.6. Results and Discussion
7.7. Conclusions and Future Directions
Disclaimer
Appendix
References
8 APPLICATION OF A MULTILAYER PERCEPTRON ARTIFICIAL NEURAL NETWORK (MLP‐ANN) IN HYDROLOGICAL FORECASTING IN EL SALVADOR
8.1. Introduction
8.2. Data‐Driven Modeling Technique
8.3. Characterization of The Grande De San Miguel Catchment
8.4. Methodology
8.5. Experimental Setups
8.6. Results and Discussion
8.7. Operational MLP‐ANN Forecast Model
8.8. Conclusions and Recommendations
Acknowledgments
References
9 NOISE FILTER WITH WAVELET ANALYSIS IN ARTIFICIAL NEURAL NETWORKS (NOWANN) FOR FLOW TIME SERIES PREDICTION
9.1. Introduction
9.2. Materials
9.3. Methodology
9.4. Case Study
9.5. Results and Discussions
9.6. Conclusions
References
PART III: KNOWLEDGE DISCOVERY AND OPTIMIZATION
10 APPLICATION OF NATURAL LANGUAGE PROCESSING TO IDENTIFY EXTREME HYDROMETEOROLOGICAL EVENTS IN DIGITAL NEWS MEDIA: CASE OF THE MAGDALENA RIVER BASIN, COLOMBIA
10.1. Introduction
10.2. Case Study
10.3. Methodology
10.4. Results
10.5. Conclusions
References
11 THREE‐DIMENSIONAL CLUSTERING IN THE CHARACTERIZATION OF SPATIOTEMPORAL DROUGHT DYNAMICS: CLUSTER SIZE FILTER AND DROUGHT INDICATOR THRESHOLD OPTIMIZATION
11.1. Introduction
11.2. Methods and Data
11.3. Results and Discussion
11.4. Summary and Conclusions
Acknowledgments
References
12 DEEP LEARNING OF EXTREME RAINFALL PATTERNS USING ENHANCED SPATIAL RANDOM SAMPLING WITH PATTERN RECOGNITION
12.1. Introduction
12.2. Toolbox
12.3. Identify the Specific Patterns of Rainfall in Great Britain
12.4. Generating the Training Sets for Deep Learning Studies
12.5. Conclusion
References
13 TELECONNECTION PATTERNS OF RIVER WATER QUALITY DYNAMICS BASED ON COMPLEX NETWORK ANALYSIS
13.1. Introduction
13.2. Study Areas and Methods
13.3. Theory of Complex Network
13.4. Results and Discussion
13.5. Comparison of Teleconnection Features Between Scales and Parameters
13.6. Study Limitations and Future Works
13.7. Conclusion
Acknowledgments
References
14 PROBABILISTIC ANALYSIS OF FLOOD STORAGE AREAS MANAGEMENT IN THE HUAI RIVER BASIN, CHINA, WITH ROBUST OPTIMIZATION AND SIMILARITY‐BASED SELECTION FOR REAL‐TIME OPERATION
14.1. Introduction
14.2. Materials and Methods
14.3. Results and Discussion
14.4. Conclusions
References
15 MULTI‐OBJECTIVE OPTIMIZATION OF RESERVOIR OPERATION POLICIES USING MACHINE LEARNING MODELS: A CASE STUDY OF THE HATILLO RESERVOIR IN THE DOMINICAN REPUBLIC
15.1. Introduction
15.2. Case Study
15.3. Methodology
15.4. Results
15.5. Conclusions
Acknowledgments
References
INDEX
END USER LICENSE AGREEMENT
Chapter 1
Table 1.1 Example of the input matrix generation for a rainfall‐runoff model...
Chapter 2
Table 2.1 Key summary statistical characteristics of observed rainfall and r...
Chapter 3
Table 3.1 Basic information about the study area.
Table 3.2 Ranges of VIC calibration parameters.
Table 3.3 Benchmark values and optimal solutions of the surrogate model.
Table 3.4 Crucial ranges of NSE and Bias in each gauge station.
Table 3.5 Evaluations of simulation with the parameter sets given by VIC and...
Table 3.6 Evaluations of simulation with the parameter sets given by VIC and...
Table 3.7 Time consumed by multi‐objective optimization algorithms in the LJ...
Table 3.8 Time consumed by multi‐objective optimization algorithms in the XJ...
Chapter 4
Table 4.1 Weighting schemes for root mean square error for high (RMSE_hf) and...
Table 4.2 Performance of committee models with a different combination of we...
Table 4.3 Committee of conceptual distributed models' performance.
Chapter 5
Table 5.1 Performance metrics of the ML and SAC‐SMA models.
Chapter 6
Table 6.1 Search space (grid) of the RF runoff models.
Table 6.2 RF hyperparameterization of extreme runoff models.
Table 6.3 Number of events and efficiencies on test subsets of runoff models...
Chapter 7
Table 7.1 Root mean squared error of example model.
Table 7.2 Details for short‐term and medium‐range forecast horizons.
Table 7.3 Measured and forecasted rainfall and water level.
Table 7.A1 Performance indicators for each station specific TDNN model inves...
Table 7.A2 Performance indicators for each station specific LSTM model inves...
Chapter 8
Table 8.1 Potential evapotranspiration‐elevation relationships.
Table 8.2 MLP‐ANNs experimental summary for flow forecasting in Grande de Sa...
Table 8.3 Statistical properties for training, cross‐validation, and verific...
Table 8.4 Statistical properties for training, cross‐validation, and verific...
Table 8.5 Summary of goodness‐of‐fit function for MLP‐ANN and Naïve forecast...
Table 8.6 Number of hidden nodes or neurons in the MLP‐ANN forecast models....
Chapter 9
Table 9.1 Bibliographic reference summary used for this work.
Table 9.2 Description of the case study analyzed.
Table 9.3 Best fold among all HNs: ANN stand‐alone model; best fold among al...
Table 9.4 WM5P model results; WANN model results.
Chapter 10
Table 10.1 Keywords defined for the news search.
Table 10.2 Selected newspaper sources grouped by region.
Table 10.3 Correlation of monthly sentiment index and monthly climatic data....
Table 10.4 Confusion matrix results from Ayapel Swamp, showing high and low ...
Table 10.5 Confusion matrix results from Betania Reservoir, showing high and...
Chapter 11
Table 11.1 Angle C (Degrees) for each Drought Indicator (DI) threshold and c...
Chapter 12
Table 12.1 Specific patterns of rainfall in three countries cataloged at thr...
Table 12.2 Temporal change of specific rainfall patterns in three countries ...
Chapter 14
Table 14.1 Historical most severe flood records at Lutaizi station and calcu...
Table 14.2 Robust strategies for different downstream risk levels.
Table 14.3 Similar hydrographs chosen by four criteria.
Chapter 15
Table 15.1 Performance metrics for test runs of the ML models.
Table 15.2 Summary of the parameters and configuration of the ML models.
Table 15.3 Parameters of the optimization process.
Table 15.4 Hypervolume indicator for optimized models.
Chapter 1
Figure 1.1 Evolution of AI topics: From artificial intelligence to machine l...
Figure 1.2 Differences between traditional programming and ML, as seen in co...
Figure 1.3 Encapsulation of natural systems and processes in ML models, with...
Figure 1.4 Interactions of digital information sources and citizens, and the...
Figure 1.5 A typical sequence of steps in Machine Learning Modeling Framewor...
Figure 1.6 Training a Machine Learning model as an iterative optimization pr...
Figure 1.7 Autocorrelation for the discharge at Ourthe River basin (tributar...
Figure 1.8 Possible cycles in ML model building.
Figure 1.9 A typical neuron and a single‐output multilayer perceptron (MLP A...
Figure 1.10 Training M5 model trees and their operation on a new unseen inst...
Figure 1.11 Recurrent neural network. Inputs from a time series (X) are fed ...
Chapter 2
Figure 2.1 The classical calibration paradigm, based on the split‐sample app...
Figure 2.2 Dependency patterns among runoff data (a) between subsequent mont...
Figure 2.3 Synthetic runoff data of 2,000 yr length at Evinos basin, aggrega...
Figure 2.4 Schematic view of the conceptual model and the associated fluxes ...
Figure 2.5 Comparison of model fitting (i.e., simulated vs. observed runoff ...
Figure 2.6 Comparison of model fitting (i.e., simulated vs. observed runoff ...
Figure 2.7 Empirical histograms of model parameters obtained with the GLUE m...
Chapter 3
Figure 3.1 Framework of the multi‐objective surrogate‐based parameter calibr...
Figure 3.2 Flowchart of the initial surrogate model training.
Figure 3.3 Location of (a) the Lanjiang River basin and (b) the Xiangjiang R...
Figure 3.4 Multi‐objective optimization procedure of the benchmark test in t...
Figure 3.5 Trace plots of the seven calibrated parameters. The All populatio...
Figure 3.6 Population values with the increasing generations of (a) NSE at L...
Figure 3.7 Population values with the increasing generations of (a) NSE at H...
Figure 3.8 Two‐dimensional Pareto plots for Bias versus NSE in the LJR basin...
Figure 3.9 Two‐dimensional Pareto plots for Bias versus NSE in the XJR basin...
Figure 3.10 Compromise parameters of the surrogate model and VIC in (a) the ...
Figure 3.11 Simulated daily streamflow series and observed streamflow (a) at...
Figure 3.12 Simulated daily streamflow series and observed streamflow (a) at...
Figure 3.13 Spatial maps of annual average runoff depth (mm) in the LJR basi...
Figure 3.14 Spatial maps of annual average runoff depth (mm) in the XJR basi...
Chapter 4
Figure 4.1 The concept of building local models represents different flow re...
Figure 4.2 (a) Jiboa catchment stations and (b) model setup.
Figure 4.3 Proposed steps and approaches for improving committee models usin...
Figure 4.4 Model structure.
Figure 4.5 Weighting schemes for weighted root mean squared error as an obje...
Figure 4.6 Fuzzy membership functions to combine specialized models: (a) MF‐...
Figure 4.7 Experimental models setup.
Figure 4.8 Weighting scheme parameters (gamma and delta). The first part of ...
Figure 4.9 The relation between committee component and committee model perf...
Figure 4.10 Normalized parameters plot (parameters corresponding to Pareto o...
Figure 4.11 Performance graph of all committee models. Models are ordered fr...
Figure 4.12 Flow hydrograph result from committee model Lumped‐MF‐A.
Figure 4.13 Performance of committee models at the highest flood event in th...
Figure 4.14 Normalized performance criteria of all models for the visual com...
Figure 4.15 Pareto optimal sets of specialized models result from the multio...
Chapter 5
Figure 5.1 A conceptual framework of the proposed modeling workflow illustra...
Figure 5.2 Observed versus simulated daily streamflow of the ML methods as w...
Figure 5.3 FDCs that display modeled and observed streamflow values against ...
Chapter 6
Figure 6.1 The Jubones basin in the tropical Andes of Ecuador, South America...
Figure 6.2 Mean annual precipitation in mm (for 2019 and 2020) measured by t...
Figure 6.3 (a) Runoff and precipitation (PERSIANN‐CCS) time series at the ou...
Figure 6.4 Precipitation identification with an object‐based Connected Compo...
Figure 6.5 Illustration of the precipitation‐retrieval modular approach usin...
Figure 6.6 Meteorological precipitation information retrieved from 47 extrem...
Figure 6.7 Localization of precipitation object centroids (dots) associated ...
Figure 6.8 Precipitation classes associated with extreme hydrological events...
Figure 6.9 Scatter plot between extreme runoff observations and simulations ...
Chapter 7
Figure 7.1 (a) Historical flow through the Everglades before drainage and (b...
Figure 7.2 Map showing the study area in the southern Everglades, Florida, a...
Figure 7.3 Correlation analyses for the three stations not influenced by a c...
Figure 7.4 Correlation analyses for the stations influenced by a control str...
Figure 7.5 Averaged rainfall (mm) for the dry season and wet season in the s...
Figure 7.6 Charts comparing the Root Mean Squared Error of the water‐level (...
Figure 7.7 Short‐term forecasts for station NP201. (a) Rainfall used in the ...
Figure 7.8 Short‐term forecasts for station NP201. (a) Three‐year water‐leve...
Figure 7.9 Medium‐range forecasts for station NP201. (a) Rainfall used in th...
Figure 7.10 Medium‐range forecasts for station NP201. (a) Three‐year water‐l...
Figure 7.11 A 24 hr water‐level forecast comparison of ANN and BISECT model ...
Chapter 8
Figure 8.1 (a) Multilayer Perceptron Artificial Neural Network architecture ...
Figure 8.2 Rainfall and water level stations in Grande de San Miguel catchme...
Figure 8.3 Overall methodology to build an MLP‐ANN model.
Figure 8.4 Response time based on the physical characteristic of the catchme...
Figure 8.5 Correlation coefficient (a) and AMI (b) analysis between predicte...
Figure 8.6 (a) Scatter plot and (b) hydrograph comparison between observed a...
Figure 8.7 Relation between the model input and the resulting weights for ea...
Figure 8.8 Comparison between the observed discharge, the naïve model, and M...
Figure 8.9 Scatter plot between the observed and simulated MLP‐ANN discharge...
Figure 8.10 Model performance comparison between the naïve and MLP‐ANN forec...
Figure 8.11 Conceptual design of the operational MLP‐ANN forecasting in Gran...
Chapter 9
Figure 9.1 Boundary limits and traditional extension methods. (a) Wavelet lo...
Figure 9.2 Traditional ANN approach.
Figure 9.3 WANN approach. In order to create this model, we use the multires...
Figure 9.4 Soft and hard threshold for de‐noising the signal source (http://...
Figure 9.5 De‐noising process in the details of the signal.
Figure 9.6 Pseudocode for the ANN exhaustive.
Figure 9.7 All 10‐fold's RMSE results for HN = 20.
Figure 9.8 Best fold's RMSE results per HN.
Figure 9.9 Pseudocode for the WANN exhaustive.
Figure 9.10 Exhaustive search RMSE‐train.
Figure 9.11 Exhaustive search RMSE‐test.
Figure 9.12 Exhaustive search RMSE‐val.
Figure 9.13 Testing data set: Simulated (upper figure), residual errors (low...
Figure 9.14 Data‐driven model with raw time series: Training data set, valid...
Figure 9.15 Data‐driven models with wavelets as preprocessor: Training data ...
Figure 9.16 Testing data set: Simulated discharge (upper) residual errors(lo...
Chapter 10
Figure 10.1 Magdalena River Basin scheme: Rivers and water bodies. WGS‐84. d...
Figure 10.2 Journal article shows the emergency (from Tiempo, 2018).
Figure 10.3 Methodology process.
Figure 10.4 Data‐mining process.
Figure 10.5 News‐filtering process.
Figure 10.6 Graphical representation of the CBOW model and the Skip‐gram mod...
Figure 10.7 Distribution of 207 water bodies.
Figure 10.8 Web‐scraping simplified process.
Figure 10.9 Example of information extracted from a newspaper website (https...
Figure 10.10 Distribution of journal articles (percentage).
Figure 10.11 Word cloud image for Quebrada el Carmen representing the news a...
Figure 10.12 Number of newspaper articles about the rivers composing the Mag...
Figure 10.13 Sentiment change over time: Hidroituango.
Figure 10.14 Monthly river‐level range, Puerto Valdivia station in the Cauca...
Figure 10.15 Monthly river‐level range, La Campina station (Continuous line)...
Figure 10.16 Spatial sentiment changes regarding the Magdalena River Basin, ...
Figure 10.17 Spatial comparison sentiment and climatological variables regar...
Chapter 11
Figure 11.1 Schematic overview of the methodology for 3‐D drought clusters c...
Figure 11.2 Scheme of the method to calculate the optimal cluster size filte...
Figure 11.3 Number of 3‐D clusters (nc) calculated for different drought ind...
Figure 11.4 As Figure 11.3 but for durations up to 20 months and sizes up to...
Figure 11.5 Number of 3‐D clusters (nc) calculated for different drought ind...
Figure 11.6 As Figure 11.5 but for durations up to 20 months and sizes up to...
Figure 11.7 Percentage of drought area calculated for each 3‐D cluster. Resu...
Figure 11.8 Identification of the optimal cleaning filter size. The result f...
Figure 11.9 (a) Duration (months) and magnitude (number of voxels) of the dr...
Figure 11.10 (a) Centroids of the 3‐D clusters for the period 1950–2017. Cen...
Figure 11.11 (a) Drought from May 1957 to September 1972. (b) Percentage of ...
Figure 11.12 (a) Drought from December 2006 to September 2017. (b) Percentag...
Figure 11.13 (a) Drought from October 1984 to September 1994. (b) Percentage...
Figure 11.14 Reported droughts in the EMDAT and CAZALAC (left). Percentage o...
Chapter 12
Figure 12.1 The new cluster analysis and pattern segmentation component desi...
Figure 12.2 The histograms of the number of daily rainfall pattern in Englan...
Figure 12.3 Attribute for labels (a) L1, (b) L2, and (c) L3 used in CNN.
Chapter 13
Figure 13.1 Study area and monitoring campaign in China and Huaihe River bas...
Figure 13.2 Different networks: (a) An undirected network with identical nod...
Figure 13.3 Topological structure of water quality monitoring networks in th...
Figure 13.4 Degree centrality pattern of water quality monitoring networks i...
Figure 13.5 Clustering coefficient pattern of water quality monitoring netwo...
Figure 13.6 Topological structure of water quality monitoring networks in Ch...
Figure 13.7 Degree centrality pattern of water quality monitoring networks i...
Figure 13.8 Clustering coefficient pattern of water quality monitoring netwo...
Chapter 14
Figure 14.1 (a) Huai River basin (adapted from Wu et al., 2015) and (b) sche...
Figure 14.2 Inflow hydrograph at Lutaizi station from 24 June to 7 September...
Figure 14.3 Schematic diagram of calculation for downstream risk.
Figure 14.4 Damage curves for four storage areas (Jonoski et al., 2019).
Figure 14.5 Example of Pareto front of the coupled simulated‐optimization mo...
Figure 14.6 Example result of optimal strategy operation (Jonoski et al., 20...
Figure 14.7 Hydrographs generated by Gamma‐like function of (a) one and (b) ...
Figure 14.8 One hundred sampled hydrographs using LHS.
Figure 14.9 Enlarged hydrographs using Homogeneous Multiple Enlargement.
Figure 14.10 PDFs of storage area damage under certain downstream risk.
Figure 14.11 Estimation of worst case and expected value of potential robust...
Figure 14.12 Test of robust strategies with enlarged hydrographs.
Figure 14.13 Comparison of tested and similar hydrographs.
Figure 14.14 Comparison of Pareto fronts of tested and similar hydrographs....
Figure 14.15 Test of real‐time strategies with three enlarged hydrographs.
Figure 14.16 Comparison of robust strategies and similarity selection method...
Chapter 15
Figure 15.1 Map of the Yuna River basin.
Figure 15.2 Floodplain in the lower Yuna basin.
Figure 15.3 Dam components: (a) location of the dam, (b) operation levels, (...
Figure 15.4 Real Data of the analysis period: (a) Inflows and outflows of th...
Figure 15.5 Flow diagram of the reservoir operation model built for the Hati...
Figure 15.6 Scheme of the proposed methodology.
Figure 15.7 Architecture of the ANN: (a) MLP, (b) RBN.
Figure 15.8 Correlation analyses for the selection of ML inputs: between inf...
Figure 15.9 Test run for 6‐7‐1 MLP configuration for simulation of real rese...
Figure 15.10 Configuration of the (a) MLP and (b) RBN proposed for the opera...
Figure 15.11 Demand for irrigation downstream of the reservoir.
Figure 15.12 Unified Pareto fronts: (a) All the operation models; (b) operat...
Figure 15.13 Envelope for hydrographs generated by all reservoir operations ...
Figure 15.14 Comparison of results of all the solutions to reservoir operati...
Figure 15.15 Parallel graphic: (a) Reservoir operations that improve the thr...
Figure 15.16 Pareto front for models with MLP and NSGA II optimizers.
Figure 15.17 Zoomed‐in Pareto front of the selected optimal reservoir operat...
Figure 15.18 Simulation of the selected reservoir operations for the analysi...
Figure 15.19 Hydrograph of the selected reservoir operations for the wet per...
Special Publications 78
Gerald A. Corzo Perez
Dimitri P. Solomatine
Editors
This Work is a co‐publication of the American Geophysical Union and John Wiley and Sons, Inc.
This edition first published 2024
© 2024 American Geophysical Union
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
Published under the aegis of the AGU Publications Committee
Matthew Giampoala, Vice President, Publications
Carol Frost, Chair, Publications Committee
For details about the American Geophysical Union visit us at www.agu.org.
The right of Gerald A. Corzo Perez and Dimitri P. Solomatine to be identified as the editors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Corzo Perez, Gerald Augusto, editor. | Solomatine, Dimitri P., editor. | John Wiley & Sons, publisher.
Title: Advanced hydroinformatics : machine learning and optimization for water resources / Gerald A. Corzo Perez, Dimitri P. Solomatine.
Description: Hoboken, NJ : Wiley, 2024. | Includes index.
Identifiers: LCCN 2023039032 (print) | LCCN 2023039033 (ebook) | ISBN 9781119639312 (hardback) | ISBN 9781119639329 (adobe pdf) | ISBN 9781119639343 (epub)
Subjects: LCSH: Hydrology–Data processing. | Hydrologic models.
Classification: LCC GB656.2.H9 A36 2024 (print) | LCC GB656.2.H9 (ebook) | DDC 551.480285/631–dc23/eng/20231023
LC record available at https://lccn.loc.gov/2023039032
LC ebook record available at https://lccn.loc.gov/2023039033
Cover design: Wiley
Cover image: © Alexander Nikitin/Getty Images; fotograzia/Getty Images
Nicholas G. Aumen
Southeast Region
United States Geological Survey
Boynton Beach, Florida, USA
Biswa Bhattacharya
IHE Delft Institute for Water Education
Delft, The Netherlands
Rolando Célleri
Department of Water Resources and Environmental Sciences, and Faculty of Engineering
University of Cuenca
Cuenca, Ecuador
Gerald A. Corzo Perez
IHE Delft Institute for Water Education
Delft, The Netherlands
Vitali Diaz
IHE Delft Institute for Water Education
Delft, The Netherlands; and
Delft University of Technology
Delft, The Netherlands
Santiago Duarte
IHE Delft Institute for Water Education
Delft, The Netherlands; and
Delft University of Technology
Delft, The Netherlands
Andreas Efstratiadis
Department of Water Resources and Environmental Engineering
National Technical University of Athens
Zografou, Greece
Mostafa Farrag
Department of Water Quality and Ecology Software
Deltares, Delft, The Netherlands; and
Hydrology Section
GFZ German Research Centre for Geosciences
Potsdam, Germany; and
Institute for Environmental Sciences and Geography
University of Potsdam
Potsdam, Germany
Jan Feyen
Faculty of Bioscience Engineering
Catholic University of Leuven
Leuven, Belgium
Courtney S. I. Forde
Caribbean Institute for Meteorology and Hydrology
Saint Michael, Barbados
Haiting Gu
Institute of Hydrology and Water Resources
Zhejiang University
Hangzhou, China
Daniel R. Hitchcock
Department of Agricultural Sciences
Clemson University
Clemson, South Carolina, USA
Jiping Jiang
School of Environmental Science and Engineering
Southern University of Science and Technology
Shenzhen, China; and
State Key Laboratory of Urban Water Resource and Environment
School of Environment
Harbin Institute of Technology
Harbin, China
Andreja Jonoski
Department of Hydroinformatics and Socio‐Technical Innovation
IHE Delft Institute for Water Education
Delft, The Netherlands
Panagiotis Kossieris
Department of Water Resources and Environmental Engineering
National Technical University of Athens
Zografou, Greece
Li Liu
Institute of Hydrology and Water Resources
Zhejiang University
Hangzhou, China
Di Ma
Institute of Hydrology and Water Resources
Zhejiang University
Hangzhou, China
Paul Muñoz
Department of Water Resources and Environmental Sciences, and Faculty of Engineering
University of Cuenca
Cuenca, Ecuador
Suli Pan
Institute of Hydrology and Water Resources
Zhejiang University
Hangzhou, China
Tianrui Pang
State Key Laboratory of Urban Water Resource and Environment
School of Environment
Harbin Institute of Technology
Harbin, China
Fidel Perez
Mother and Teacher Pontifical Catholic University
Santo Domingo, Dominican Republic
Ioana Popescu
Department of Hydroinformatics and Socio‐Technical Innovation
IHE Delft Institute for Water Education
Delft, The Netherlands
Vidya S. Samadi
Department of Agricultural Sciences
Clemson University
Clemson, South Carolina, USA
Germán Santos
Colombian School of Engineering Julio Garavito
Bogotá, Colombia
Bellie Sivakumar
Department of Civil Engineering
Indian Institute of Technology Bombay
Mumbai, India
Dimitri P. Solomatine
IHE Delft Institute for Water Education
Delft, The Netherlands; and
Water Resources Section
Delft University of Technology
Delft, The Netherlands; and
Water Problems Institute of the Russian Academy of Sciences
Moscow, Russia
Eric D. Swain
Caribbean‐Florida Water Science Center
United States Geological Survey
Lutz, Florida, USA
Sadegh Sadeghi Tabas
Department of Civil Engineering
Clemson University
Clemson, South Carolina, USA
Carlos Tami
Colombian School of Engineering Julio Garavito
Bogotá, Colombia
Sijie Tang
School of Environmental Science and Engineering
Southern University of Science and Technology
Shenzhen, China
Ioannis Tsoukalas
Department of Water Resources and Environmental Engineering
National Technical University of Athens
Zografou, Greece
Jose Valles
IHE Delft Institute for Water Education
Delft, The Netherlands
Henny A. J. Van Lanen
Hydrology and Quantitative Water Management Group
Wageningen University
Wageningen, The Netherlands
Daniel A. Vázquez
IHE Delft Institute for Water Education
Delft, The Netherlands
Han Wang
China Institute of Water Resources and Hydropower Research
Beijing, China
Catherine A. M. E. Wilson
Hydro‐environmental Research Centre
School of Engineering
Cardiff University
Cardiff, United Kingdom
Na Wu
College of Environmental Science and Engineering
Tongji University
Shanghai, China
Jingkai Xie
Institute of Hydrology and Water Resources
Zhejiang University
Hangzhou, China
Yue‐Ping Xu
Institute of Hydrology and Water Resources
Zhejiang University
Hangzhou, China
Yunqing Xuan
Zienkiewicz Centre for Computational Engineering
Swansea University
Swansea, United Kingdom
Yi Zheng
School of Environmental Science and Engineering
Southern University of Science and Technology
Shenzhen, China
Xingyu Zhou
Shanghai Investigate, Design & Research Institute Co., Ltd.
Shanghai, China
Hydroinformatics deals with advanced information technology, data analytics, modeling, artificial intelligence (AI), and optimization applied to problems of aquatic environments for the purpose of informing management. Many of these technologies have become standard tools that support water management decisions around the world. However, the technologies continue to develop and new ones are emerging, which allows them to be applied to more complex and interesting problems. There are multiple examples of environmental and hydrological problems being dealt with not only by employing physically based (process) models but also with advanced data analysis tools and machine learning models.
The rapid development of machine learning and AI brings new possibilities for hydroinformatics research and practice. Nowadays, complex issues are analyzed by identifying and explaining patterns and anomalies of measured or simulated data. With an increasing amount of data collected about the environment, physically based models are increasingly complemented (and sometimes even replaced) by data‐driven models. Although data‐driven models lack the ability of physically based models to explain the physics of underlying processes, they are able to discover the hidden patterns in data and often can be more accurate in forecasting. Thus, they play an important role. Pattern recognition has been one of the main tasks solved by machine learning, and lately has been given an additional push by the development and use of deep learning, an important class of machine learning algorithms and of AI in general.
Data analytics plays an important role in water resources when data are multidimensional, and spatial and time dimensions have to be dealt with in a coordinated fashion. In relation to water resources, both dimensions were always important, but recently the need to handle huge amounts of remote sensing data (big data) has become more pronounced. These developments have motivated new research efforts in the context of predicting hydrological extremes and called for testing novel approaches of spatiotemporal data analysis and machine learning. In many problems, machine learning applied to hydrological extremes is studied in connection with uncertainties.
An issue in water resources management is optimal planning and operation under uncertainties, and this is where the role of AI‐driven approaches is also becoming more important. A traditional optimization approach typically cannot help much since such optimization is model based and objective functions cannot be analytically expressed. Algorithms developed under the framework of computational intelligence have been the focus of hydroinformatics for three decades but the new problems and the increased data availability lead to the necessity of testing new approaches and their critical analysis.
This book presents research results and experiences of applying hydroinformatics and, in particular, artificial intelligence and optimization technologies for water‐related problems. It targets hydrologists, water resources engineers, modelers, forecasters, and hydroinformatics specialists interested in the latest experiences of applying machine learning and optimization techniques to various water resources problems.
The chapters in the book are grouped into three sections: modeling hydrological systems, forecasting water resources, and knowledge discovery and optimization.
Part I deals with advances in modeling hydrological systems concentrating on distributed representations. Chapters consider model identifiability by driving calibration with stochastic inputs, use of heterogeneous precipitation data sources, fuzzy committees of distributed models, and using machine learning to represent dynamics of rainfall runoff.
Part II covers various aspects of forecasting water resources, which is an area of great importance for planning and providing warnings in case of extreme events. The main challenge here is to ensure an extended horizon of forecasts considering the associated uncertainties. Adequate incorporation of physics in machine learning models via the variable selection process is key to ensuring interpretability and acceptance of the final results. Different types of data analysis methods and models are presented, from wavelet decomposition and noise filtering to water level and flow predictions using deep learning. Well‐established (and accurate) neural networks (multilayer perceptrons) are not forgotten, as demonstrated by an example of the operational hydrological forecasting system used in El Salvador.
Part III covers relatively new areas of knowledge discovery and optimization based on using machine learning techniques. The chapters demonstrate how new vast data sources (big data) ranging from modeling results to spatiotemporal data analytics to social media provide new opportunities for discovering unseen patterns important for deeper understanding of water‐related processes. One chapter demonstrates how information from news media about water extremes combined with sentiment analysis enables the discovery of unexpected knowledge patterns. Another chapter shows how drought analysis from a 3‐D perspective of clustering extreme events can be used to characterize how phenomena develop in space and time. A further contribution demonstrates how so‐called complex networks are helping in discovering teleconnection patterns in water quality dynamics. Water management decisions are becoming more and more challenging, thus ideal for developing and testing various model‐based multi‐objective optimization approaches under uncertainty, an issue also dealt with in this part of the book.
The research results and experiences presented here demonstrate how machine learning and, more generally, artificial intelligence, can advance data analytics, accuracy of modeling and forecasting, and knowledge discovery for better water management under uncertainty. We hope this book will also provide inspiration for further advancing research in hydroinformatics.
We would like to acknowledge the valuable contributions of all the authors as well as their dedication and patience in multiple interactions with reviewers, whom we would also like to thank. This book includes contributions from researchers from many countries around the world, who present a broad scope of interesting problems of different geographical and hydrometeorological nature and scale. We are also thankful to our home, IHE Delft Institute for Water Education in the Netherlands, where hydroinformatics began 35 years ago, and where a number of our contributors studied. Our gratitude also goes to staff at AGU and Wiley who have played an important role by continuously supporting the preparation of this book.
Gerald A. Corzo Perez
IHE Delft Institute for Water Education
The Netherlands
Dimitri P. Solomatine
IHE Delft Institute for Water Education, and
Delft University of Technology
The Netherlands
Gerald A. Corzo Perez¹ and Dimitri P. Solomatine¹,²,³
¹ IHE Delft Institute for Water Education, Delft, The Netherlands
² Water Resources Section, Delft University of Technology, Delft, The Netherlands
³ Water Problems Institute of the Russian Academy of Sciences, Moscow, Russia
In recent years, there has been a surge of interest in machine learning (ML) and artificial intelligence (AI) due to the effectiveness of deep learning algorithms and the increasing availability of large data sets. This chapter provides a brief overview of the applications of AI and ML techniques in hydroinformatics, a field that deals with advanced information technology, data analytics, and modeling for aquatic environment management. Data‐driven models are becoming more common in water management as they can reveal hidden patterns in data and offer improved accuracy in certain situations. This chapter highlights the importance of spatiotemporal data analysis, pattern recognition, and optimization approaches in water resources management under uncertainty. It does not offer a comprehensive review of all methods but rather focuses on selected ML techniques widely used in water‐related problems. Additionally, the chapter discusses the challenges associated with using ML models, such as black‐box criticisms, and the potential of hybrid models that combine the strengths of ML and physically based process models for more robust solutions in hydroinformatics.
Hydroinformatics deals with advanced information technology, data analytics, modeling, artificial intelligence (AI), and optimization applied to problems of the aquatic environment for the purpose of informing management. Many of these technologies have become standard tools that support water management decisions around the world. However, the technologies continue to develop and new ones are emerging, which allows them to be applied to more complex and interesting problems. There are multiple examples of environmental and hydrological problems being dealt with not only by employing physically based (process) models but also by using advanced data analysis tools and machine learning models. The use of AI techniques in geosciences has a long history. Hydroinformatics, formulated by Abbott (1991) 30 years ago, has been defined as a union of computational hydraulics (CH) and AI (so that HI = CH ∪ AI), and during the last three decades we have witnessed a much wider use of AI, with a large number of successful practical applications. The first stage of this development has been covered, for example, in the edited volume Practical hydroinformatics: Computational intelligence and technological developments in water applications (Abrahart et al., 2008), and in dozens of other books and hundreds of research papers covering these new developments.
Currently, we see a new wave of interest in machine learning (ML) and AI, which is partly explained by the demonstrable effectiveness of the new generation of deep learning algorithms and the availability of large data sets (see, e.g., Nearing et al., 2021), and this brings new possibilities for hydroinformatics research and practice. With an increasing amount of data collected about the environment, physically based models are increasingly complemented, and sometimes even replaced, by data‐driven models. Although data‐driven models lack the ability of physically based models to explain the physics of underlying processes, they are able to discover hidden patterns in data, can often be more accurate, and play an important supporting role in water management. Pattern recognition (e.g., automatic identification of flooded areas on satellite images) has been one of the main tasks solved by machine learning, and lately has been given an additional push by the development and use of deep learning, an important class of machine learning algorithms and of AI in general. Data analytics plays an important role in water resources when data are multidimensional, and spatial and time dimensions have to be dealt with in a coordinated fashion. In relation to water resources, both dimensions were always important, but recently the need to handle huge amounts of remote sensing data (“big data”) has become more pronounced. These developments have motivated new research efforts in the context of predicting hydrological extremes and call for testing novel approaches to spatiotemporal data analysis and machine learning. Due to much easier access to supercomputing facilities, there are increased possibilities to study model uncertainty (typically using Monte Carlo frameworks), and machine learning can also play a role in building predictive models of such uncertainties.
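To give a feeling for what a Monte Carlo study of model uncertainty involves, the following minimal sketch samples one uncertain parameter many times, runs a model for each sample, and summarizes the spread of the result. The toy linear‐reservoir model, the rainfall series, the parameter range, and all function names are assumptions made for illustration only; they are not taken from this volume.

```python
import random
import statistics


def linear_reservoir(rain, k, s0=0.0):
    """Toy rainfall-runoff model (illustrative only): outflow = k * storage."""
    storage, flows = s0, []
    for p in rain:
        storage += p          # add rainfall to storage
        q = k * storage       # release a fixed fraction of storage
        storage -= q
        flows.append(q)
    return flows


def monte_carlo_peak(rain, k_low, k_high, n_runs=1000, seed=42):
    """Sample k uniformly, run the model, and collect the simulated peak flows."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_runs):
        k = rng.uniform(k_low, k_high)
        peaks.append(max(linear_reservoir(rain, k)))
    return peaks


rain = [0.0, 5.0, 20.0, 10.0, 0.0, 0.0, 0.0]  # made-up rainfall series (mm)
peaks = monte_carlo_peak(rain, k_low=0.1, k_high=0.5)

# Empirical 5% and 95% quantiles of the simulated peak flow
peaks_sorted = sorted(peaks)
lower = peaks_sorted[int(0.05 * len(peaks_sorted))]
upper = peaks_sorted[int(0.95 * len(peaks_sorted)) - 1]
print(f"mean peak: {statistics.mean(peaks):.2f}, 90% band: [{lower:.2f}, {upper:.2f}]")
```

In a realistic setting each run may take minutes or hours, which is why supercomputing access matters, and why an ML model trained to predict the resulting uncertainty band directly can be attractive.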
An issue in water resources management is optimal planning and operation under uncertainties, and this is where the role of AI‐driven approaches is also becoming more important. Classical optimization approaches (gradient‐based nonlinear optimization) typically cannot help much, since such optimization is model based, and objective functions (and their gradients) cannot be analytically expressed. Optimization approaches developed under the framework of computational intelligence (various types of randomized search, e.g. evolutionary approaches) have been the focus of hydroinformatics for three decades, but the new problems and the increased data availability lead to the necessity of testing new approaches and their critical analysis.
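As an illustration of randomized search on an objective that can only be evaluated by running a model, the sketch below applies a simple (μ + λ) evolutionary search with Gaussian mutation to a toy reservoir release problem. The cost function, its inflow and demand numbers, the penalty weights, and the algorithm settings are all hypothetical choices for this example, not a method prescribed in this chapter.

```python
import random


def simulated_cost(release):
    """Stand-in for a model-based objective that must be evaluated by simulation.
    Hypothetical: penalizes deviation from demand and storage violations."""
    inflow = [30.0, 50.0, 20.0, 10.0]
    demand = [25.0, 25.0, 30.0, 30.0]
    storage, cost = 40.0, 0.0
    for q_in, d, r in zip(inflow, demand, release):
        storage += q_in - r
        cost += (r - d) ** 2                      # supply deficit or surplus
        cost += 10.0 * max(0.0, storage - 80.0)   # spill penalty
        cost += 10.0 * max(0.0, 10.0 - storage)   # low-storage penalty
    return cost


def evolve(objective, n_vars, bounds, pop_size=20, n_offspring=40,
           generations=100, sigma=3.0, seed=1):
    """Minimize `objective` using only function evaluations (no gradients)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(n_offspring):
            parent = rng.choice(pop)
            # Gaussian mutation, clipped to the feasible range
            child = [min(hi, max(lo, x + rng.gauss(0.0, sigma))) for x in parent]
            offspring.append(child)
        # (mu + lambda) selection: keep the best of parents and offspring
        pop = sorted(pop + offspring, key=objective)[:pop_size]
    return pop[0], objective(pop[0])


best, best_cost = evolve(simulated_cost, n_vars=4, bounds=(0.0, 60.0))
print([round(r, 1) for r in best], round(best_cost, 2))
```

The search never asks for a derivative of `simulated_cost`, so the same loop would work if the objective were replaced by a call to a full simulation model; the price is a large number of model runs.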
This chapter aims at presenting a brief overview of AI‐ and ML‐related building processes and methods widely used for water‐related problems, in the context of the chapters presented in this volume. AI is a concept that covers a wide area of science and technology; however, quite often it is used interchangeably with ML, which is in fact a narrower notion. One may find in the literature quite a large number of AI‐ and ML‐related subareas: big data, data mining, pattern recognition (PR), natural language processing (NLP), neural networks, deep learning, and so on. We will not go into a discussion about terminology and differences between AI and ML; for the purpose of this chapter and the issues covered in the book, it is appropriate to use the somewhat narrower term, that is, machine learning.
ML techniques have been widely used in water resources during the last decades. At the same time, however, one may also observe inadequate use of ML‐related modeling procedures, unjustified selection of algorithms, and even a lack of understanding of why a model provides good or poor performance in a mathematical and statistical sense. There is also well‐known criticism of ML and statistical techniques by practitioners who are used to employing physically based (process) models; they point out that the interpretation of a water resources problem is hidden in the so‐called black box of an ML model. There is indeed a challenge in posing the problem in the right way: how domain knowledge can drive the selection, building, and tuning of an ML model. Lack of data and data uncertainty also make it difficult for practitioners to feel confident about ML models.
On the other hand, the strength of ML is in its ability to represent the relationships between inputs and outputs, provided enough data are available. The relatively recent advances in deep learning have opened the door to new ways of using spatiotemporal data and, at the same time, have motivated new algorithm developments for spatial patterns and, in general, all types of computer vision algorithms. Nevertheless, not all problems can be tackled by ML. Input and output relations can be so complex that ML techniques may not be able to find the hidden patterns, and in such cases hybrid models, combining the power of ML and process models (so‐called physics‐aware AI; see, e.g., Jiang et al., 2020), would be needed. Such hybrid approaches are now receiving increased attention in hydroinformatics.
This chapter is not intended to provide a comprehensive review of methods (which are covered in hundreds of books and in the referred literature herein), but rather focuses on some important elements of ML model building, and presents basics of several selected ML techniques quite widely used in solving water‐related problems, allowing for “feeling the flavor” of ML.
There is a large number of evolving definitions of AI, which can be explained by its permanent evolution, shifts in priorities, and advances in the mathematical instruments used. Many literature sources point out that the term AI was first used in 1956 at the Dartmouth Conference, where John McCarthy and other founding fathers of AI helped to coin the term artificial intelligence. One of the definitions reads: “AI is the field devoted to building artificial animals (or at least artificial creatures that, in suitable contexts, appear to be animals) and, for many, artificial persons (or at least artificial creatures that, in suitable contexts, appear to be persons)” (Stanford Encyclopedia of Philosophy, 2018). On the other hand, Wikipedia defines it as the “intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals” (Artificial Intelligence, 2022). Yet another definition (sometimes referred to as being given by IBM) states that “AI leverages computers and machines to mimic the problem‐solving and decision‐making capabilities of the human mind.” All these definitions differ in details but are very similar in the main idea: a machine (a programmed computer) is supposed to imitate some behavior of a living creature.
An old debate regarding whether humans will be replaced by machines has been reignited in various public media in view of the latest developments in AI, especially generative AI, as implemented, for example, in platforms like ChatGPT. Indeed, AI has evolved into different types, related to the extent to which it may take over some of humans' activities. The first type is purely reactive and closely related to the beginnings of computer science: the AI does not have any memory, which basically means no initial database or information about processes. This concept can be applied to solving narrow specialized tasks. For example, a forecast is performed based only on the current situation, limited historical samples, and known variables. Further development can lead to building up memory, by collecting previous experience and more complex and voluminous data and continuing to add them to the memory. Such AI systems have enough memory or experience to support humans in performing various tasks, but their ability is still limited and they are still seen as a helping hand. For example, such a system can provide adaptive forecasts depending on the context, such as previous performance, climatic conditions, type of river basin, and others. An even higher level of AI can be explained as a theory of mind (Premack & Woodruff, 1978), where AI can understand thoughts and emotions and interact socially. This type of concept needs an integration of many components of AI and the development of a more sophisticated mathematical apparatus, so such developments are still at a rudimentary level. At the top level, it is possible to consider how these systems can become aware of life and even become self‐aware. This concept links to the idea that AI machines can create new knowledge and, at the same time, build internal system concepts that link intelligence, sentience, and consciousness.
Advances of AI have been numerous and applied in various areas. We should admit however, that in water resources, only a few of such developments have been used, and these relate to application of specific machine learning techniques.
Figure 1.1 presents a schematization of some of the key techniques of ML, with references to the decades when these methods started to develop. Because artificial neural networks (ANN) are so widely applied, this architecture is presented in more detail. One of the relatively new developments is natural language processing (NLP); it uses deep learning (DL) to train models that help interpret text and reinforcement learning concepts to use DL. Pattern recognition uses convolutions and DL, which develop pattern recognition to extract features. Finally, metaheuristics provide the basis for new models of DL. The following are some of the concepts used in this chapter:
Figure 1.1 Evolution of AI topics: From artificial intelligence to machine learning and deep learning in hydroinformatics.
ML (machine learning). Mathematical models that aim to represent groups and/or input‐output relationships from data.
NLP (natural language processing). The analysis of language elements, in general text encoded into numbers, mainly through transforming and processing text in order to interpret, replicate, and understand its semantics.
Pattern recognition. It can be characterized as a subarea that explores how patterns in data and their attributes (variables or features) can be detected. Many ML algorithms implicitly detect patterns, and therefore these areas are interrelated. Computer vision is an important area of their application. It is worth noting that a number of important mathematical tools and algorithms for pattern recognition are not explicitly positioned in the machine learning realm (for example, procedures for denoising and filtering, segmentation of images, 3D virtual reality patterns, and vector field flows), but they certainly contribute to solving pattern recognition problems.
There are various ways of contextualizing ML. From the perspective of computer science, the concept of ML can be seen as aiming at changing the programming paradigm (Fig. 1.2). The aim here is to develop a computer program that does not require significant analysis to understand how to create an algorithm to obtain certain responses; instead, an ML algorithm, theoretically, can learn from inputs and responses (outputs).
In many applications, however, ML is not seen as a tool to generate computer programs, but instead is expected to help in building input‐output models by learning from data, in other words, data‐driven models (see Fig. 1.6). Their use is quite varied and most of the time is justified by the idea that a system might be very complex and we may not observe all the internal states of a modeled system or process (e.g., in hydrological modeling this may be soil moisture). This implies that if we have a complex system, with only a limited understanding of the driving variables of a natural process (or any process in general), and we can measure the consequences of events (i.e., outputs resulting from particular inputs), then with this information, it is possible to generate ML models (Fig. 1.3).
Figure 1.2 Differences between traditional programming and ML, as seen in computer science: (a) Computer science algorithm development; (b) Machine Learning algorithm development.
In most cases, the ML engines (e.g., artificial neural networks) work with numerical (real‐valued) data, so are, in fact, nonlinear regression models. If data are nonnumerical (e.g., classes, images, or words), they have to be first transformed (encoded) into numerical form, and then processed.
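One common way of encoding nonnumerical classes is one‐hot encoding, in which each category becomes a binary indicator vector. The short sketch below is only an illustration of the idea; the function name and the land‐use classes are made up for this example, and other encodings (ordinal codes, embeddings) may be more suitable depending on the data.

```python
def one_hot_encode(labels):
    """Map each distinct category to a binary indicator vector."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1   # mark the position of this label's category
        vectors.append(row)
    return categories, vectors


land_use = ["forest", "urban", "cropland", "urban"]  # made-up classes
categories, encoded = one_hot_encode(land_use)
print(categories)  # ['cropland', 'forest', 'urban']
print(encoded)     # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
```

The resulting numerical vectors can then be passed to a regression‐type ML engine alongside real‐valued inputs.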
In relation to Earth sciences, wide adoption of ML has not been fast, to say the least, since many scientists were pointing out that there is no clear justification for using these algorithms. Their reasoning was that models, as descriptors of reality, should be based on scientific understanding of processes (e.g., physics), and not on a statistical encapsulation of data sets. Water resources are not an exception in this sense, and early applications of ML were criticized for reproducing abstractly, with ML, natural problems that did not need to be reproduced in this way, which arguably often resulted in a blind representation of a well‐known problem. However, during the last two to three decades, there have been many examples of successful applications of ML reported and implemented in decision support systems. It has been shown that ML methods are often more accurate than traditional hydrologic models in forecasting (see, e.g., Nearing et al., 2021; Arsenault et al., 2023). ML also helps to replace complex slow‐running physically based models: an ML model is trained on data generated by a process model, and such a fast metamodel (surrogate) can replace a much slower process model in operational systems and therefore be used in real‐time forecasting to provide warnings in an efficient manner. ML‐based pattern recognition algorithms can also help to automate the detection of critical scenarios, combining variables that might not be easily related physically and capturing nontrivial relationships and patterns implicitly present in data, thus reproducing complex phenomena. Therefore, ML has become a powerful analytical and predictive tool.
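The surrogate (metamodel) idea can be sketched in three steps: run the process model offline on a set of inputs, train a fast data‐driven model on the generated pairs, and use that model in place of the slow one. In the minimal example below, the "process model" is a hypothetical one‐line formula standing in for an expensive simulation, and a k‐nearest‐neighbor regressor is used as the surrogate purely for simplicity; any other ML technique could take its place.

```python
def process_model(rain_mm):
    """Stand-in for a slow physically based model (hypothetical formula)."""
    return 0.002 * rain_mm ** 2 + 0.1 * rain_mm


def train_knn_surrogate(inputs, outputs, k=2):
    """Return a predictor that averages the k nearest training outputs."""
    samples = list(zip(inputs, outputs))

    def predict(x):
        nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


# Step 1: run the process model offline on a design of input values
train_x = [float(x) for x in range(0, 101, 5)]
train_y = [process_model(x) for x in train_x]

# Step 2: train the surrogate on the generated data
surrogate = train_knn_surrogate(train_x, train_y, k=2)

# Step 3: use the fast surrogate in place of the process model
x_new = 42.0
approx, exact = surrogate(x_new), process_model(x_new)
print(f"surrogate: {approx:.3f}, process model: {exact:.3f}")
```

The surrogate is only trustworthy within the range of inputs covered by the training runs, so the design of those runs deserves as much care as the choice of ML technique.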
Figure 1.3 Encapsulation of natural systems and processes in ML models, with feedbacks.
Aside from ML, there are other areas in AI worth attention, namely natural language processing and metaheuristics, and they are also considered here due to their potential for water resources management.
The Internet has allowed us to arrange access to billions of documents, images, and audio and video material in very different areas of human activities. It would be interesting to understand if and how we can use these data for solving water resources problems. Data on the Internet are often not structured and are linked to a variety of sources, from websites of organizations to social media, news, blogs, videos, and more. In many cases useful data are presented as text. The idea of text mining is not new; however, the large amount of data available on the Internet in text form has generated a boom in developing intelligent tools, referred to as natural language processing (NLP). NLP can be defined as the ability of a computer program to understand human language as it is spoken and written, that is, natural language (Sun et al., 2022). It starts with the idea of processing text and develops ways to interpret and reproduce it. The ways of understanding how we write have been formalized in tools for sentiment analysis of text, generation of text, correction of text, text extraction, and the concept of artificial assistants.
NLP converts letters, words, and phrases into numerical representation. This is done sometimes in simple terms, like numbering each word in a phrase and repeating the number when the word repeats itself. However, the results of this numerical representation need to follow the basics of the language (Khurana et al., 2023). Therefore, typically, the process of interpreting the language focuses on five steps:
Lexical (morphological) analysis: In essence, it is breaking text into paragraphs, phrases, and words. Furthermore, it is possible to understand at the level of individual words, the morphemes as the smallest units of a word. Last, lexical analysis identifies the morphemes and allows us to characterize the word and understand its meaning knowing its root form. The final objective of this step is to help to identify words, which are normally referred to as tokens, since the original word in fact possesses some information, and, for programming, it is a sequence of characters, which represent a unit of information.
Syntax analysis: It checks the grammar, that is, the way words are arranged in a sentence. This order shows how words should normally be arranged, which makes it possible to build relationships between them. With this information, the parts of speech (POS) in a sentence can be assessed and tagged based on the structure found.
Semantic analysis: This step aims at finding the meaning of the statement, how the phrase reads literally. This understanding provides the basis for rejecting syntactically valid, but illogical statements.
Discourse integration: The context in which a phrase is used can be very important, so this step aims at establishing links between the different sentences, especially the immediately preceding one.
Pragmatic analysis: This step uses a set of rules that describe cooperative dialogs, as in social content. What can be found in social media and common interactions can become a rule, and with this we can comprehend the way the communication takes place.
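The simplest numerical representation mentioned before the five steps, numbering each word and repeating the number when the word repeats, can be sketched together with a crude version of the lexical step (splitting text into tokens). The regular expression, the function names, and the example phrase below are illustrative assumptions; practical NLP tools use far richer tokenizers and encodings.

```python
import re


def tokenize(text):
    """Crude lexical step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def encode_tokens(tokens):
    """Assign each new word the next integer; repeated words reuse their number."""
    vocabulary, codes = {}, []
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary) + 1
        codes.append(vocabulary[token])
    return vocabulary, codes


phrase = "The river rose and the river flooded the town"
tokens = tokenize(phrase)
vocabulary, codes = encode_tokens(tokens)
print(tokens)
print(codes)  # [1, 2, 3, 4, 1, 2, 5, 1, 6]
```

Such integer codes carry no information about grammar or meaning by themselves, which is why the syntactic, semantic, discourse, and pragmatic steps are needed on top of them.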