Advanced Hydroinformatics - E-Book

Description

Applying machine learning and optimization technologies to water management problems

The rapid development of machine learning brings new possibilities for hydroinformatics research and practice with its ability to handle big data sets, identify patterns and anomalies in data, and provide more accurate forecasts. Advanced Hydroinformatics: Machine Learning and Optimization for Water Resources presents both original research and practical examples that demonstrate how machine learning can advance data analytics, accuracy of modeling and forecasting, and knowledge discovery for better water management.

Volume Highlights Include:

* Overview of the application of artificial intelligence and machine learning techniques in hydroinformatics
* Advances in modeling hydrological systems
* Different data analysis methods and models for forecasting water resources
* New areas of knowledge discovery and optimization based on using machine learning techniques
* Case studies from North America, South America, the Caribbean, Europe, and Asia

The American Geophysical Union promotes discovery in Earth and space science for the benefit of humanity. Its publications disseminate scientific knowledge and provide resources for researchers, students, and professionals.

Page count: 659

Year of publication: 2023

Table of Contents

COVER

TABLE OF CONTENTS

TITLE PAGE

COPYRIGHT

LIST OF CONTRIBUTORS

PREFACE

1 HYDROINFORMATICS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN WATER‐RELATED PROBLEMS

1.1. Introduction

1.2. Key Principles of ML/Hydroinformatics

1.3. Model Building and Input Variable Selection in Machine Learning for Water‐related Problems

1.4. Advanced Techniques in Machine Learning for Water Resources Applications

1.5. Future Directions and Challenges

References

PART I: MODELING HYDROLOGICAL SYSTEMS

2 IMPROVING MODEL IDENTIFIABILITY BY DRIVING CALIBRATION WITH STOCHASTIC INPUTS

2.1. Introduction

2.2. Methodology

2.3. Insights to the Stochastic Simulation Procedure

2.4. Proof of Concept

2.5. Concluding Remarks

Acknowledgments

References

3 A TWO‐STAGE SURROGATE‐BASED PARAMETER CALIBRATION FRAMEWORK FOR A COMPLEX DISTRIBUTED HYDROLOGICAL MODEL

3.1. Introduction

3.2. Method

3.3. Study Area and Data

3.4. Experimental Setup

3.5. Results

3.6. Discussion

3.7. Conclusions

Acknowledgments

References

4 FUZZY COMMITTEES OF CONCEPTUAL DISTRIBUTED MODEL

4.1. Introduction

4.2. Case Study

4.3. Methodology

4.4. Results and Discussion

Acknowledgments

References

5 REGRESSION‐BASED MACHINE LEARNING APPROACHES FOR DAILY STREAMFLOW MODELING

5.1. Introduction

5.2. Materials and Methods

5.3. Results

5.4. Conclusion and Future Works

Acknowledgments

References

6 USE OF NEAR‐REAL‐TIME SATELLITE PRECIPITATION DATA AND MACHINE LEARNING TO IMPROVE EXTREME RUNOFF MODELING

6.1. Introduction

6.2. Study Area and Data Set

6.3. Methodology

6.4. Results

6.5. Discussion

6.6. Conclusions

References

PART II: FORECASTING WATER RESOURCES

7 FORECASTING WATER LEVELS USING MACHINE (DEEP) LEARNING TO COMPLEMENT NUMERICAL MODELING IN THE SOUTHERN EVERGLADES, USA

7.1. Introduction

7.2. Methods

7.3. Workflow

7.4. Experimental Setup

7.5. Comparison of Measured Rainfall and NWP Rainfall Forecasts

7.6. Results and Discussion

7.7. Conclusions and Future Directions

Disclaimer

Appendix

References

8 APPLICATION OF A MULTILAYER PERCEPTRON ARTIFICIAL NEURAL NETWORK (MLP‐ANN) IN HYDROLOGICAL FORECASTING IN EL SALVADOR

8.1. Introduction

8.2. Data‐Driven Modeling Technique

8.3. Characterization of The Grande De San Miguel Catchment

8.4. Methodology

8.5. Experimental Setups

8.6. Results and Discussion

8.7. Operational MLP‐ANN Forecast Model

8.8. Conclusions and Recommendations

Acknowledgments

References

9 NOISE FILTER WITH WAVELET ANALYSIS IN ARTIFICIAL NEURAL NETWORKS (NOWANN) FOR FLOW TIME SERIES PREDICTION

9.1. Introduction

9.2. Materials

9.3. Methodology

9.4. Case Study

9.5. Results and Discussions

9.6. Conclusions

References

PART III: KNOWLEDGE DISCOVERY AND OPTIMIZATION

10 APPLICATION OF NATURAL LANGUAGE PROCESSING TO IDENTIFY EXTREME HYDROMETEOROLOGICAL EVENTS IN DIGITAL NEWS MEDIA: CASE OF THE MAGDALENA RIVER BASIN, COLOMBIA

10.1. Introduction

10.2. Case Study

10.3. Methodology

10.4. Results

10.5. Conclusions

References

11 THREE‐DIMENSIONAL CLUSTERING IN THE CHARACTERIZATION OF SPATIOTEMPORAL DROUGHT DYNAMICS: CLUSTER SIZE FILTER AND DROUGHT INDICATOR THRESHOLD OPTIMIZATION

11.1. Introduction

11.2. Methods and Data

11.3. Results and Discussion

11.4. Summary and Conclusions

Acknowledgments

References

12 DEEP LEARNING OF EXTREME RAINFALL PATTERNS USING ENHANCED SPATIAL RANDOM SAMPLING WITH PATTERN RECOGNITION

12.1. Introduction

12.2. Toolbox

12.3. Identify the Specific Patterns of Rainfall in Great Britain

12.4. Generating the Training Sets for Deep Learning Studies

12.5. Conclusion

References

13 TELECONNECTION PATTERNS OF RIVER WATER QUALITY DYNAMICS BASED ON COMPLEX NETWORK ANALYSIS

13.1. Introduction

13.2. Study Areas and Methods

13.3. Theory of Complex Network

13.4. Results and Discussion

13.5. Comparison of Teleconnection Features Between Scales and Parameters

13.6. Study Limitations and Future Works

13.7. Conclusion

Acknowledgments

References

14 PROBABILISTIC ANALYSIS OF FLOOD STORAGE AREAS MANAGEMENT IN THE HUAI RIVER BASIN, CHINA, WITH ROBUST OPTIMIZATION AND SIMILARITY‐BASED SELECTION FOR REAL‐TIME OPERATION

14.1. Introduction

14.2. Materials and Methods

14.3. Results and Discussion

14.4. Conclusions

References

15 MULTI‐OBJECTIVE OPTIMIZATION OF RESERVOIR OPERATION POLICIES USING MACHINE LEARNING MODELS: A CASE STUDY OF THE HATILLO RESERVOIR IN THE DOMINICAN REPUBLIC

15.1. Introduction

15.2. Case Study

15.3. Methodology

15.4. Results

15.5. Conclusions

Acknowledgments

References

INDEX

END USER LICENSE AGREEMENT

List of Tables

Chapter 1

Table 1.1 Example of the input matrix generation for a rainfall‐runoff model...

Chapter 2

Table 2.1 Key summary statistical characteristics of observed rainfall and r...

Chapter 3

Table 3.1 Basic information about the study area.

Table 3.2 Ranges of VIC calibration parameters.

Table 3.3 Benchmark values and optimal solutions of the surrogate model.

Table 3.4 Crucial ranges of NSE and Bias in each gauge station.

Table 3.5 Evaluations of simulation with the parameter sets given by VIC and...

Table 3.6 Evaluations of simulation with the parameter sets given by VIC and...

Table 3.7 Time consumed by multi‐objective optimization algorithms in the LJ...

Table 3.8 Time consumed by multi‐objective optimization algorithms in the XJ...

Chapter 4

Table 4.1 Weighting schemes for root mean square error for high (RMSEhf) and...

Table 4.2 Performance of committee models with a different combination of we...

Table 4.3 Committee of conceptual distributed models' performance.

Chapter 5

Table 5.1 Performance metrics of the ML and SAC‐SMA models.

Chapter 6

Table 6.1 Search space (grid) of the RF runoff models.

Table 6.2 RF hyperparameterization of extreme runoff models.

Table 6.3 Number of events and efficiencies on test subsets of runoff models...

Chapter 7

Table 7.1 Root mean squared error of example model.

Table 7.2 Details for short‐term and medium‐range forecast horizons.

Table 7.3 Measured and forecasted rainfall and water level.

Table 7.A1 Performance indicators for each station specific TDNN model inves...

Table 7.A2 Performance indicators for each station specific LSTM model inves...

Chapter 8

Table 8.1 Potential evapotranspiration‐elevation relationships.

Table 8.2 MLP‐ANNs experimental summary for flow forecasting in Grande de Sa...

Table 8.3 Statistical properties for training, cross‐validation, and verific...

Table 8.4 Statistical properties for training, cross‐validation, and verific...

Table 8.5 Summary of goodness‐of‐fit function for MLP‐ANN and Naïve forecast...

Table 8.6 Number of hidden nodes or neurons in the MLP‐ANN forecast models....

Chapter 9

Table 9.1 Bibliographic reference summary used for this work.

Table 9.2 Description of the case study analyzed.

Table 9.3 Best fold among all HNs: ANN stand‐alone model; best fold among al...

Table 9.4 WM5P model results; WANN model results.

Chapter 10

Table 10.1 Keywords defined for the news search.

Table 10.2 Selected newspaper sources grouped by region.

Table 10.3 Correlation of monthly sentiment index and monthly climatic data....

Table 10.4 Confusion matrix results from Ayapel Swamp, showing high and low ...

Table 10.5 Confusion matrix results from Betania Reservoir, showing high and...

Chapter 11

Table 11.1 Angle C (Degrees) for each Drought Indicator (DI) threshold and c...

Chapter 12

Table 12.1 Specific patterns of rainfall in three countries cataloged at thr...

Table 12.2 Temporal change of specific rainfall patterns in three countries ...

Chapter 14

Table 14.1 Historical most severe flood records at Lutaizi station and calcu...

Table 14.2 Robust strategies for different downstream risk levels.

Table 14.3 Similar hydrographs chosen by four criteria.

Chapter 15

Table 15.1 Performance metrics for test runs of the ML models.

Table 15.2 Summary of the parameters and configuration of the ML models.

Table 15.3 Parameters of the optimization process.

Table 15.4 Hypervolume indicator for optimized models.

List of Illustrations

Chapter 1

Figure 1.1 Evolution of AI topics: From artificial intelligence to machine l...

Figure 1.2 Differences between traditional programming and ML, as seen in co...

Figure 1.3 Encapsulation of natural systems and processes in ML models, with...

Figure 1.4 Interactions of digital information sources and citizens, and the...

Figure 1.5 A typical sequence of steps in Machine Learning Modeling Framewor...

Figure 1.6 Training a Machine Learning model as an iterative optimization pr...

Figure 1.7 Autocorrelation for the discharge at Ourthe River basin (tributar...

Figure 1.8 Possible cycles in ML model building.

Figure 1.9 A typical neuron and a single‐output multilayer perceptron (MLP A...

Figure 1.10 Training M5 model trees and their operation on a new unseen inst...

Figure 1.11 Recurrent neural network. Inputs from a time series (X) are fed ...

Chapter 2

Figure 2.1 The classical calibration paradigm, based on the split‐sample app...

Figure 2.2 Dependency patterns among runoff data (a) between subsequent mont...

Figure 2.3 Synthetic runoff data of 2,000 yr length at Evinos basin, aggrega...

Figure 2.4 Schematic view of the conceptual model and the associated fluxes ...

Figure 2.5 Comparison of model fitting (i.e., simulated vs. observed runoff ...

Figure 2.6 Comparison of model fitting (i.e., simulated vs. observed runoff ...

Figure 2.7 Empirical histograms of model parameters obtained with the GLUE m...

Chapter 3

Figure 3.1 Framework of the multi‐objective surrogate‐based parameter calibr...

Figure 3.2 Flowchart of the initial surrogate model training.

Figure 3.3 Location of (a) the Lanjiang River basin and (b) the Xiangjiang R...

Figure 3.4 Multi‐objective optimization procedure of the benchmark test in t...

Figure 3.5 Trace plots of the seven calibrated parameters. The All populatio...

Figure 3.6 Population values with the increasing generations of (a) NSE at L...

Figure 3.7 Population values with the increasing generations of (a) NSE at H...

Figure 3.8 Two‐dimensional Pareto plots for Bias versus NSE in the LJR basin...

Figure 3.9 Two‐dimensional Pareto plots for Bias versus NSE in the XJR basin...

Figure 3.10 Compromise parameters of the surrogate model and VIC in (a) the ...

Figure 3.11 Simulated daily streamflow series and observed streamflow (a) at...

Figure 3.12 Simulated daily streamflow series and observed streamflow (a) at...

Figure 3.13 Spatial maps of annual average runoff depth (mm) in the LJR basi...

Figure 3.14 Spatial maps of annual average runoff depth (mm) in the XJR basi...

Chapter 4

Figure 4.1 The concept of building local models represents different flow re...

Figure 4.2 (a) Jiboa catchment stations and (b) model setup.

Figure 4.3 Proposed steps and approaches for improving committee models usin...

Figure 4.4 Model structure.

Figure 4.5 Weighting schemes for weighted root mean squared error as an obje...

Figure 4.6 Fuzzy membership functions to combine specialized models: (a) MF‐...

Figure 4.7 Experimental models setup.

Figure 4.8 Weighting scheme parameters (gamma and delta). The first part of ...

Figure 4.9 The relation between committee component and committee model perf...

Figure 4.10 Normalized parameters plot (parameters corresponding to Pareto o...

Figure 4.11 Performance graph of all committee models. Models are ordered fr...

Figure 4.12 Flow hydrograph result from committee model Lumped‐MF‐A.

Figure 4.13 Performance of committee models at the highest flood event in th...

Figure 4.14 Normalized performance criteria of all models for the visual com...

Figure 4.15 Pareto optimal sets of specialized models result from the multio...

Chapter 5

Figure 5.1 A conceptual framework of the proposed modeling workflow illustra...

Figure 5.2 Observed versus simulated daily streamflow of the ML methods as w...

Figure 5.3 FDCs that display modeled and observed streamflow values against ...

Chapter 6

Figure 6.1 The Jubones basin in the tropical Andes of Ecuador, South America...

Figure 6.2 Mean annual precipitation in mm (for 2019 and 2020) measured by t...

Figure 6.3 (a) Runoff and precipitation (PERSIANN‐CCS) time series at the ou...

Figure 6.4 Precipitation identification with an object‐based Connected Compo...

Figure 6.5 Illustration of the precipitation‐retrieval modular approach usin...

Figure 6.6 Meteorological precipitation information retrieved from 47 extrem...

Figure 6.7 Localization of precipitation object centroids (dots) associated ...

Figure 6.8 Precipitation classes associated with extreme hydrological events...

Figure 6.9 Scatter plot between extreme runoff observations and simulations ...

Chapter 7

Figure 7.1 (a) Historical flow through the Everglades before drainage and (b...

Figure 7.2 Map showing the study area in the southern Everglades, Florida, a...

Figure 7.3 Correlation analyses for the three stations not influenced by a c...

Figure 7.4 Correlation analyses for the stations influenced by a control str...

Figure 7.5 Averaged rainfall (mm) for the dry season and wet season in the s...

Figure 7.6 Charts comparing the Root Mean Squared Error of the water‐level (...

Figure 7.7 Short‐term forecasts for station NP201. (a) Rainfall used in the ...

Figure 7.8 Short‐term forecasts for station NP201. (a) Three‐year water‐leve...

Figure 7.9 Medium‐range forecasts for station NP201. (a) Rainfall used in th...

Figure 7.10 Medium‐range forecasts for station NP201. (a) Three‐year water‐l...

Figure 7.11 A 24 hr water‐level forecast comparison of ANN and BISECT model ...

Chapter 8

Figure 8.1 (a) Multilayer Perceptron Artificial Neural Network architecture ...

Figure 8.2 Rainfall and water level stations in Grande de San Miguel catchme...

Figure 8.3 Overall methodology to build an MLP‐ANN model.

Figure 8.4 Response time based on the physical characteristic of the catchme...

Figure 8.5 Correlation coefficient (a) and AMI (b) analysis between predicte...

Figure 8.6 (a) Scatter plot and (b) hydrograph comparison between observed a...

Figure 8.7 Relation between the model input and the resulting weights for ea...

Figure 8.8 Comparison between the observed discharge, the naïve model, and M...

Figure 8.9 Scatter plot between the observed and simulated MLP‐ANN discharge...

Figure 8.10 Model performance comparison between the naïve and MLP‐ANN forec...

Figure 8.11 Conceptual design of the operational MLP‐ANN forecasting in Gran...

Chapter 9

Figure 9.1 Boundary limits and traditional extension methods. (a) Wavelet lo...

Figure 9.2 Traditional ANN approach.

Figure 9.3 WANN approach. In order to create this model, we use the multires...

Figure 9.4 Soft and hard threshold for de‐noising the signal source (http://...

Figure 9.5 De‐noising process in the details of the signal.

Figure 9.6 Pseudocode for the ANN exhaustive.

Figure 9.7 All 10‐fold's RMSE results for HN = 20.

Figure 9.8 Best fold's RMSE results per HN.

Figure 9.9 Pseudocode for the WANN exhaustive.

Figure 9.10 Exhaustive search RMSE‐train.

Figure 9.11 Exhaustive search RMSE‐test.

Figure 9.12 Exhaustive search RMSE‐val.

Figure 9.13 Testing data set: Simulated (upper figure), residual errors (low...

Figure 9.14 Data‐driven model with raw time series: Training data set, valid...

Figure 9.15 Data‐driven models with wavelets as preprocessor: Training data ...

Figure 9.16 Testing data set: Simulated discharge (upper) residual errors(lo...

Chapter 10

Figure 10.1 Magdalena River Basin scheme: Rivers and water bodies. WGS‐84. d...

Figure 10.2 Journal article shows the emergency (from Tiempo, 2018).

Figure 10.3 Methodology process.

Figure 10.4 Data‐mining process.

Figure 10.5 News‐filtering process.

Figure 10.6 Graphical representation of the CBOW model and the Skip‐gram mod...

Figure 10.7 Distribution of 207 water bodies.

Figure 10.8 Web‐scraping simplified process.

Figure 10.9 Example of information extracted from a newspaper website (https...

Figure 10.10 Distribution of journal articles (percentage).

Figure 10.11 Word cloud image for Quebrada el Carmen representing the news a...

Figure 10.12 Number of newspaper articles about the rivers composing the Mag...

Figure 10.13 Sentiment change over time: Hidroituango.

Figure 10.14 Monthly river‐level range, Puerto Valdivia station in the Cauca...

Figure 10.15 Monthly river‐level range, La Campina station (Continuous line)...

Figure 10.16 Spatial sentiment changes regarding the Magdalena River Basin, ...

Figure 10.17 Spatial comparison sentiment and climatological variables regar...

Chapter 11

Figure 11.1 Schematic overview of the methodology for 3‐D drought clusters c...

Figure 11.2 Scheme of the method to calculate the optimal cluster size filte...

Figure 11.3 Number of 3‐D clusters (nc) calculated for different drought ind...

Figure 11.4 As Figure 11.3 but for durations up to 20 months and sizes up to...

Figure 11.5 Number of 3‐D clusters (nc) calculated for different drought ind...

Figure 11.6 As Figure 11.5 but for durations up to 20 months and sizes up to...

Figure 11.7 Percentage of drought area calculated for each 3‐D cluster. Resu...

Figure 11.8 Identification of the optimal cleaning filter size. The result f...

Figure 11.9 (a) Duration (months) and magnitude (number of voxels) of the dr...

Figure 11.10 (a) Centroids of the 3‐D clusters for the period 1950–2017. Cen...

Figure 11.11 (a) Drought from May 1957 to September 1972. (b) Percentage of ...

Figure 11.12 (a) Drought from December 2006 to September 2017. (b) Percentag...

Figure 11.13 (a) Drought from October 1984 to September 1994. (b) Percentage...

Figure 11.14 Reported droughts in the EMDAT and CAZALAC (left). Percentage o...

Chapter 12

Figure 12.1 The new cluster analysis and pattern segmentation component desi...

Figure 12.2 The histograms of the number of daily rainfall pattern in Englan...

Figure 12.3 Attribute for labels (a) L1, (b) L2, and (c) L3 used in CNN.

Chapter 13

Figure 13.1 Study area and monitoring campaign in China and Huaihe River bas...

Figure 13.2 Different networks: (a) An undirected network with identical nod...

Figure 13.3 Topological structure of water quality monitoring networks in th...

Figure 13.4 Degree centrality pattern of water quality monitoring networks i...

Figure 13.5 Clustering coefficient pattern of water quality monitoring netwo...

Figure 13.6 Topological structure of water quality monitoring networks in Ch...

Figure 13.7 Degree centrality pattern of water quality monitoring networks i...

Figure 13.8 Clustering coefficient pattern of water quality monitoring netwo...

Chapter 14

Figure 14.1 (a) Huai River basin (adapted from Wu et al., 2015) and (b) sche...

Figure 14.2 Inflow hydrograph at Lutaizi station from 24 June to 7 September...

Figure 14.3 Schematic diagram of calculation for downstream risk.

Figure 14.4 Damage curves for four storage areas (Jonoski et al., 2019).

Figure 14.5 Example of Pareto front of the coupled simulated‐optimization mo...

Figure 14.6 Example result of optimal strategy operation (Jonoski et al., 20...

Figure 14.7 Hydrographs generated by Gamma‐like function of (a) one and (b) ...

Figure 14.8 One hundred sampled hydrographs using LHS.

Figure 14.9 Enlarged hydrographs using Homogeneous Multiple Enlargement.

Figure 14.10 PDFs of storage area damage under certain downstream risk.

Figure 14.11 Estimation of worst case and expected value of potential robust...

Figure 14.12 Test of robust strategies with enlarged hydrographs.

Figure 14.13 Comparison of tested and similar hydrographs.

Figure 14.14 Comparison of Pareto fronts of tested and similar hydrographs....

Figure 14.15 Test of real‐time strategies with three enlarged hydrographs.

Figure 14.16 Comparison of robust strategies and similarity selection method...

Chapter 15

Figure 15.1 Map of the Yuna River basin.

Figure 15.2 Floodplain in the lower Yuna basin.

Figure 15.3 Dam components: (a) location of the dam, (b) operation levels, (...

Figure 15.4 Real Data of the analysis period: (a) Inflows and outflows of th...

Figure 15.5 Flow diagram of the reservoir operation model built for the Hati...

Figure 15.6 Scheme of the proposed methodology.

Figure 15.7 Architecture of the ANN: (a) MLP, (b) RBN.

Figure 15.8 Correlation analyses for the selection of ML inputs: between inf...

Figure 15.9 Test run for 6‐7‐1 MLP configuration for simulation of real rese...

Figure 15.10 Configuration of the (a) MLP and (b) RBN proposed for the opera...

Figure 15.11 Demand for irrigation downstream of the reservoir.

Figure 15.12 Unified Pareto fronts: (a) All the operation models; (b) operat...

Figure 15.13 Envelope for hydrographs generated by all reservoir operations ...

Figure 15.14 Comparison of results of all the solutions to reservoir operati...

Figure 15.15 Parallel graphic: (a) Reservoir operations that improve the thr...

Figure 15.16 Pareto front for models with MLP and NSGA II optimizers.

Figure 15.17 Zoomed‐in Pareto front of the selected optimal reservoir operat...

Figure 15.18 Simulation of the selected reservoir operations for the analysi...

Figure 15.19 Hydrograph of the selected reservoir operations for the wet per...

Special Publications 78

ADVANCED HYDROINFORMATICS

Machine Learning and Optimization for Water Resources

 

Gerald A. Corzo Perez
Dimitri P. Solomatine
Editors

 

 

 

 

This Work is a co‐publication of the American Geophysical Union and John Wiley and Sons, Inc.

This edition first published 2024
© 2024 American Geophysical Union

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Published under the aegis of the AGU Publications Committee

Matthew Giampoala, Vice President, Publications
Carol Frost, Chair, Publications Committee
For details about the American Geophysical Union visit us at www.agu.org.

The right of Gerald A. Corzo Perez and Dimitri P. Solomatine to be identified as the editors of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: Corzo Perez, Gerald Augusto, editor. | Solomatine, Dimitri P., editor. | John Wiley & Sons, publisher.
Title: Advanced hydroinformatics : machine learning and optimization for water resources / Gerald A. Corzo Perez, Dimitri P. Solomatine.
Description: Hoboken, NJ : Wiley, 2024. | Includes index.
Identifiers: LCCN 2023039032 (print) | LCCN 2023039033 (ebook) | ISBN 9781119639312 (hardback) | ISBN 9781119639329 (adobe pdf) | ISBN 9781119639343 (epub)
Subjects: LCSH: Hydrology–Data processing. | Hydrologic models.
Classification: LCC GB656.2.H9 A36 2024 (print) | LCC GB656.2.H9 (ebook) | DDC 551.480285/631–dc23/eng/20231023
LC record available at https://lccn.loc.gov/2023039032
LC ebook record available at https://lccn.loc.gov/2023039033

Cover design: Wiley
Cover image: © Alexander Nikitin/Getty Images; fotograzia/Getty Images

LIST OF CONTRIBUTORS

 

Nicholas G. Aumen

Southeast Region

United States Geological Survey

Boynton Beach, Florida, USA

Biswa Bhattacharya

IHE Delft Institute for Water Education

Delft, The Netherlands

Rolando Célleri

Department of Water Resources and Environmental Sciences, and Faculty of Engineering

University of Cuenca

Cuenca, Ecuador

Gerald A. Corzo Perez

IHE Delft Institute for Water Education

Delft, The Netherlands

Vitali Diaz

IHE Delft Institute for Water Education

Delft, The Netherlands; and

Delft University of Technology

Delft, The Netherlands

Santiago Duarte

IHE Delft Institute for Water Education

Delft, The Netherlands; and

Delft University of Technology

Delft, The Netherlands

Andreas Efstratiadis

Department of Water Resources and Environmental Engineering

National Technical University of Athens

Zografou, Greece

Mostafa Farrag

Department of Water Quality and Ecology Software

Deltares, Delft, The Netherlands; and

Hydrology Section

GFZ German Research Centre for Geosciences

Potsdam, Germany; and

Institute for Environmental Sciences and Geography

University of Potsdam

Potsdam, Germany

Jan Feyen

Faculty of Bioscience Engineering

Catholic University of Leuven

Leuven, Belgium

Courtney S. I. Forde

Caribbean Institute for Meteorology and Hydrology

Saint Michael, Barbados

Haiting Gu

Institute of Hydrology and Water Resources

Zhejiang University

Hangzhou, China

Daniel R. Hitchcock

Department of Agricultural Sciences

Clemson University

Clemson, South Carolina, USA

Jiping Jiang

School of Environmental Science and Engineering

Southern University of Science and Technology

Shenzhen, China; and

State Key Laboratory of Urban Water Resource and Environment

School of Environment

Harbin Institute of Technology

Harbin, China

Andreja Jonoski

Department of Hydroinformatics and Socio‐Technical Innovation

IHE Delft Institute for Water Education

Delft, The Netherlands

Panagiotis Kossieris

Department of Water Resources and Environmental Engineering

National Technical University of Athens

Zografou, Greece

Li Liu

Institute of Hydrology and Water Resources

Zhejiang University

Hangzhou, China

Di Ma

Institute of Hydrology and Water Resources

Zhejiang University

Hangzhou, China

Paul Muñoz

Department of Water Resources and Environmental Sciences, and Faculty of Engineering

University of Cuenca

Cuenca, Ecuador

Suli Pan

Institute of Hydrology and Water Resources

Zhejiang University

Hangzhou, China

Tianrui Pang

State Key Laboratory of Urban Water Resource and Environment

School of Environment

Harbin Institute of Technology

Harbin, China

Fidel Perez

Mother and Teacher Pontifical Catholic University

Santo Domingo, Dominican Republic

Ioana Popescu

Department of Hydroinformatics and Socio‐Technical Innovation

IHE Delft Institute for Water Education

Delft, The Netherlands

Vidya S. Samadi

Department of Agricultural Sciences

Clemson University

Clemson, South Carolina, USA

Germán Santos

Colombian School of Engineering Julio Garavito

Bogotá, Colombia

Bellie Sivakumar

Department of Civil Engineering

Indian Institute of Technology Bombay

Mumbai, India

Dimitri P. Solomatine

IHE Delft Institute for Water Education

Delft, The Netherlands; and

Water Resources Section

Delft University of Technology

Delft, The Netherlands; and

Water Problems Institute of the Russian Academy of Sciences

Moscow, Russia

Eric D. Swain

Caribbean‐Florida Water Science Center

United States Geological Survey

Lutz, Florida, USA

Sadgeh Sadeghi Tabas

Department of Civil Engineering

Clemson University

Clemson, South Carolina, USA

Carlos Tami

Colombian School of Engineering Julio Garavito

Bogotá, Colombia

Sijie Tang

School of Environmental Science and Engineering

Southern University of Science and Technology

Shenzhen, China

Ioannis Tsoukalas

Department of Water Resources and Environmental Engineering

National Technical University of Athens

Zografou, Greece

Jose Valles

IHE Delft Institute for Water Education

Delft, The Netherlands

Henny A. J. Van Lanen

Hydrology and Quantitative Water Management Group

Wageningen University

Wageningen, The Netherlands

Daniel A. Vázquez

IHE Delft Institute for Water Education

Delft, The Netherlands

Han Wang

China Institute of Water Resources and Hydropower Research

Beijing, China

Catherine A. M. E. Wilson

Hydro‐environmental Research Centre

School of Engineering

Cardiff University

Cardiff, United Kingdom

Na Wu

College of Environmental Science and Engineering

Tongji University

Shanghai, China

Jingkai Xie

Institute of Hydrology and Water Resources

Zhejiang University

Hangzhou, China

Yue‐Ping Xu

Institute of Hydrology and Water Resources

Zhejiang University

Hangzhou, China

Yunqing Xuan

Zienkiewicz Centre for Computational Engineering

Swansea University

Swansea, United Kingdom

Yi Zheng

School of Environmental Science and Engineering

Southern University of Science and Technology

Shenzhen, China

Xingyu Zhou

Shanghai Investigate, Design & Research Institute Co., Ltd.

Shanghai, China

PREFACE

Hydroinformatics deals with advanced information technology, data analytics, modeling, artificial intelligence (AI), and optimization applied to problems of aquatic environments for the purpose of informing management. Many of these technologies have become standard tools that support water management decisions around the world. However, the technologies continue to develop and new ones are emerging, which allows them to be applied to more complex and interesting problems. There are multiple examples of environmental and hydrological problems being dealt with not only by employing physically based (process) models but also with advanced data analysis tools and machine learning models.

The rapid development of machine learning and AI brings new possibilities for hydroinformatics research and practice. Nowadays, complex issues are analyzed by identifying and explaining patterns and anomalies of measured or simulated data. With an increasing amount of data collected about the environment, physically based models are increasingly complemented (and sometimes even replaced) by data‐driven models. Although data‐driven models lack the ability of physically based models to explain the physics of underlying processes, they are able to discover the hidden patterns in data and often can be more accurate in forecasting. Thus, they play an important role. Pattern recognition has been one of the main tasks solved by machine learning, and lately has been given an additional push by the development and use of deep learning, an important class of machine learning algorithms and of AI in general.

Data analytics plays an important role in water resources when data are multidimensional, and spatial and time dimensions have to be dealt with in a coordinated fashion. In relation to water resources, both dimensions were always important, but recently the need to handle huge amounts of remote sensing data (big data) has become more pronounced. These developments have motivated new research efforts in the context of predicting hydrological extremes and called for testing novel approaches of spatiotemporal data analysis and machine learning. In many problems, machine learning applied to hydrological extremes is studied in connection with uncertainties.

An issue in water resources management is optimal planning and operation under uncertainties, and this is where the role of AI‐driven approaches is also becoming more important. A traditional optimization approach typically cannot help much since such optimization is model based and objective functions cannot be analytically expressed. Algorithms developed under the framework of computational intelligence have been the focus of hydroinformatics for three decades but the new problems and the increased data availability lead to the necessity of testing new approaches and their critical analysis.

This book presents research results and experiences of applying hydroinformatics and, in particular, artificial intelligence and optimization technologies for water‐related problems. It targets hydrologists, water resources engineers, modelers, forecasters, and hydroinformatics specialists interested in the latest experiences of applying machine learning and optimization techniques to various water resources problems.

The chapters in the book are grouped into three sections: modeling hydrological systems, forecasting water resources, and knowledge discovery and optimization.

Part I deals with advances in modeling hydrological systems concentrating on distributed representations. Chapters consider model identifiability by driving calibration with stochastic inputs, use of heterogeneous precipitation data sources, fuzzy committees of distributed models, and using machine learning to represent dynamics of rainfall runoff.

Part II covers various aspects of forecasting water resources, which is an area of great importance for planning and providing warnings in case of extreme events. The main challenge here is to ensure an extended horizon of forecasts considering the associated uncertainties. Adequate incorporation of physics in machine learning models via the variable selection process is key to ensuring interpretability and acceptance of the final results. Different types of data analysis methods and models are presented, from wavelet decomposition and noise filtering to water level and flow predictions using deep learning. Well‐established (and accurate) neural networks (multilayer perceptrons) are not forgotten, as demonstrated by an example of the operational hydrological forecasting system used in El Salvador.

Part III covers relatively new areas of knowledge discovery and optimization based on using machine learning techniques. The chapters demonstrate how new vast data sources (big data) ranging from modeling results to spatiotemporal data analytics to social media provide new opportunities for discovering unseen patterns important for deeper understanding of water‐related processes. One chapter demonstrates how information from news media about water extremes combined with sentiment analysis enables the discovery of unexpected knowledge patterns. Another chapter shows how drought analysis from a 3‐D perspective of clustering extreme events can be used to characterize how phenomena develop in space and time. A further contribution demonstrates how so‐called complex networks are helping in discovering teleconnection patterns in water quality dynamics. Water management decisions are becoming more and more challenging, thus ideal for developing and testing various model‐based multi‐objective optimization approaches under uncertainty, an issue also dealt with in this part of the book.

The research results and experiences presented here demonstrate how machine learning and, more generally, artificial intelligence, can advance data analytics, accuracy of modeling and forecasting, and knowledge discovery for better water management under uncertainty. We hope this book will also provide inspiration for further advancing research in hydroinformatics.

We would like to acknowledge the valuable contributions of all the authors as well as their dedication and patience in multiple interactions with reviewers, whom we would also like to thank. This book includes contributions from researchers from many countries around the world, who present a broad scope of interesting problems of different geographical and hydrometeorological nature and scale. We are also thankful to our home, IHE Delft Institute for Water Education in the Netherlands, where hydroinformatics began 35 years ago, and where a number of our contributors studied. Our gratitude also goes to staff at AGU and Wiley who have played an important role by continuously supporting the preparation of this book.

 

Gerald A. Corzo Perez

IHE Delft Institute for Water Education

The Netherlands

Dimitri P. Solomatine

IHE Delft Institute for Water Education, and

Delft University of Technology

The Netherlands

1 HYDROINFORMATICS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN WATER‐RELATED PROBLEMS

Gerald A. Corzo Perez¹ and Dimitri P. Solomatine¹,²,³

¹ IHE Delft Institute for Water Education, Delft, The Netherlands

² Water Resources Section, Delft University of Technology, Delft, The Netherlands

³ Water Problems Institute of the Russian Academy of Sciences, Moscow, Russia

In recent years, there has been a surge of interest in machine learning (ML) and artificial intelligence (AI) due to the effectiveness of deep learning algorithms and the increasing availability of large data sets. This chapter provides a brief overview of the applications of AI and ML techniques in hydroinformatics, a field that deals with advanced information technology, data analytics, and modeling for aquatic environment management. Data‐driven models are becoming more common in water management as they can reveal hidden patterns in data and offer improved accuracy in certain situations. This chapter highlights the importance of spatiotemporal data analysis, pattern recognition, and optimization approaches in water resources management under uncertainty. It does not offer a comprehensive review of all methods but rather focuses on selected ML techniques widely used in water‐related problems. Additionally, the chapter discusses the challenges associated with using ML models, such as black‐box criticisms, and the potential of hybrid models that combine the strengths of ML and physically based process models for more robust solutions in hydroinformatics.

1.1. Introduction

Hydroinformatics deals with advanced information technology, data analytics, modeling, artificial intelligence (AI), and optimization applied to problems of the aquatic environment for the purpose of informing management. Many of these technologies have become standard tools that support water management decisions around the world. However, the technologies continue to develop and new ones are emerging, which allows them to be applied to more complex and interesting problems. There are multiple examples of environmental and hydrological problems being dealt with not only by employing physically based (process) models but also by using advanced data analysis tools and machine learning models. The use of AI techniques in geosciences has a long history. Hydroinformatics, formulated by Abbott (1991) 30 years ago, has been defined as a union of computational hydraulics (CH) and AI (so that HI = CH ∪ AI), and during the last three decades we have witnessed a much wider use of AI, with a large number of successful practical applications. The first stage of this development has been covered, for example, in the edited volume Practical hydroinformatics: Computational intelligence and technological developments in water applications (Abrahart et al., 2008), and in dozens of other books and hundreds of research papers.

Currently, we see a new wave of interest in machine learning (ML) and AI, which is partly explained by the demonstrable effectiveness of the new generation of deep learning algorithms and the availability of large data sets (see, e.g., Nearing et al., 2021), and this brings new possibilities for hydroinformatics research and practice. With an increasing amount of data collected about the environment, physically based models are more and more complemented, and sometimes even replaced, by data‐driven models. Although data‐driven models lack the ability of physically based models to explain the physics of underlying processes, they are able to discover hidden patterns in data and can often be more accurate, and so they play an important supporting role in water management. Pattern recognition (e.g., automatic identification of flooded areas on satellite images) has been one of the main tasks solved by machine learning, and lately has been given an additional push by the development and use of deep learning, an important class of machine learning algorithms, and of AI in general. Data analytics plays an important role in water resources when data are multidimensional, and spatial and time dimensions have to be dealt with in a coordinated fashion. In relation to water resources, both dimensions were always important, but recently the need to handle huge amounts of remote sensing data (“big data”) has become more pronounced. These developments have motivated new research efforts in the context of predicting hydrological extremes and have called for testing novel approaches of spatiotemporal data analysis and machine learning. Due to much easier access to supercomputing facilities, there are increased possibilities to study model uncertainty (typically using Monte Carlo frameworks), and machine learning can also play a role in building predictive models of such uncertainties.

An issue in water resources management is optimal planning and operation under uncertainties, and this is where the role of AI‐driven approaches is also becoming more important. Classical optimization approaches (gradient‐based nonlinear optimization) typically cannot help much, since such optimization is model based, and objective functions (and their gradients) cannot be analytically expressed. Optimization approaches developed under the framework of computational intelligence (various types of randomized search, e.g., evolutionary approaches) have been the focus of hydroinformatics for three decades, but the new problems and the increased data availability make it necessary to test new approaches and analyze them critically.
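To give a flavor of randomized search on an objective that is only available as a model run, the following minimal sketch applies a (1+1) evolution strategy, one of the simplest evolutionary approaches. The function simulate_cost, the decision variable (a release fraction), the bounds, and all parameter values are hypothetical stand-ins chosen for illustration; they are not taken from any chapter of this volume.

```python
import random


def simulate_cost(release_fraction: float) -> float:
    """Hypothetical black-box objective standing in for a model run.

    The absolute-value term makes it non-differentiable at the optimum,
    so only function values (no gradients) are used by the search.
    """
    deviation = release_fraction - 0.6
    return deviation ** 2 + 0.1 * abs(deviation)


def evolve(objective, lower, upper, generations=200, step=0.1, seed=42):
    """(1+1) evolution strategy: mutate the parent, keep the child if no worse."""
    rng = random.Random(seed)
    best_x = rng.uniform(lower, upper)
    best_f = objective(best_x)
    for _ in range(generations):
        child = best_x + rng.gauss(0.0, step)   # random mutation
        child = min(max(child, lower), upper)   # keep within bounds
        child_f = objective(child)
        if child_f <= best_f:                   # selection
            best_x, best_f = child, child_f
    return best_x, best_f


best_x, best_f = evolve(simulate_cost, 0.0, 1.0)
print(f"best release fraction: {best_x:.3f}, cost: {best_f:.5f}")
```

In a real application the objective would wrap a simulation model, and population-based or multi-objective variants would typically be preferred over this single-parent scheme.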

This chapter aims at presenting a brief overview of AI‐ and ML‐related model‐building processes and methods widely used for water‐related problems, in the context of the chapters presented in this volume. AI is a concept that covers a wide area of science and technology; however, quite often it is used interchangeably with ML, which is in fact a narrower notion. One may find in the literature quite a large number of AI‐ and ML‐related subareas: big data, data mining, pattern recognition (PR), natural language processing (NLP), neural networks, deep learning, and so on. We will not go into a discussion about terminology and differences between AI and ML; for the purpose of this chapter and the issues covered in the book, it is appropriate to use the somewhat narrower term, that is, machine learning.

ML techniques have been widely used in water resources during the last decades; at the same time, however, one may also observe inadequate use of ML‐related modeling procedures, unjustified selection of algorithms, and even a lack of understanding of why a model provides good or poor performance in the mathematical and statistical sense. There is also well‐known criticism of ML and statistical techniques by practitioners who are used to employing physically based (process) models; they point out that the interpretation of a water resources problem is hidden in the so‐called black box of an ML model. There is indeed a challenge of posing the problem in the right way: how domain knowledge can drive the selection, building, and tuning of an ML model. Lack of data and data uncertainty also make it difficult for practitioners to feel confident about ML models.

On the other hand, the strength of ML is in its ability to represent the relationships between inputs and outputs, provided enough data are available. The relatively recent advances in deep learning have opened the door to new ways of using spatiotemporal data and, at the same time, have motivated new algorithm developments for spatial patterns and, in general, all types of computer vision algorithms; nevertheless, not all problems can be tackled by ML. Input and output relations can be so complex that ML techniques may not be able to find the hidden patterns, and in such cases hybrid models, combining the power of ML and process models (so‐called physics‐aware AI; see, e.g., Jiang et al., 2020), would be needed. Such hybrid approaches are now being given increased attention in hydroinformatics.

This chapter is not intended to provide a comprehensive review of methods (which are covered in hundreds of books and in the referred literature herein), but rather focuses on some important elements of ML model building, and presents basics of several selected ML techniques quite widely used in solving water‐related problems, allowing for “feeling the flavor” of ML.

1.2. Key Principles of ML/Hydroinformatics

1.2.1. AI and ML Definitions

There is a large number of evolving definitions of AI, which can be explained by its permanent evolution, shifts in priorities, and advances in the mathematical instruments used. Many literature sources point out that the term AI was first used in 1956 at the Dartmouth Conference, where John McCarthy and other founding fathers of AI helped to coin the term artificial intelligence. One of the definitions reads: “AI is the field devoted to building artificial animals (or at least artificial creatures that, in suitable contexts, appear to be animals) and, for many, artificial persons (or at least artificial creatures that, in suitable contexts, appear to be persons)” (Stanford Encyclopedia of Philosophy, 2018). On the other hand, Wikipedia defines it as the “intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals” (Artificial Intelligence, 2022). Yet another definition (sometimes referred to as being given by IBM) states that “AI leverages computers and machines to mimic the problem‐solving and decision‐making capabilities of the human mind.” All these definitions differ in details, but are very similar in the main idea: a machine (a programmed computer) is supposed to imitate some behavior of a living creature.

An old debate regarding whether humans will be replaced by machines has been reignited in various public media in view of the latest developments in AI, especially generative AI, as implemented, for example, in platforms like ChatGPT. Indeed, AI has evolved into different types, related to the extent to which it may take over some human activities. The first type is purely reactive and is closely related to the beginnings of computer science: here AI does not have any memory, which basically means no initial database or information about processes. This concept can be applied to solving narrow specialized tasks. For example, a forecast is performed based only on the current situation, limited historical samples, and known variables. Further development can lead to building up memory, by collecting previous experience and more complex and voluminous data and continually adding these to the memory. Such AI systems have enough memory or experience to support humans in performing various tasks, but their ability is still limited and they are still seen as a helping hand. For example, such a system can provide adaptive forecasts depending on the context, such as previous performance, climatic conditions, type of river basin, and others. An even higher level of AI can be explained as a theory of mind (Premack & Woodruff, 1978), where AI can understand thoughts and emotions and interact socially. This type of concept needs the integration of many components of AI and the development of a more sophisticated mathematical apparatus, so such developments are still at a rudimentary level. At the top level, it is possible to consider how these systems can become aware of life and even become self‐aware. This concept links to the idea that AI machines can create new knowledge and, at the same time, build internal system concepts that link intelligence, sentience, and consciousness.

Advances of AI have been numerous and applied in various areas. We should admit however, that in water resources, only a few of such developments have been used, and these relate to application of specific machine learning techniques.

Figure 1.1 presents a schematization of some of the key techniques of ML, with references to the decades when these methods started to develop. Owing to the wide application of artificial neural networks (ANN), this architecture is presented in more detail. One of the relatively new developments is natural language processing (NLP); it uses deep learning (DL) to train models that help interpret text and reinforcement learning concepts to use DL. Pattern recognition uses convolutions and DL to extract features. Finally, metaheuristics provide the basis for new models of DL. The following are some of the concepts used in this chapter:

Figure 1.1 Evolution of AI topics: From artificial intelligence to machine learning and deep learning in hydroinformatics.

ML (machine learning). Mathematical models that aim to represent groups and/or input‐output relationships from data.

NLP (natural language processing). The encoding of language elements, generally text, into numbers and the analysis of the result, mainly by transforming and processing text in order to resolve, replicate, and understand its semantics.

Pattern recognition. A subarea of ML that explores how patterns in data and their attributes (variables or features) can be detected. Many ML algorithms implicitly detect patterns, and therefore these areas are interrelated. Computer vision is an important area of their application. It is worth noting that a number of important pattern recognition methods and algorithms are not explicitly positioned in the machine learning realm (for example, procedures for denoising and filtering, segmentation of images, 3D virtual reality patterns, and vector field flow), but they certainly contribute to solving pattern recognition problems.

1.2.2. Machine Learning (ML)

There are various ways of contextualizing ML. From the perspective of computer science, the concept of ML can be seen as aiming at changing the programming paradigm (Fig. 1.2). The aim here is to develop a computer program that does not require significant analysis to understand how to create an algorithm to obtain certain responses; instead, an ML algorithm, theoretically, can learn from inputs and responses (outputs).

In many applications, however, ML is not seen as a tool to generate computer programs, but instead is expected to help in building input‐output models by learning from data, in other words, data‐driven models (see Fig. 1.6). Their use is quite varied and most of the time is justified by the idea that a system might be very complex and we may not observe all the internal states of a modeled system or process (e.g., in hydrological modeling this may be soil moisture). This implies that if we have a complex system, with only a limited understanding of the driving variables of a natural process (or any process in general), and we can measure the consequences of events (i.e., outputs resulting from particular inputs), then with this information, it is possible to generate ML models (Fig. 1.3).
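As a minimal sketch of this idea, the example below builds a data‐driven model from input‐output pairs of a system whose internal state (storage) is never shown to the model. Everything here is an illustrative assumption: hidden_system is a hypothetical linear reservoir, the rainfall series is synthetic, and the model form Q[t] = a·Q[t−1] + b·P[t] fitted by least squares is chosen only because it is simple.

```python
import random


def hidden_system(rain, k=0.3):
    """Hypothetical 'true' catchment: a linear reservoir with unobserved storage."""
    storage = 0.0
    flows = []
    for p in rain:
        storage += p          # rainfall enters the (unobserved) storage
        q = k * storage       # outflow is proportional to storage
        storage -= q
        flows.append(q)
    return flows


def fit_two_inputs(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 via the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Synthetic rainfall: dry on most time steps, random depth otherwise.
rng = random.Random(0)
rain = [rng.uniform(0.0, 10.0) if rng.random() < 0.4 else 0.0 for _ in range(200)]
flow = hidden_system(rain)

# Learn only from measured inputs (rain) and outputs (flow).
a, b = fit_two_inputs(flow[:-1], rain[1:], flow[1:])
print(f"Q[t] = {a:.3f} * Q[t-1] + {b:.3f} * P[t]")
```

Because the synthetic system is itself linear, the fitted coefficients should come out close to 1 − k and k; real catchments are nonlinear and noisy, which is where more flexible ML models come in.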

Figure 1.2 Differences between traditional programming and ML, as seen in computer science: (a) Computer science algorithm development; (b) Machine Learning algorithm development.

In most cases, the ML engines (e.g., artificial neural networks) work with numerical (real‐valued) data, so are, in fact, nonlinear regression models. If data are nonnumerical (e.g., classes, images, or words), they have to be first transformed (encoded) into numerical form, and then processed.
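One common way to encode class data numerically is one‐hot encoding, sketched below. The land cover labels and the helper one_hot_encode are hypothetical examples introduced only for illustration; images and words require other encodings.

```python
def one_hot_encode(labels):
    """Map each class label to a vector with a single 1.0 at the class index."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        row = [0.0] * len(categories)
        row[index[label]] = 1.0
        vectors.append(row)
    return categories, vectors


# Hypothetical land cover classes for four grid cells.
land_cover = ["forest", "urban", "forest", "cropland"]
categories, encoded = one_hot_encode(land_cover)
for label, vector in zip(land_cover, encoded):
    print(label, vector)
```

Each class gets its own column, so no artificial ordering between classes is implied, which would be the case if classes were simply numbered 1, 2, 3.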

In relation to Earth sciences, wide adoption of ML has not been fast, to say the least, since many scientists pointed out that there is no clear justification for using these algorithms. Their reasoning was that models, as descriptors of reality, should be based on scientific understanding of processes (e.g., physics), and not on a statistical encapsulation of data sets. Water resources are not an exception in this sense, and early applications of ML have been criticized for reproducing natural problems that do not need to be reproduced abstractly with ML, which arguably often resulted in a blind representation of a well‐known problem. However, during the last two to three decades, many examples of successful applications of ML have been reported and implemented in decision support systems. It has been shown that ML methods are often more accurate than the traditional hydrologic models in forecasting (see, e.g., Nearing et al., 2021; Arsenault et al., 2023). ML also helps to replace complex slow‐running physically based models: an ML model is trained on data generated by a process model, and such a fast metamodel (surrogate) replaces the much slower process model in operational systems and can therefore be used in real‐time forecasting to provide warnings in an efficient manner. ML‐based pattern recognition algorithms can also help to automate the detection of critical scenarios, combining variables that might not be easily related physically and capturing nontrivial relationships and patterns implicitly present in data, thus reproducing complex phenomena. Therefore, ML has become a powerful analytical and predictive tool.
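The surrogate idea can be sketched in a few lines. In this illustration, process_model is a hypothetical stand‐in for a slow simulation (a nonlinear reservoir routed with small time steps), and the surrogate is a deliberately simple k‐nearest‐neighbor regressor with inverse‐distance weighting; in practice a neural network or another ML model would usually take its place. All parameter values are assumptions.

```python
def process_model(peak_inflow: float, dt: float = 0.01, t_end: float = 20.0) -> float:
    """Hypothetical 'slow' process model: routes a triangular inflow hydrograph
    through a nonlinear reservoir and returns the peak outflow."""
    storage, peak_outflow = 0.0, 0.0
    for i in range(int(t_end / dt)):
        t = i * dt
        if t <= 5.0:
            inflow = peak_inflow * t / 5.0            # rising limb
        elif t <= 10.0:
            inflow = peak_inflow * (10.0 - t) / 5.0   # falling limb
        else:
            inflow = 0.0
        outflow = 0.5 * storage ** 1.5
        storage = max(storage + dt * (inflow - outflow), 0.0)
        peak_outflow = max(peak_outflow, outflow)
    return peak_outflow


def knn_surrogate(train_x, train_y, k=2):
    """Build a k-nearest-neighbor regressor with inverse-distance weights."""
    def predict(x):
        pairs = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))
        nearest = pairs[:k]
        weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
        return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)
    return predict


# Step 1: run the process model offline to generate training data.
train_x = [float(p) for p in range(11)]
train_y = [process_model(p) for p in train_x]

# Step 2: the trained surrogate answers new queries without a simulation run.
surrogate = knn_surrogate(train_x, train_y)
query = 4.5
print(f"process model: {process_model(query):.3f}, surrogate: {surrogate(query):.3f}")
```

The process model is called only while the training set is generated; afterwards each prediction costs a lookup over eleven stored samples, which is the property that makes surrogates attractive for real‐time use.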

Figure 1.3 Encapsulation of natural systems and processes in ML models, with feedbacks.

Aside from ML, there are other areas in AI worth attention, namely natural language processing and metaheuristics; they are also considered here because of their potential for water resources management.

1.2.3. Natural Language Processing (NLP)

The Internet has given us access to billions of documents, images, and audio and video material in very different areas of human activity. It would be interesting to understand if and how we can use these data for solving water resources problems. Data on the Internet are often unstructured and linked to a variety of sources, from websites of organizations to social media, news, blogs, videos, and more. In many cases useful data are presented as text. The idea of text mining is not new; however, the large amount of data available on the Internet in text form has generated a boom in the development of intelligent tools, referred to as natural language processing (NLP). NLP can be defined as the ability of a computer program to understand human language as it is spoken and written, that is, natural language (Sun et al., 2022). It starts with the idea of processing text and develops ways to interpret and reproduce it. The ways of understanding how we write have been formalized in tools for sentiment analysis of text, generation of text, correction of text, text extraction, and the concept of artificial assistants.

NLP converts letters, words, and phrases into numerical representation. This is done sometimes in simple terms, like numbering each word in a phrase and repeating the number when the word repeats itself. However, the results of this numerical representation need to follow the basics of the language (Khurana et al., 2023). Therefore, typically, the process of interpreting the language focuses on five steps:

Lexical (morphological) analysis: In essence, this step breaks text into paragraphs, phrases, and words. Furthermore, individual words can be analyzed at the level of morphemes, the smallest meaningful units of a word. Lexical analysis identifies the morphemes and allows us to characterize a word and understand its meaning knowing its root form. The final objective of this step is to help identify words, which are normally referred to as tokens, since the original word in fact possesses some information and, for programming, is a sequence of characters that represents a unit of information.

Syntax analysis: This step checks the grammar and, with it, the way words are arranged in a sentence. This order shows how words should normally be arranged, and using this information it is possible to build relationships between them. Knowing this, it is possible to assess the parts of speech (POS) and tag this information based on the structure found.

Semantic analysis: This step aims at finding the meaning of the statement, that is, how the phrase reads literally. This understanding provides the basis for rejecting syntactically valid but illogical statements.

Discourse integration: The context in which a phrase is used can be very important, so this step aims at establishing links between the different sentences, especially with the immediately preceding one.

Pragmatic analysis: This step uses a set of rules that describe cooperative dialogs, as in social content. What can be found in social media and common interactions can become a rule, and with this we can comprehend the way the communication takes place.
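The simplest numerical representation mentioned before the five steps (numbering each word and repeating the number when the word repeats), together with the tokenization part of lexical analysis, can be sketched as follows. The headline is an invented example, and the helpers tokenize and encode are hypothetical names; practical NLP pipelines use far richer tokenizers and vector representations.

```python
import re


def tokenize(text):
    """Lexical step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def encode(tokens):
    """Number each distinct token in order of first appearance."""
    vocabulary = {}
    codes = []
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary) + 1
        codes.append(vocabulary[token])
    return codes, vocabulary


# Invented headline used only for illustration.
headline = "Flood warning issued as river levels rise; flood defences hold"
tokens = tokenize(headline)
codes, vocabulary = encode(tokens)
print(tokens)
print(codes)
```

The repeated word "flood" receives the same number both times, which is what allows later steps to count, compare, and relate words numerically.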