Deep Learning for the Earth Sciences (E-Book)
Description

DEEP LEARNING FOR THE EARTH SCIENCES

Explore this insightful treatment of deep learning in the field of Earth sciences, from four leading voices.

Deep learning is a fundamental technique in modern artificial intelligence and is being applied to disciplines across the scientific spectrum; Earth science is no exception. Yet the link between deep learning and the Earth sciences has only recently entered academic curricula and thus has not yet proliferated. Deep Learning for the Earth Sciences delivers a unique perspective and treatment of the concepts, skills, and practices necessary to quickly become familiar with the application of deep learning techniques to the Earth sciences. The book prepares readers to use the technologies and principles described in their own research. The distinguished editors have also included resources that explain and provide new ideas and recommendations for new research, especially useful to those involved in advanced research education or those seeking PhD thesis orientations.

Readers will also benefit from the inclusion of:

* An introduction to deep learning for classification purposes, including advances in image segmentation and encoding priors, anomaly detection and target detection, and domain adaptation
* An exploration of learning representations and unsupervised deep learning, including deep learning image fusion, image retrieval, and matching and co-registration
* Practical discussions of regression, fitting, parameter retrieval, forecasting, and interpolation
* An examination of physics-aware deep learning models, including emulation of complex codes and model parametrizations

Perfect for PhD students and researchers in the fields of geosciences, image processing, remote sensing, electrical engineering and computer science, and machine learning, Deep Learning for the Earth Sciences will also earn a place in the libraries of machine learning and pattern recognition researchers, engineers, and scientists.

Page count: 863

Publication year: 2021




Table of Contents

Cover

Title Page

Copyright

Dedication

Foreword

Acknowledgments

List of Contributors

List of Acronyms

1 Introduction

1.1 A Taxonomy of Deep Learning Approaches

1.2 Deep Learning in Remote Sensing

1.3 Deep Learning in Geosciences and Climate

1.4 Book Structure and Roadmap

Part I: Deep Learning to Extract Information from Remote Sensing Images

2 Learning Unsupervised Feature Representations of Remote Sensing Data with Sparse Convolutional Networks

2.1 Introduction

2.2 Sparse Unsupervised Convolutional Networks

2.3 Applications

2.4 Conclusions

3 Generative Adversarial Networks in the Geosciences

3.1 Introduction

3.2 Generative Adversarial Networks

3.3 GANs in Remote Sensing and Geosciences

3.4 Applications of GANs in Earth Observation

3.5 Conclusions and Perspectives

Note

4 Deep Self‐taught Learning in Remote Sensing

4.1 Introduction

4.2 Sparse Representation

4.3 Deep Self‐taught Learning

4.4 Conclusion

5 Deep Learning‐based Semantic Segmentation in Remote Sensing

5.1 Introduction

5.2 Literature Review

5.3 Basics on Deep Semantic Segmentation: Computer Vision Models

5.4 Selected Examples

5.5 Concluding Remarks

6 Object Detection in Remote Sensing

6.1 Introduction

6.2 Preliminaries on Object Detection with Deep Models

6.3 Object Detection in Optical RS Images

6.4 Object Detection in SAR Images

6.5 Conclusion

Notes

7 Deep Domain Adaptation in Earth Observation

7.1 Introduction

7.2 Families of Methodologies

7.3 Selected Examples

7.4 Concluding Remarks

Notes

8 Recurrent Neural Networks and the Temporal Component

8.1 Recurrent Neural Networks

8.2 Gated Variants of RNNs

8.3 Representative Capabilities of Recurrent Networks

8.4 Application in Earth Sciences

8.5 Conclusion

Note

9 Deep Learning for Image Matching and Co‐registration

9.1 Introduction

9.2 Literature Review

9.3 Image Registration with Deep Learning

9.4 Conclusion and Future Research

10 Multisource Remote Sensing Image Fusion

10.1 Introduction

10.2 Pansharpening

10.3 Multiband Image Fusion

10.4 Conclusion and Outlook

Notes

11 Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

11.1 Introduction

11.2 Deep Learning for RS CBIR

11.3 Scalable RS CBIR Based on Deep Hashing

11.4 Discussion and Conclusion

Acknowledgement

Part II: Making a Difference in the Geosciences With Deep Learning

12 Deep Learning for Detecting Extreme Weather Patterns

12.1 Scientific Motivation

12.2 Tropical Cyclone and Atmospheric River Classification

12.3 Detection of Fronts

12.4 Semi‐supervised Classification and Localization of Extreme Events

12.5 Detecting Atmospheric Rivers and Tropical Cyclones Through Segmentation Methods

12.6 Challenges and Implications for the Future

12.7 Conclusions

13 Spatio‐temporal Autoencoders in Weather and Climate Research

13.1 Introduction

13.2 Autoencoders

13.3 Applications

13.4 Conclusions and Outlook

Note

14 Deep Learning to Improve Weather Predictions

14.1 Numerical Weather Prediction

14.2 How Will Machine Learning Enhance Weather Predictions?

14.3 Machine Learning Across the Workflow of Weather Prediction

14.4 Challenges for the Application of ML in Weather Forecasts

14.5 The Way Forward

Notes

15 Deep Learning and the Weather Forecasting Problem: Precipitation Nowcasting

15.1 Introduction

15.2 Formulation

15.3 Learning Strategies

15.4 Models

15.5 Benchmark

15.6 Discussion

Appendix

Acknowledgement

Note

16 Deep Learning for High‐dimensional Parameter Retrieval

16.1 Introduction

16.2 Deep Learning Parameter Retrieval Literature

16.3 The Challenge of High‐dimensional Problems

16.4 Applications and Examples

16.5 Conclusion

17 A Review of Deep Learning for Cryospheric Studies

17.1 Introduction

17.2 Deep‐learning‐based Remote Sensing Studies of the Cryosphere

17.3 Deep‐learning‐based Modeling of the Cryosphere

17.4 Summary and Prospect

Appendix: List of Data and Codes

18 Emulating Ecological Memory with Recurrent Neural Networks

18.1 Ecological Memory Effects: Concepts and Relevance

18.2 Data‐driven Approaches for Ecological Memory Effects

18.3 Case Study: Emulating a Physical Model Using Recurrent Neural Networks

18.4 Results and Discussion

18.5 Conclusions

Part III: Linking Physics and Deep Learning Models

19 Applications of Deep Learning in Hydrology

19.1 Introduction

19.2 Deep Learning Applications in Hydrology

19.3 Current Limitations and Outlook

Acknowledgments

20 Deep Learning of Unresolved Turbulent Ocean Processes in Climate Models

20.1 Introduction

20.2 The Parameterization Problem

20.3 Deep Learning Parameterizations of Subgrid Ocean Processes

20.4 Physics‐aware Deep Learning

20.5 Further Challenges ahead for Deep Learning Parameterizations

21 Deep Learning for the Parametrization of Subgrid Processes in Climate Models

21.1 Introduction

21.2 Deep Neural Networks for Moist Convection (Deep Clouds) Parametrization

21.3 Physical Constraints and Generalization

21.4 Future Challenges

22 Using Deep Learning to Correct Theoretically‐derived Models

22.1 Experiments with the Lorenz '96 System

22.2 Discussion and Outlook

22.3 Conclusion

23 Outlook

Bibliography

Index

End User License Agreement

List of Tables

Chapter 4

Table 4.1 Class‐wise accuracies [%], overall accuracy [%], average accuracy [...

Chapter 5

Table 5.2 Leave‐one‐winter‐out results (left, over all three lakes) and Leave...

Table 5.3 Key figures of the St. Moritz webcam data

Table 5.4 Lake ice segmentation results for webcams

Chapter 6

Table 6.1 The rules of sample classification.

Table 6.2 Comparisons with different first vertex definition methods on the m...

Table 6.3 Comparison with different methods on the gap of mAP between HBB and...

Table 6.4 Comparison between deformable RoI pooling and RoI Transformer. The ...

Table 6.5 Comparison with other methods.

Chapter 7

Table 7.1 Common label classes between the UC Merced and WHU‐RS19 datasets.

Table 7.2 Overall accuracies for the discussed datasets and domain adaptation...

Table 7.3 F1 scores for the target city.

Chapter 9

Table 9.1 Grouping of image matching techniques depending on the type of imag...

Table 9.2 Errors measured as average Euclidean distances between estimated la...

Chapter 10

Table 10.1 Performance comparison of three non‐DL and four DL methods at redu...

Table 10.2 Processing time comparison of three non‐DL and four DL methods in ...

Table 10.3 The size of the image used for HSR experiments.

Table 10.4 Quantitative comparison of different algorithms on two different i...

Table 10.5 Processing time comparison of five non‐DL and two DL methods in th...

Chapter 11

Table 11.1 Main characteristics of the DL‐based CBIR systems in RS.

Table 11.2 Main characteristics of the state‐of‐the‐art deep hashing‐based CB...

Table 11.3 Comparison of the DL loss functions considered within the deep has...

Chapter 12

Table 12.1 Data sources used for TC and AR binary classification.

Table 12.2 Dimension of image, diagnostic variables (channels) and labeled da...

Table 12.3 Classification CNN architecture and layer parameters. The convolut...

Table 12.4 Accuracy of deep learning for TC and AR binary classification task...

Table 12.5 Confusion matrix for tropical cyclone classification.

Table 12.6 Confusion matrix for atmospheric river classification.

Table 12.7 Per‐category counts and IOU front detection metrics for 2013–2016.

Table 12.8 Front category confusion matrices for 2013–2016.

Table 12.9 Class frequency breakdown for Tropical Cyclones (TC), Extra‐Tropic...

Table 12.10 Semi‐Supervised Accuracy Results: Mean AP for the models. Table r...

Table 12.11 AP for each class. Frequency of each class in the test set shown ...

Chapter 13

Table 13.1 This table summarizes all discussed variations of the standard AE....

Table 13.2 Summary of results in Tibau et al. (2018). The column Reconstructio...

Table 13.3 Summary of results in Klampanos et al. (2018). Accuracy of the dif...

Chapter 15

Table 15.1 Rain rate statistics in the HKO‐7 benchmark.

Table 15.2 Summary of reviewed methods. The first half are FNN‐based models a...

Chapter 16

Table 16.1 Summary of CNN model used in the retrieval of atmospheric temperat...

Table 16.2 Summary of CNN model. … is the window size in the average pooling o...

Chapter 18

Table 18.1 Datasets used in MATSIRO model simulation.

Table 18.2 Factorial experimental design: the four models are trained individ...

Table 18.3 Summary of the scope of the experiments.

Table 18.4 The model and training parameters from hyper‐parameter optimizatio...

List of Illustrations

Chapter 2

Figure 2.1 Scheme of the proposed method for unsupervised and sparse learnin...

Figure 2.2 Kappa statistic (classification accuracy estimated) for several n...

Figure 2.3 For the outputs of the different layers 1st to 7th, in columns, m...

Figure 2.4 Top: for RGB (a), LiDAR (b) and RGB+LiDAR (c), learned bases by t...

Chapter 3

Figure 3.1 Generative adversarial network scheme. It shows the flow for the ...

Figure 3.2 Conditional generative adversarial network scheme. It shows the f...

Figure 3.3 Cycle‐consistent generative adversarial network scheme. It shows ...

Figure 3.4 Close in time upscaled (333m resolution) Landsat‐8 and Proba‐V im...

Figure 3.5 Example of architecture for Domain Adaptation between two satelli...

Figure 3.6 An example architecture of a convolutional generative adversarial...

Chapter 4

Figure 4.1 Schematic illustration of different learning paradigms and their ...

Figure 4.2 Schematic illustration of the deep self‐taught learning framework...

Figure 4.3 Example images from UC Merced dataset for the classes agricultu...

Chapter 5

Figure 5.1 Comparison of pipelines for (a) image classification versus (b) s...

Figure 5.2 Example of architecture with a hard‐coded upsampling, in which ev...

Figure 5.3 Semantic segmentation architectures learning the upsampling.

Figure 5.4 (Adapted from (Marcos et al. 2018a)) Diagram of the first RotConv...

Figure 5.5 Examples of classification maps obtained in the Vaihingen validat...

Figure 5.6 SnapNet processing: (1) The point‐cloud is meshed to enable the (...

Figure 5.7 SnapNet results on the Semantic3D dataset (Hackel et al. 2017): c...

Figure 5.8 The four Sentinel‐1 orbits (…) that scan Region Sils (sh...

Figure 5.9 Example results for St. Moritz on a non‐frozen day (row …), Silva...

Figure 5.10 Segmentation results (cross‐camera setting).

Chapter 6

Figure 6.1 Examples of remote sensing images containing objects of interest.

Figure 6.2 Challenges of object detection in remote sensing. (a) Arbitrary o...

Figure 6.3 IoU calculation between two oriented bounding boxes.

Figure 6.4 Examples of Precision‐Recall Curve. As the recall increases,...

Figure 6.5 Architectures of Faster R‐CNN and R‐FCN.

Figure 6.6 (a–b) Borderline states of regression‐based OBB representations. ...

Figure 6.7 Samples for illustrating mask‐oriented bounding box representatio...

Figure 6.8 Overview of the pipeline for detecting oriented objects by Mask O...

Figure 6.9 Horizontal RoI vs. Rotated RoI.

Figure 6.10 Network architecture of RoI Transformer.

Figure 6.11 Relative offsets.

Figure 6.12 The flow chart of CFAR algorithm.

Chapter 7

Figure 7.1 Domain adaptation loss (red) imposed on a CNN's feature vectors p...

Figure 7.2 Examples from the UC Merced (top) and WHU‐RS19 (bottom) datasets....

Figure 7.3 Confusion matrix of the source only model (top left) and differen...

Figure 7.4 Source, target, and fake source images. Best viewed in color.

Figure 7.5 Real data and the standardized images by the Gray‐world alg...

Figure 7.6 Classification maps on the target city (Villach) by the U‐net fin...

Figure 7.7 Limitations of hist. matching, CycleGAN, and ColorMapGAN.

Figure 7.8 Example drone images from the source (left) and target (right) do...

Figure 7.9 Feature space projections using t‐SNE (van der Maaten and Hinton ...

Figure 7.10 Precision‐recall curves for the CNN on source (left) and target ...

Figure 7.11 Cumulative number of animals found over the course of the ten AL...

Chapter 8

Figure 8.1 Applying deep feed‐forward neural networks to multi‐temporal data...

Figure 8.2 Schematic illustration of a single RNN cell that updates the cell...

Figure 8.3 When a recursive, feed‐back neural network is unrolled through ti...

Figure 8.4 The computational graph of an unrolled RNN with forward (black ar...

Figure 8.5 In a long short‐term memory (LSTM) (Hochreiter and Schmidhu...

Figure 8.6 As the capacity of a single RNN cell is limited, several RNN cell...

Figure 8.7 Bi‐directional RNNs (Schuster and Paliwal, 1997) and LSTM network...

Figure 8.8 Two recurrent network models – i.e. (a) a vanilla recurrent neura...

Figure 8.9 The vegetation activity of two field parcels – cultivated with me...

Figure 8.10 Recurrent models can outperform feed‐forward baselines in crop c...

Chapter 9

Figure 9.1 A schematic diagram of the image matching and image registration ...

Figure 9.2 A schematic diagram of two different architectures presented in t...

Figure 9.3 Qualitative evaluation for three different pairs of images. From ...

Figure 9.4 Qualitative evaluation for the different methods ( (Vakalopoulou ...

Chapter 10

Figure 10.1 Training samples generation workflow (top) and iterative network...

Figure 10.2 Pansharpening results with different compared methods at a reduc...

Figure 10.3 Pansharpening results with different compared methods at a full‐...

Figure 10.4 Example of HS and MS data fusion: (a) supervised approaches and ...

Figure 10.5 HSR results of different methods with Chikusei image. We choose ...

Chapter 11

Figure 11.1 General block scheme of a RS CBIR system.

Figure 11.2 Different strategies considered within the DL‐based RS CBIR syst...

Figure 11.3 The intuition behind the triplet loss function: after training, ...

Chapter 12

Figure 12.1 Contrasting traditional heuristics‐based event detection versus ...

Figure 12.2 Top: architecture for tropical cyclone classification. Right: ar...

Figure 12.3 4‐layer front detection CNN architecture with 64 … filters per l...

Figure 12.4 Coded Surface Bulletin fronts and CNN‐generated front likelihood...

Figure 12.5 Mean annual frontal frequencies for Coded Surface Bulletin and C...

Figure 12.6 Diagram of the 3D semi‐supervised convolutional network architec...

Figure 12.7 Bounding box predictions shown on 2 consecutive (6 hours in betw...

Figure 12.8 Feature maps for the 16 channels for one of the frames in the da...

Figure 12.9 Schematic of the modified DeepLabv3+ network used in this work. ...

Figure 12.10 Top: Segmentation masks overlaid on a globe. Colors (white‐yell...

Chapter 13

Figure 13.1 The general architecture of a spatial AE. The left‐most layer co...

Figure 13.2 The architecture of a variational autoencoder. The main differen...

Figure 13.3 Summary of the use of an AE for weather and climate. These can b...

Figure 13.4 An example of the results in Tibau et al. (2018). Plot of (a) …,...

Figure 13.5 Schematic view of the architecture used in Lusch et al. (2018)....

Figure 13.6 Schematic view of the architecture used by Li and Misra (2017). ...

Figure 13.7 Climate data is typically represented on a grid at different lev...

Chapter 14

Figure 14.1 Processes that influence weather and climate. The figure is repr...

Figure 14.2 Workflow of weather prediction from observations to forecast dis...

Figure 14.3 Score‐card for ensemble simulations at ECMWF (reproduced from Du...

Figure 14.4 A visualization of the way forward. It will require a concerted ...

Chapter 15

Figure 15.1 (a) The overall structure of the U‐NET in Agrawal et al. (2019)....

Figure 15.2 The dynamic convolutional layer. The input is fed into two sub‐n...

Figure 15.3 Inner structure of ConvLSTM. Source: (Xingjian et al. 2015).

Figure 15.4 Connection structure of the star‐shaped bridge.

Figure 15.5 Connection structure of PredRNN. The orange arrows in PredRNN de...

Figure 15.6 ST‐LSTM block (top) and Memory In Memory block (bottom). For bre...

Figure 15.7 The non‐stationary module (MIM‐N) and the stationary module (MIM...

Figure 15.8 Encoder‐forecaster architecture adopted in Shi et al. (2017). So...

Figure 15.9 Top: For convolutional RNN, the recurrent connections are fixed ...

Chapter 16

Figure 16.1 The plots are model forecasting parameters at the surface, extra...

Figure 16.2 Three ways of modelling spatial, spectral and temporal relations...

Figure 16.3 Input: Decomposed IASI spectrum using the MNF (260 components). ...

Figure 16.4 Transect profile of RMSE, Linear Regression (OLS), and CNN on cu...

Figure 16.5 Polygon ice chart overlay on the HH polarization of a Sentinel‐1...

Figure 16.6 Conceptual flow of the prediction of sea ice maps with a CNN app...

Figure 16.7 Results of Fusion‐CNN. Left: ice chart from DMI experts, Mid: pr...

Figure 16.8 (a): BCE loss. (b): MSE loss

Chapter 17

Figure 17.1 Deep‐learning‐based studies of the cryosphere.

Figure 17.2 Jakobshavn Isbræ in western Greenland. Left: aerial photo (obliq...

Figure 17.3 Deep‐learning‐based delineation of thermokarst landforms. This e...

Chapter 18

Figure 18.1 Schematic diagram illustrating the temporal forest dynamics duri...

Figure 18.2 Global distributions of performances of different model setups b...

Figure 18.3 Difference maps of Nash‐Sutcliffe model efficiency coefficient (...

Figure 18.4 Box and whisker plots showing grid‐level model performances acro...

Figure 18.5 Seasonal cycle (first row), seasonal variation of the residuals ...

Chapter 19

Figure 19.1 A summary of recent progress on deep learning applications in hy...

Figure 19.2 Performance of the LSTM forecast model for the CAMELS data, in c...

Chapter 20

Figure 20.1 Schematic of physics‐aware deep learning parameterizations for i...

Figure 20.2 Evaluation in an idealized model, zonal velocity (time‐mean, lef...

Figure 20.3 Illustrative example considering the effects of averaging proced...

Figure 20.4 Interpretability: Activation maps are the result of the convolut...

Chapter 21

Figure 21.1 Effective Climate Sensitivity. Assessed range of effective clima...

Figure 21.2 Schematic representation of clouds in current climate models and...

Figure 21.3 Schematic diagram for ML‐based cloud parametrizations for climat...

Figure 21.4 Snapshot comparison of the CRM and NN convective responses. Snap...

Figure 21.5 Comparison of the thermodynamic profiles predicted by the CRM an...

Figure 21.6 Architecture‐constrained NNs can enforce conservation laws to wi...

Chapter 22

Figure 22.1 RMSEs of single‐time step tendency predictions by coarse‐scale m...

Figure 22.2 The ACC and RMSE of ensemble forecasts of the Truth validation r...

Figure 22.3 The bias in the climate mean and Kolmogorov–Smirnov (KS) statist...

Figure 22.4 Simulation quality diagnostics plotted against each other: (a) a...

Chapter 23

Figure 23.1 Future challenges of deep learning in linking to observations, e...



Deep Learning for the Earth Sciences

A Comprehensive Approach to Remote Sensing, Climate Science, and Geosciences

 

Edited by

 

Gustau Camps‐Valls

Universitat de València, Spain

 

Devis Tuia

EPFL, Switzerland

 

Xiao Xiang Zhu

German Aerospace Center and Technical University of Munich, Germany

 

Markus Reichstein

Max Planck Institute, Germany

 

 

 

 

 

 

This edition first published 2021

© 2021 John Wiley & Sons Ltd

Chapter 14 © 2021 John Wiley & Sons Ltd. The contributions to the chapter written by Samantha Adams © Crown copyright 2021, Met Office. Reproduced with the permission of the Controller of Her Majesty's Stationery Office. All Other Rights Reserved.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Gustau Camps‐Valls, Devis Tuia, Xiao Xiang Zhu, and Markus Reichstein to be identified as the authors of the editorial material in this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Name: Camps‐Valls, Gustau, editor.

Title: Deep learning for the earth sciences : a comprehensive approach to

 remote sensing, climate science and geosciences / edited by Gustau

 Camps‐Valls [and three others].

Description: Hoboken, NJ : Wiley, 2021. | Includes bibliographical

 references and index.

Identifiers: LCCN 2021012965 (print) | LCCN 2021012966 (ebook) | ISBN

 9781119646143 (cloth) | ISBN 9781119646150 (adobe pdf) | ISBN

 9781119646167 (epub)

Subjects: LCSH: Earth sciences–Study and teaching. | Algorithms–Study and

 teaching.

Classification: LCC QE26.3 .D44 2021 (print) | LCC QE26.3 (ebook) | DDC

 550.71–dc23

LC record available at https://lccn.loc.gov/2021012965

LC ebook record available at https://lccn.loc.gov/2021012966

Cover Design: Wiley

Cover Image: © iStock.com/monsitj, Emilia Szymanek/Getty Images

 

 

 

To Adrian Albert, in memoriam

Foreword

Earth science, like many other scientific disciplines, is undergoing a data revolution. In particular, a massive amount of data about Earth and its environment is now continuously being generated by Earth observing satellites as well as physics‐based earth system models running on large‐scale computational platforms. These information‐rich datasets offer huge potential for understanding how the Earth's climate and ecosystem have been changing, and for addressing societal grand challenges relating to food/water/energy security and climate change.

Deep learning, which has already revolutionized many disciplines (e.g., computer vision, natural language processing), holds tremendous promise to revolutionize the earth and environmental sciences as well. In fact, recent years have seen exponential growth in the use of deep learning in Earth science, with many amazing results. Deep learning also faces challenges that are unique to earth science data: multimodality; a high degree of heterogeneity in space and time; and the fact that earth science data can only provide an incomplete and noisy view of the underlying eco‐geo‐physical processes that are interacting and unfolding at different spatial and temporal scales. Addressing these challenges requires the development of entirely new approaches that can effectively incorporate existing earth science knowledge inside the deep learning framework. Success in addressing these challenges stands to revolutionize deep learning itself and accelerate discovery across many other scientific domains.

The book does a fantastic job of capturing the state of the art in this fast‐evolving area. It is logically organized into three coherent parts, each containing chapters written by experts in the field. Each chapter provides easy‐to‐understand introductory material followed by an in‐depth treatment of the applications of deep learning to specific earth science problems, as well as ideas for future research. This book is a must‐read for students and researchers alike who would like to harness the data revolution in the earth sciences to address pressing societal challenges.

Vipin Kumar
Regents Professor, Department of Computer Science & Engineering, University of Minnesota, USA

Acknowledgments

We would like to acknowledge the help of all involved in the collation and review process of the book, without whose support the project could not have been satisfactorily completed. A further special note of thanks goes also to all the staff at Wiley, whose contributions throughout the whole process, from inception of the initial idea to final publication, have been valuable. Special thanks also go to the publishing team at Wiley, who continuously prodded via e‐mail, keeping the project on schedule.

We wish to thank all of the authors for their insights and excellent contributions to this book. Most of the authors of chapters included in this book also served as referees for chapters written by other authors. Thanks go to all those who provided constructive and comprehensive reviews.

This book was produced without any dedicated funding, but the editors' and authors' research was partially supported by several research projects that made it possible. We want to thank all agencies and organizations for supporting our research in general, and this book indirectly. Gustau Camps‐Valls acknowledges support by the European Research Council (ERC) under the ERC‐CoG‐2014 project 647423.

Thanks to all!

Gustau Camps‐Valls, Devis Tuia, Xiao Xiang Zhu, Markus Reichstein

València+Sion+Munich+Jena, August, 2021

List of Contributors

Adriana Romero

School of Computer Science, McGill University, Canada

Ankur Mahesh

Lawrence Berkeley National Lab

UC Berkeley

USA

Basil Kraft

Max Planck Institute for Biogeochemistry

Jena & Technical University of Munich,

Germany

Begüm Demir

Faculty of Electrical Engineering and Computer Science

Technische Universität Berlin

Germany

Benjamin Kellenberger

Wageningen University and Research

The Netherlands

Bertrand Le Saux

ESA / ESRIN Φ‐lab

Italy

Bharath Bhushan Damodaran

IRISA‐OBELIX Team

France

Burlen Loring

Lawrence Berkeley National Lab

UC Berkeley

USA

Carlo Gatta

Vintra Inc.

Barcelona

Spain

Chaopeng Shen

Civil and Environmental Engineering

Pennsylvania State University

University Park

USA

Christian Reimers

German Aerospace Center (DLR) & Friedrich‐Schiller‐Universität

Jena

Germany

Christian Requena‐Mesa

German Aerospace Center (DLR) & Max‐Planck Institute for Biogeochemistry & Friedrich‐Schiller‐Universität

Jena

Germany

Christopher Beckham

MILA / Polytechnique Montreal

Montreal, Canada

Christopher Pal

Polytechnique Montreal

Canada

Danfeng Hong

German Aerospace Center

Germany

David Malmgren‐Hansen

Department of Applied Mathematics and Computer Science

Technical University of Denmark

Kgs. Lyngby

Denmark

Devis Tuia

EPFL

Switzerland

Diego Marcos

Wageningen University and Research

The Netherlands

Dit‐Yan Yeung

CSE Department

HKUST

Hong Kong

Evan Racah

Lawrence Berkeley National Lab

UC Berkeley

USA

Gencer Sumbul

Faculty of Electrical Engineering and Computer Science

Technische Universität Berlin

Germany

Giuseppe Scarpa

University of Naples Federico II

Italy

Gonzalo Mateo‐García

Image Processing Laboratory

Universitat de València

Spain

Gui‐Song Xia

State Key Lab. LIESMARS

and School of Computer Science

Wuhan University

China

Gustau Camps‐Valls

Image Processing Laboratory

Universitat de València

Spain

Hao Wang

Department of Computer Science

Rutgers University

USA

Jakob Runge

German Aerospace Center (DLR)

Jena

Germany

Javier García‐Haro

Environmental Remote Sensing group (UV‐ERS)

Universitat de València

Spain

Jian Ding

State Key Lab. LIESMARS

Wuhan University

China

Jian Kang

Faculty of Electrical Engineering and Computer Science

Technische Universität Berlin

Germany

Jim Biard

Lawrence Berkeley National Lab

UC Berkeley

USA

Jinwang Wang

School of Electronic Information

Wuhan University

China

Jose E. Adsuara

Image Processing Laboratory

Universitat de València

Spain

Karthik Kashinath

Lawrence Berkeley National Lab

UC Berkeley

USA

Kathryn Lawson

Civil and Environmental Engineering

Pennsylvania State University

USA

Kenneth E. Kunkel

North Carolina State University (NCSU)

US

Konrad Schindler

ETH Zurich

Switzerland

Laure Zanna

New York University

USA

Lin Liu

Earth System Science Programme

Faculty of Science

The Chinese University of Hong Kong

Hong Kong SAR

China

Luis Gómez‐Chova

Image Processing Laboratory

Universitat de València

Spain

Manuel Campos‐Taberner

Environmental Remote Sensing Group (UV‐ERS)

Universitat de València

Spain

Marc Russwurm

Technical University of Munich

Germany

Marco Körner

Technical University of Munich

Germany

Maria Vakalopoulou

CentraleSupelec

University Paris Saclay

Inria Saclay

France

Markus Reichstein

Max‐Planck Institute for Biogeochemistry

Jena

Germany

Mayur Mudigonda

Lawrence Berkeley National Lab

UC Berkeley

USA

Michael F. Wehner

Lawrence Berkeley National Lab

UC Berkeley

USA

Mihir Sahasrabudhe

CentraleSupelec

Universite Paris Saclay

Inria Saclay

France

Naoto Yokoya

The University of Tokyo and RIKEN Center for Advanced Intelligence Project

Japan

Nicolas Courty

Université de Bretagne Sud

Laboratoire IRISA

France

Nikos Paragios

CentraleSupelec

Universite Paris Saclay

Inria Saclay

France

Onur Tasar

Inria Sophia Antipolis

France

Peter A. G. Watson

School of Geographical Sciences

University of Bristol

UK

Peter Bauer

European Centre for Medium Range Weather Forecasts (ECMWF)

Reading

UK

Peter D. Dueben

European Centre for Medium Range Weather Forecasts (ECMWF)

Reading

UK

Pierre Gentine

Columbia University

USA

Prabhat Ram

Lawrence Berkeley National Lab

UC Berkeley

USA

Ribana Roscher

Institute of Geodesy and Geoinformation

University of Bonn

Germany

Samantha Adams

Met Office Informatics Lab

Exeter

UK

Samira Kahou

École de technologie supérieure

Montreal

Quebec

Canada

Simon Besnard

Max Planck Institute for Biogeochemistry

Jena, Germany

Laboratory of Geo‐Information Science and Remote Sensing

Wageningen University & Research

The Netherlands

Sookyung Kim

Lawrence Berkeley National Lab

UC Berkeley

USA

Stergios Christodoulidis

Institut Gustave Roussy

Paris

France

Sujan Koirala

Max Planck Institute for Biogeochemistry

Jena

Germany

Tatsumi Uezato

RIKEN Center for Advanced Intelligence Project

Japan

Tegan Maharaj

Montreal Polytechnique & Mila, Montreal, Canada

Thomas Bolton

University of Oxford

UK

Thorsten Kurth

Lawrence Berkeley National Lab

UC Berkeley

USA

Tom Beucler

Columbia University & University of California

Irvine

USA

Travis O'Brien

Lawrence Berkeley National Lab

UC Berkeley

USA

Valero Laparra

Image Processing Laboratory

Universitat de València

Spain

Veronika Eyring

German Aerospace Center (DLR) and University of Bremen

Germany

Wai‐Kin Wong

Hong Kong Observatory

Wang‐chun Woo

Hong Kong Observatory

Wei He

RIKEN Center for Advanced Intelligence Project

Japan

Wen Yang

School of Electronic Information

Wuhan University

China

William D. Collins

Lawrence Berkeley National Lab

UC Berkeley

USA

Xavier‐Andoni Tibau

German Aerospace Center (DLR)

Jena

Germany

Xiao Xiang Zhu

Technical University of Munich and German Aerospace Center (DLR)

Munich

Germany

Xingjian Shi

Amazon

USA

Yunjie Liu

(Formerly) Lawrence Berkeley National Lab

UC Berkeley

US

Zhihan Gao

Hong Kong University of Science and Technology

Hong Kong

List of Acronyms

AE

Autoencoder

AI

Artificial Intelligence

AIC

Akaike's Information Criterion

AP

Average Precision

AR

Autoregressive

ARMA

Autoregressive and Moving Average

ARX

Autoregressive eXogenous

AWGN

Additive white Gaussian noise

BCE

Binary Cross‐Entropy

BER

Bit Error Rate

BP

Back‐propagation

BPTT

Back‐propagation through Time

BRT

Bootstrap Resampling Techniques

BSS

Blind Source Separation

CAE

Contractive Autoencoder

CBIR

Content‐based Image Retrieval

CCA

Canonical Correlation Analysis

CCE

Categorical Cross‐Entropy

CGAN

Conditional Generative Adversarial Network

CNN

Convolutional Neural Network

CONUS

Conterminous United States

CPC

Contrastive Predictive Coding

CSVM

Complex Support Vector Machine

CV

Cross Validation

CWT

Continuous Wavelet Transform

DAE

Denoising Autoencoder

DCT

Discrete Cosine Transform

DFT

Discrete Fourier Transform

DL

Deep Learning

DNN

Deep Neural Network

DSM

Digital Surface Model

DSP

Digital Signal Processing

DSTL

Deep Self‐taught Learning

DWT

Discrete Wavelet transform

ELBO

Evidence Lower Bound

EM

Expectation–Maximization

EO

Earth Observation

EPLS

Enforcing Population and Lifetime Sparsity

ERM

Empirical Risk Minimization

ET

Evapotranspiration

EUMETSAT

European Organisation for the Exploitation of Meteorological Satellites

FC

Fully Connected

FFT

Fast Fourier Transform

FIR

Finite Impulse Response

FT

Fourier Transform

GAE

Generalized Autoencoder

GAN

Generative Adversarial Network

GCM

General Circulation Model

GM

Gaussian Mixture

GP

Gaussian Process

GPR

Gaussian Process Regression

GRNN

Generalized Regression Neural Network

GRU

Gated Recurrent Unit

HMM

Hidden Markov Model

HP

Hyper‐parameter

HRCN

High Reliability Communications Networks

HSIC

Hilbert‐Schmidt Independence Criterion

i.i.d.

Independent and Identically Distributed

IASI

Infrared Atmospheric Sounding Interferometer

ICA

Independent Component Analysis

IIR

Infinite Impulse Response

KF

Kalman Filter

KKT

Karush–Kuhn–Tucker

KM

Kernel Method

KPCA

Kernel Principal Component Analysis

KRR

Kernel Ridge Regression

LAI

Leaf Area Index

LASSO

Least Absolute Shrinkage and Selection Operator

LCC

Leaf‐Chlorophyll‐Content

LE

Laplacian eigenmaps

LiDAR

Light Detection and Ranging or Laser Imaging Detection and Ranging

LLE

Locally Linear Embedding

LMS

Least Mean Squares

LS

Least Squares

LSTM

Long Short‐Term Memory

LTSA

Local Tangent Space Alignment

LUT

Look‐up Tables

MAE

Mean Absolute Error

MDN

Mixture Density Network

ME

Mean Error

MGU

Minimal Gated Unit

ML

Maximum Likelihood

MLP

Multilayer Perceptron

MNF

Minimum Noise Fractions

MSE

Mean Square Error

NDVI

Normalized Difference Vegetation Index

NMR

Nuclear Magnetic Resonance

NN

Neural Networks

NOAA

National Oceanic and Atmospheric Administration

NSE

Nash‐Sutcliffe model efficiency coefficient

NWP

Numerical Weather Prediction

OAA

One Against All

OAO

One Against One

OLS

Ordinary Least Square

OMP‐k

Orthogonal Matching Pursuit

PAML

Physics‐aware Machine Learning

PCA

Principal Component Analysis

PINN

Physics‐informed Neural Network

PSD

Predictive Sparse Decomposition

RAE

Relational Autoencoder

RBF

Radial Basis Function

RBM

Restricted Boltzmann Machine

RKHS

Reproducing Kernel Hilbert Space

RMSE

Root Mean Square Error

RNN

Recurrent Neural Network

ROC

Receiver Operating Characteristic

RS

Remote Sensing

RTRL

Real‐Time Recurrent Learning

SAE

Sparse Autoencoder

SAR

Synthetic Aperture Radar

SC

Sparse Coding

SNR

Signal‐to‐Noise Ratio

SRM

Structural Risk Minimization

SSL

Semi‐Supervised Learning

STL

Self‐taught Learning

SV

Support Vector

SVAE

Sparse Variational Autoencoder

SVM

Support Vector Machine

tBPTT

truncated Back‐propagation through Time

VAE

Variational Autoencoder

XAI

Explainable Artificial Intelligence

1Introduction

Gustau Camps‐Valls, Xiao Xiang Zhu, Devis Tuia and Markus Reichstein

Machine learning methods are widely used to extract patterns and insights from the ever‐increasing data streams produced by sensory systems. Recently, deep learning, a particular type of machine learning algorithm (Goodfellow et al. 2016), has excelled in tackling data science problems, mainly in the fields of computer vision, natural language processing, and speech recognition. For some years now, it has been impossible to ignore deep learning. What started as a curiosity in the 1990s has imposed itself as the prime machine learning paradigm over the last ten years, thanks especially to the availability of large datasets and to advances in hardware and parallelization that allow learning from them. Nowadays, most machine learning research is somehow deep learning‐based, and new heights in performance have been reached in virtually all fields of data science, both applied and theoretical. Adding to this the community efforts in sharing code and the availability of computational resources, deep learning seems poised to unlock data science research.

In recent years, deep learning has shown increasing evidence of its potential to address problems in the Earth and climate sciences as well (Reichstein et al. 2019). As in many applied fields of science, Earth observation and climate science are becoming ever more strongly data‐driven. Deep learning strategies are currently being explored by more and more researchers, and neural networks are used in many operational systems. The advances in the field are impressive, but there is still much ground to cover to understand the complex systems that are our Earth and its climate. Why deep learning works on Earth data problems is also a challenging question, for which one could argue a statistical reason: as in computer vision or language processing, the Earth sciences also deal with spatial and temporal data that exhibit strong autocorrelation, which deep learning methods handle very well. But what is the physical reason, if any? Is deep learning discovering guiding or first principles in the data automatically? Why do convolutions in space or time lead to appropriate feature representations? Are those representations sparse, physically consistent, or even causal? Explaining what a deep learning model has actually learned is a challenge in itself. Even though AI has promised to change the way we do science, with DL the first step in this endeavor, this will not happen unless we resolve these questions.

The field of deep learning for the Earth and climate sciences is so wide and fast‐evolving that we could not cover all methodological approaches and geoscientific problems. A representative subset of methods, problems, and promising approaches was selected for the book. With this introduction (and, more generally, with this book), we want to take a snapshot of the state of the art of the efforts in the machine learning (section 1.1), remote sensing (section 1.2), and geosciences and climate (section 1.3) communities to integrate, use, and improve deep learning methods. We also want to provide resources for researchers who want to start including neural network‐based solutions in their data problems.

1.1 A Taxonomy of Deep Learning Approaches

Given the current pace of the advances in deep learning, providing a taxonomy of approaches is not an easy task. The field is full of creativity and new inventive approaches can be found on a regular basis. Without the pretension of being exhaustive, most deep learning approaches can be placed along the lines of the following dimensions:

Supervised vs. unsupervised. This is probably the most traditional distinction in machine learning, and it also applies to deep learning methodologies. Basically, it boils down to whether or not the method uses labeled information to train. The best known examples of supervised deep methods are the convolutional neural network (CNN, Fukushima (1980); LeCun et al. (1998a); Krizhevsky (1992)) and the recurrent neural network (RNN, Hochreiter and Schmidhuber (1997)), both of which use labels to evaluate the loss function and backpropagate errors to update weights, the former for image data and the latter for data sequences. Unsupervised methods, in contrast, do not use ground truth information and therefore rely on unsupervised criteria to train. Among unsupervised methods, autoencoders (Kramer 1991; Hinton and Zemel 1994) are the best known. They train on the error in reconstructing the original image and are often used to learn low‐dimensional representations (Hinton and Salakhutdinov 2006a) or to denoise images (Vincent and Larochelle 2010).
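To make the distinction concrete, the following is a minimal sketch (our illustration, not taken from any chapter) of the unsupervised criterion used by autoencoders: the network is trained purely on reconstruction error, with no labels involved. PyTorch is assumed, and the layer sizes and random data are illustrative only.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=64, n_code=8):
        super().__init__()
        # Encoder compresses the input into a low-dimensional code.
        self.encoder = nn.Sequential(nn.Linear(n_in, n_code), nn.ReLU())
        # Decoder reconstructs the input from that code.
        self.decoder = nn.Linear(n_code, n_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 64)                    # stand-in for a batch of flattened patches
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)    # unsupervised criterion: reconstruction error
opt.zero_grad()
loss.backward()
opt.step()
```

After training, the encoder output can serve as a low‐dimensional representation of the data, in the spirit of Hinton and Salakhutdinov (2006a).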

In between these two endpoints, one can find a number of approaches tuning the level and nature of supervision: weakly supervised models (Zhou 2018), for instance, use image‐level supervision to predict phenomena at a finer resolution (e.g. localizing objects by only knowing whether they are present in the image), while self‐supervised models use the content of the image itself as a supervisory signal; proceeding this way, the labels to train the model come for free. For example, self‐supervised tasks include predicting the color values from a greyscale version of the image (Zhang et al. 2016c), predicting the relative position of patches to learn part‐to‐object relations (Doersch et al. 2015), or predicting the rotation that has been applied to an image (Gidaris et al. 2018).
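As an illustration of how such labels "come for free", here is a minimal sketch (PyTorch assumed; toy network and shapes are ours) of the rotation‐prediction pretext task of Gidaris et al. (2018): each unlabeled image is rotated by a random multiple of 90 degrees, and the rotation index itself becomes the classification target.

```python
import torch
import torch.nn as nn

# Tiny illustrative CNN; a real backbone (e.g. a ResNet) would be used in practice.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),                       # 4 classes: 0, 90, 180, 270 degrees
)

images = torch.randn(8, 3, 32, 32)          # unlabeled batch
k = torch.randint(0, 4, (8,))               # rotation index = free label
rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                       for img, r in zip(images, k)])

logits = backbone(rotated)
loss = nn.functional.cross_entropy(logits, k)  # ordinary supervised loss, zero annotation cost
loss.backward()
```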

Generative vs. discriminative. Most methods described above are discriminative, in the sense that they minimize an error function comparing the prediction with the true output (a label, or the image itself when reconstructing). They model the conditional probability of the target y given an observation x, i.e., p(y|x). A generative model, instead, generates possible inputs that respect the joint input/output distribution. In other words, it models the conditional probability of the data x given an output y, i.e., p(x|y). Generative models can therefore sample instances (e.g. patches, objects, images) from a distribution, rather than only choosing the most likely one, which is a great advantage when data are complex and multimodal. For instance, when generating images of birds, they could generate different instances of birds of the same species with subtle shape or color differences. Examples of generative deep models are the variational autoencoder (VAE, Kingma and Welling (2014a); Doersch (2016)) and the generative adversarial network (GAN, Goodfellow et al. (2014a)), where a generative model is trained to generate images that are so realistic that a model trained to distinguish real images from fake ones fails.
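The adversarial idea can be summarized in a few lines. The sketch below (PyTorch assumed; toy multilayer perceptrons and vector-shaped "samples" for brevity) shows one training step: the discriminator D is pushed to separate real from generated samples, while the generator G is pushed to fool D.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 128))  # noise -> sample
D = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, 128)       # stand-in for real data drawn from p(x)
z = torch.randn(32, 16)           # latent noise

# Discriminator step: label real samples 1 and generated samples 0.
fake = G(z).detach()              # detach: do not update G during this step
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: make D believe the generated samples are real.
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```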

Forward vs. recurrent. The third dimension concerns the functioning of the network. Most models described above are forward models, meaning that the information flows once from the input to the prediction before errors are backpropagated. However, when dealing with data structured as sequences (e.g. temporal data), one can make information flow across the sequence dimension. Recurrent models (RNNs, first introduced in Williams et al. (1986)) exploit this structure to inform the next step in the sequence with the hidden representations learned at the previous one. Backpropagating information along the sequence also has its drawbacks, especially in terms of vanishing gradients, i.e. gradients that, after a few recursion steps, become zero and no longer update the model. To cope with this, networks including gated skip connections, called memory gates, have been proposed: the Long Short‐Term Memory network (LSTM, Hochreiter and Schmidhuber (1997)) and the Gated Recurrent Unit (GRU, Cho et al. (2014)) are the best known.
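In code, the recurrence is hidden behind a single module call. The sketch below (PyTorch assumed, illustrative shapes) classifies a sequence, e.g. a multitemporal pixel trajectory, from the final hidden state of an LSTM, whose gates are what mitigate the vanishing-gradient problem mentioned above.

```python
import torch
import torch.nn as nn

seq = torch.randn(4, 24, 10)     # 4 sequences, 24 time steps, 10 features per step
lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
head = nn.Linear(32, 3)          # e.g. 3 target classes

outputs, (h_n, c_n) = lstm(seq)  # h_n holds the final hidden state of each sequence
logits = head(h_n[-1])           # classify from the last hidden state
print(logits.shape)              # torch.Size([4, 3])
```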

1.2 Deep Learning in Remote Sensing

Taking off in 2014, deep learning in remote sensing has become a blooming research field, almost a hype. To give an example, to date there are more than 1000 published papers related to the topic (Zhu et al. 2017; Ma et al. 2019b). Such massive and dynamic developments are triggered, on the one hand, by methodological advances in deep learning and by the open science culture in the machine learning and computer vision communities, which has resulted in open access to code, benchmark datasets, and even pre‐trained models. On the other hand, they are due to the fact that Earth observation (EO) has become an operational source of open big data. Fostered by the European Copernicus program, with its high‐performance satellite fleet and open access policy, the user community has increased and widened considerably in recent years. This raises high expectations for valuable thematic products and intelligent knowledge retrieval. In the private sector, NewSpace companies have launched hundreds of small satellites, which have become a complementary and affordable source of EO data. All of this requires new data‐intensive – or even data‐driven – analysis methods from data science and artificial intelligence, among them deep learning.

To summarize the developments of the past six years: deep learning in remote sensing has gone through three main phases with temporal overlap: exploration, benchmarking, and EO‐driven methodological developments. In the following, we overview these three phases. Given the huge body of existing literature, we can unavoidably give only a selection of examples, subject to bias.

Phase 1: Exploration (2014 to date): The exploration phase is characterized by quick wins, often achieved by transferring and tailoring network architectures from other fields, most notably computer vision. To name a few early examples, stacked autoencoders were applied to extract high‐level features from hyperspectral data for classification purposes in Chen et al. (2014). Bentes et al. (2015) exploited deep neural networks for the detection and classification of objects, such as ships and wind parks, in oceanographic SAR images. In 2015, Marmanis et al. (2016) fine‐tuned ImageNet pre‐trained networks to boost the performance of land use classification with aerial images. Since then, researchers have explored the power of deep learning for a wide range of classic tasks and applications in remote sensing, such as classification, detection, semantic segmentation, instance segmentation, 3D reconstruction, data fusion, and many more.

Whether using pre‐trained models or training models from scratch, it is always about addressing new and intrinsic characteristics of remote sensing data (Zhu et al. 2017):

Remote sensing data are often multi‐modal. Tailored architectures must be developed for, e.g., optical (multi‐ and hyperspectral) (Audebert et al. 2019) and synthetic aperture radar (SAR) data (Chen et al. 2016; Zhang et al. 2017; Marmanis et al. 2017; Shahzad et al. 2019), where both the imaging geometries and the content are completely different. Data and information fusion uses these complementary data sources in a synergistic way (Schmitt and Zhu 2016). Already prior to joint information extraction, a crucial step is to develop novel architectures for matching images taken from different perspectives and even different imaging modalities, preferably without requiring an existing 3D model (Marcos et al. 2016; Merkle et al. 2017b; Hughes et al. 2018). Also, besides conventional decision fusion, an alternative is to investigate transfer learning from deep features of different imaging modalities (Xie et al. 2016).

Remote sensing data are geo‐located, i.e., each pixel in a remote sensing image corresponds to a geospatial coordinate. This facilitates the fusion of pixel information with other sources of data, such as GIS layers (Chen and Zipf 2017; Vargas et al. 2019; Zhang et al. 2019b), street‐view images (Lefèvre et al. 2017; Srivastava et al. 2019; Kang et al. 2018; Hoffmann et al. 2019a), geo‐tagged images from social media (Hoffmann et al. 2019b; Huang et al. 2018c), or simply other sensors as above.

Remote sensing time series data are becoming standard, enabled by Landsat, ESA's Copernicus program, and the blooming NewSpace industry. This capability is triggering a shift from individual image analysis to time‐series processing. Novel network architectures must be developed to optimally exploit the temporal information jointly with the spatial and spectral information of these data. For example, convolutional recurrent neural networks are becoming baselines in multitemporal remote sensing data analysis applied to change detection (Mou et al. 2018), crop monitoring (Rußwurm and Körner 2018b; Wolanin et al. 2020), and land use and land cover classification (Qiu et al. 2019); a minimal sketch of such a model follows this list. An important research direction is unsupervised or weakly supervised learning for change detection (Saha et al. 2019b) or anomaly detection (Munir et al. 2018) from time series data.

Remote sensing has irreversibly entered the big data era. We are dealing with very large and ever‐growing data volumes, often on a global scale. On the one hand, this allows large‐scale or even global applications, such as monitoring global urbanization (Qiu et al. 2020), large‐scale mapping of land use/cover (Li et al. 2016a), large‐scale cloud detection (Mateo‐García et al. 2018) or cloud removal (Grohnfeldt et al. 2018), and the retrieval of global greenhouse gas concentrations (Buchwitz et al. 2017) and of a multitude of trace gases resolved in space, time, and the vertical domain (Malmgren‐Hansen et al. 2019). On the other hand, algorithms must be fast enough and sufficiently transferable to be applied to the whole Earth surface/atmosphere, which in turn calls for large and representative training datasets – the main topic of phase 2.
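The convolutional recurrent baseline mentioned in the time-series item above can be sketched as follows (our illustration, PyTorch assumed, with toy layer sizes): a small CNN embeds each acquisition, and an LSTM aggregates the resulting sequence of embeddings, e.g. for crop-type or land cover classification.

```python
import torch
import torch.nn as nn

class ConvRecurrent(nn.Module):
    def __init__(self, n_bands=4, n_classes=5):
        super().__init__()
        # Per-date spatial feature extractor (illustrative; real models are deeper).
        self.cnn = nn.Sequential(
            nn.Conv2d(n_bands, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Temporal aggregation over the sequence of per-date embeddings.
        self.rnn = nn.LSTM(16, 32, batch_first=True)
        self.cls = nn.Linear(32, n_classes)

    def forward(self, x):                    # x: (batch, time, bands, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)
        return self.cls(h[-1])               # classify from the last hidden state

model = ConvRecurrent()
x = torch.randn(2, 12, 4, 16, 16)            # 12 Sentinel-like acquisitions per sample
print(model(x).shape)                        # torch.Size([2, 5])
```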

In addition, it is important to mention that – unlike in computer vision – classification and detection are only a small fraction of remote sensing and Earth observation problems. Most problems are actually related to the retrieval of bio‐geo‐physical or bio‐chemical variables. This will be discussed in section 1.3.

Phase 2: Benchmarking (2016 to date): To train deep learning methods with good generalization abilities and to compare different deep learning models, large‐scale benchmark datasets are of great importance. In the computer vision community, many high‐quality datasets are available, dedicated to, for example, image classification, semantic segmentation, object detection, and pose estimation tasks. To give an example, the well‐known ImageNet image classification database consists of more than 14 million hand‐annotated images cataloged into more than 20,000 categories (Deng et al. 2009a). It is debatable whether the computer vision community is driven too much by this benchmark culture instead of caring about real‐world challenges. In remote sensing, however, we face the other extreme – we lack sufficient training data. For example, most classic methodological developments in hyperspectral remote sensing have been based on only a few benchmark images of limited size, let alone the annotation‐demanding deep learning methods. To push deep learning‐related research in remote sensing, community efforts in generating large‐scale, real‐world‐scenario benchmarks are due. Motivated by this, since 2016 an increasing number of large‐scale remote sensing datasets have become available, covering a variety of problems such as instance segmentation (Chiu et al. 2020; Weir et al. 2019; Gupta et al. 2019), object detection (Xia et al. 2018; Lam et al. 2018), semantic segmentation (Azimi et al. 2019; Schmitt et al. 2019; Mohajerani and Saeedi 2020), (multi‐label) scene classification (Sumbul et al. 2019; Zhu et al. 2020), and data fusion (Demir et al. 2018; Le Saux et al. 2019). To name a few examples:

DOTA (Xia et al. 2018): This is a large-scale dataset for object detection in aerial images, collecting 2,806 aerial images from different sensors and platforms that contain objects exhibiting a wide variety of scales, orientations, and shapes. In total, it contains 188,282 object instances across 15 common object categories, and it serves as a very important benchmark for the development of advanced object detection algorithms in very high resolution remote sensing.

So2Sat LCZ42 (Zhu et al. 2020): This is a benchmark dataset for global local climate zone (LCZ) classification and a rigorously labeled reference dataset in EO. Over one month, 15 domain experts carefully designed the labeling workflow, the error mitigation strategy, and the validation methods, and then conducted the data labeling. It consists of manually assigned local climate zone labels for 400,673 Sentinel-1 and Sentinel-2 image patch pairs, globally distributed over 42 urban agglomerations covering all inhabited continents and 10 cultural zones. In particular, it is the first EO dataset to provide a quantitative measure of label uncertainty, achieved by letting a group of domain experts cast 10 independent votes on 19 cities in the dataset (a minimal sketch of such a vote-based uncertainty measure is given below).
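The exact labeling and evaluation protocol is described by Zhu et al. (2020); the snippet below is only our own simplified illustration of how per-patch agreement and vote entropy can be computed from a matrix of independent expert votes. The function name and array layout are assumptions made for the example.

```python
import numpy as np

def vote_uncertainty(votes, n_classes):
    """Per-sample label agreement and entropy from independent votes.

    votes: integer array of shape (n_samples, n_voters) holding class
    indices, e.g. 10 expert votes per image patch. Illustrative only --
    not the exact protocol of So2Sat LCZ42.
    """
    n_samples, n_voters = votes.shape
    # Empirical class distribution per sample.
    counts = np.zeros((n_samples, n_classes))
    for k in range(n_classes):
        counts[:, k] = (votes == k).sum(axis=1)
    probs = counts / n_voters
    # Agreement: fraction of voters backing the majority class.
    agreement = probs.max(axis=1)
    # Shannon entropy of the vote distribution (0 = full consensus).
    with np.errstate(divide="ignore", invalid="ignore"):
        entropy = -np.nansum(
            np.where(probs > 0, probs * np.log(probs), 0.0), axis=1
        )
    return agreement, entropy

# Example: 3 patches, 10 votes each, 17 LCZ classes.
votes = np.array([[2] * 10, [2] * 7 + [5] * 3, list(range(10))])
agree, ent = vote_uncertainty(votes, n_classes=17)
print(agree)  # [1.  0.7 0.1]
```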

An exhaustive list of remote sensing benchmark datasets is summarized by Rieke et al. (2020). There is no doubt that these high‐quality benchmarks are essential for the next phase – EO‐driven methodological research.

Phase 3: EO-driven methodological research (2019 to date): Going beyond the successful yet application-oriented research efforts mentioned in phase 1, fundamental and rarely addressed EO-driven methodological challenges are attracting increasing attention in the remote sensing community.

Reasoning: The capability to link meaningful transformations of entities over space or time is a fundamental property of intelligent species, and also the way people understand visual data. Recently, several efforts have been made in computer vision to endow deep networks with such a capability. For instance, Santoro et al. (2017) proposed a relational reasoning network for the problem of visual question answering, achieving so-called super-human performance. Zhou et al. (2018) presented a temporal relation network to enable multiscale temporal relational reasoning in networks for video classification tasks. Reasoning is particularly relevant for Earth observation, as every measurement in remote sensing data is associated with a spatio-temporal coordinate and characterized by spatial and temporal contextual relations, in particular when it comes to geo-physical processes. As for reasoning networks in remote sensing, a first attempt can be found in Mou et al. (2019), where the authors propose reasoning modules in a fully convolutional network for semantic segmentation of aerial scenes. Further extending relational reasoning to semantics, Hua et al. (2020) proposed an attention-aware label relational reasoning network for multilabel aerial image classification. Another pioneering line of work on reasoning in remote sensing is visual question answering – letting remote sensing imagery speak for itself (Lobry et al. 2019). More remote sensing tasks that could benefit from reasoning networks are yet to be discovered.
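To give a flavor of what such a module looks like, the sketch below follows the general spirit of relation networks (pairwise reasoning over feature-map positions with a shared MLP), but it is our own simplified illustration, not the architecture of any of the papers cited above; all dimensions are invented.

```python
import torch
import torch.nn as nn

class SimpleRelationModule(nn.Module):
    """Pairwise relational reasoning over spatial positions of a feature map.

    Every ordered pair of positions (o_i, o_j) is processed by a shared
    MLP g, and the pairwise relations are summed into one relation vector.
    A simplified illustration in the spirit of relation networks.
    """

    def __init__(self, in_dim, rel_dim=128):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * in_dim, rel_dim), nn.ReLU(),
            nn.Linear(rel_dim, rel_dim), nn.ReLU(),
        )

    def forward(self, fmap):
        # fmap: (batch, channels, H, W) -> set of H*W "objects" per image.
        b, c, h, w = fmap.shape
        objs = fmap.flatten(2).transpose(1, 2)  # (b, n, c), n = H*W
        n = objs.shape[1]
        # Build all ordered pairs (o_i, o_j).
        oi = objs.unsqueeze(2).expand(b, n, n, c)
        oj = objs.unsqueeze(1).expand(b, n, n, c)
        pairs = torch.cat([oi, oj], dim=-1)      # (b, n, n, 2c)
        return self.g(pairs).sum(dim=(1, 2))     # (b, rel_dim)

# Example: relations over an 8x8 feature map with 64 channels.
out = SimpleRelationModule(in_dim=64)(torch.randn(2, 64, 8, 8))
print(out.shape)  # torch.Size([2, 128])
```

In a segmentation or multilabel classification network, such a relation vector would typically be fused back into the per-position features before the prediction head.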

Uncertainty: EO applications target the retrieval of physical or bio-chemical variables at a large scale. These predicted physical quantities are often used in data assimilation and in decision making, for example in support of, and for monitoring of, the UN Sustainable Development Goals (SDGs). Therefore, besides high accuracy, traceability, and reproducibility of results, quantifying the uncertainty of the predictions of a deep learning algorithm is indispensable for quality-assured and reliable Artificial Intelligence in Earth observation. Although quantifying the uncertainty of parameter estimates is common practice in traditional model-driven EO approaches, it has not caught up with the rapid development of deep learning, where the model itself can also be learned. Only a handful of studies have addressed it in the past (Zhu et al. 2017), but the EO community is realizing its indispensability for responsible AI. For example, the “Towards a European AI4EO R&I Agenda” (ESA, 2018) mentions uncertainty estimation as one of the future challenges of AI4EO. To give one encouraging example, an active research direction in uncertainty quantification focuses on Bayesian neural networks (BNNs), a type of network that provides not only point estimates of model parameters and output predictions, but also full distributions over these values. For example, Kendall and Gal (2017) proposed a BNN that uses a technique called learned loss attenuation to learn the noise distribution of the input data, which can be used to quantify the uncertainty of the final output. More recently, Ilg et al. (2018) and Kohl et al. (2018) proposed BNNs that output a number of plausible hypotheses, enabling the construction of distributions over outputs and the measurement of uncertainties. In fact, Bayesian deep learning (BDL) offers a probabilistic interpretation of deep learning models by inferring distributions over the models' weights (Wang and Yeung 2016; Kendall and Gal 2017). These models, however, have not been applied extensively in the Earth Sciences, where, given the relevance of uncertainty propagation and quantification, they could find wide adoption. Only some pilot applications of deep Gaussian processes for parameter retrieval (Svendsen et al. 2018) and of BNNs for time series data analysis (Rußwurm et al. 2020) are worth mentioning. In summary, the Bayesian deep learning community has developed model-agnostic and easy-to-implement methodology to estimate both data and model uncertainty within deep learning models, which has great potential when applied to remote sensing problems (Rußwurm et al. 2020).
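One of the easiest entry points into this methodology is Monte Carlo dropout, which approximates Bayesian inference by keeping dropout active at test time and averaging several stochastic forward passes. The sketch below is a generic illustration of that idea applied to a small regression network, not the specific approach of any of the works cited above; the network shape and input sizes are assumptions.

```python
import torch
import torch.nn as nn

# A small regressor with dropout, e.g. mapping spectra to a biophysical variable.
model = nn.Sequential(
    nn.Linear(12, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=50):
    """Predictive mean and standard deviation via Monte Carlo dropout.

    Keeping dropout stochastic at inference time yields an approximate
    posterior predictive distribution; the spread across passes serves
    as a (model) uncertainty estimate.
    """
    model.train()  # keep dropout layers active at inference
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

# Example: 5 inputs with 12 "spectral bands" each (illustrative sizes).
x = torch.randn(5, 12)
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)  # torch.Size([5, 1]) torch.Size([5, 1])
```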

Other open issues that have recently attracted attention in the remote sensing community include, but are not limited to: hybrid models integrating physics-based modeling into deep neural networks, efficient deep nets, unsupervised and weakly supervised learning, network architecture search, and robustness of deep nets.

1.3 Deep Learning in Geosciences and Climate

A vast number of algorithms and network architectures have been developed and applied in the geosciences too. Here, the great majority of applications deal with the estimation of key biogeophysical parameters of interest or the forecasting of essential climate variables (ECVs). The (ab)use of the standard multilayer perceptron in many studies has given rise to the use of more powerful techniques: convolutional networks, which can exploit spatial fields of view while providing vertical estimates of parameters of interest in the atmosphere (Malmgren-Hansen et al. 2019), and recurrent neural nets, in particular the long short-term memory (LSTM) unit, which has demonstrated good potential for the estimation and forecasting of biogeophysical parameter time series and for characterizing the memory of processes (Besnard et al. 2019b); a minimal sketch of the latter is given below.
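The following toy PyTorch sketch, our own example with invented sizes, uses an LSTM to map a multivariate driver sequence (e.g. radiation, temperature, precipitation) to a biogeophysical target at each time step, such as a carbon flux.

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """Toy sequence-to-sequence regressor for biogeophysical time series.

    Maps a sequence of driver variables to a target variable per time
    step; the recurrent hidden state carries the "memory" of the process.
    All dimensions are illustrative assumptions.
    """

    def __init__(self, n_drivers=5, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_drivers, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, time, n_drivers)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)  # (batch, time)

# Example: 8 sites, 365 daily steps, 5 driver variables.
y_hat = LSTMRegressor()(torch.randn(8, 365, 5))
print(y_hat.shape)  # torch.Size([8, 365])
```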

While deep learning approaches have classically been divided into spatial learning (for example, convolutional neural networks for object detection and classification) and sequence learning (for example, forecasting and prediction), there is growing interest in blending these two perspectives. After all, Earth data can be cast as spatial structures evolving through time: weather forecasting and hurricane tracking are clear examples, but so is the solid Earth (Bergen et al. 2019). We often face time-evolving multi-dimensional structures, such as the organized precipitating convection that dominates patterns of tropical rainfall, vegetation states that influence the flow of carbon, and volcanic ash particles whose shapes describe different physical eruption mechanisms, to name just a few (Reichstein et al. 2019; Bergen et al. 2019). Studies are starting to apply combined convolutional-recurrent deep networks for precipitation nowcasting (Xingjian et al. 2015