Machine learning has led to incredible achievements in many different fields of science and technology. Its varied methods offer powerful new tools to scientists and engineers and open new paths in geomechanics.
The two volumes of Machine Learning in Geomechanics aim to demystify machine learning. They present its main methods and provide examples of their application in mechanics and geomechanics. Most of the chapters give a pedagogical introduction to the most important methods of machine learning and uncover the fundamental notions underlying them.
Building from the simplest to the most sophisticated methods of machine learning, the books offer several hands-on coding examples to help readers understand the methods, appreciate their potential and identify possible pitfalls.
Page count: 299
Year of publication: 2024
Cover
Table of Contents
Title Page
Copyright Page
Preface
1 Overview of Machine Learning in Geomechanics
1.1. What exactly is machine learning?
1.2. Classification of ML methods
1.3. ML and geomechanics
1.4. Libraries for ML
1.5. Bias in ML and limitations
1.6. What to expect from these volumes?
1.7. Acknowledgments
1.8. References
2 Introduction to Regression Methods
2.1. Introduction
2.2. Linear regression
2.3. Gradient descent
2.4. Data preprocessing and model validation
2.5. Nonlinear regression
2.6. Regularization techniques
2.7. Challenges in generalization and extrapolation
2.8. Bayesian regression
2.9. Conclusions
2.10. References
3 Unsupervised Learning: Basic Concepts and Application to Particle Dynamics
3.1. Introduction
3.2. Basic concepts
3.3. Unsupervised learning techniques
3.4. Application to particle dynamics
3.5. Conclusion
3.6. Acknowledgements
3.7. References
4 Classification Techniques in Machine Learning
4.1. Introduction
4.2. Classification techniques
4.3. Active learning in classification
4.4. Conclusion
4.5. Acknowledgments
4.6. References
5 Artificial Neural Networks: Learning the Optimum Statistical Model from Data
5.1. Why PyTorch?
5.2. Introduction to sampling theory
5.3. Optimizing a neural network
5.4. References
List of Authors
Index
Summary of Volume 2
End User License Agreement
SCIENCES
Mechanics, Field Director – Gilles Pijaudier-Cabot
Geomechanics, Subject Head – Gioacchino Viggiani
Coordinated by
Ioannis Stefanou
Félix Darve
First published 2024 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK
www.iste.co.uk
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
www.wiley.com
© ISTE Ltd 2024. The rights of Ioannis Stefanou and Félix Darve to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.
Library of Congress Control Number: 2024943344
British Library Cataloguing-in-Publication Data. A CIP record for this book is available from the British Library. ISBN 978-1-78945-192-4
ERC codes: PE6 Computer Science and Informatics (PE6_7 Artificial intelligence, intelligent systems, natural language processing); PE10 Earth System Science (PE10_19 Planetary geology and geophysics)
Ioannis STEFANOU¹ and Félix DARVE²
¹ GeM, UMR 6183, CNRS, Ecole Centrale Nantes, Nantes Université, France
² 3SR Laboratory, CNRS, Grenoble INP – ENSE3, Université Grenoble Alpes, France
When discussing artificial intelligence (AI), some basic questions immediately emerge: What is AI? How does it work?
Behind this well-known term lies a collection of methods of applied mathematics that allow the computer to learn and identify patterns in data. This collection of methods is called machine learning (ML), and it is the subject of these two volumes, which were authored for the 2023 ALERT Geomaterials Doctoral School.
In combination with the tremendous increase in computational power, ML has led to incredible achievements in many disciplines of science and technology. These achievements were so striking that some researchers believe that ML could become a turning point for humanity, as the discovery of fire was for our distant ancestors!
Until the 1960s, scientific development was characterized by so-called “linear physics”: modeling based on analytical equations solved explicitly with the available mathematical tools, giving rise to closed-form solutions. The class of problems that can be solved in this way is, of course, precious, but very limited.
Then the numerical revolution, based on powerful numerical methods and computers, made it possible to solve numerically a large variety of problems that can be described by a known system of equations. This methodology has well-known limits, owing to the abundance of nonlinear processes, chaos and complexity in nature. Even so, numerical analysis, another branch of applied mathematics, has immensely enlarged the class of problems that can be solved today.
However, the numerical solution of these sets of nonlinear equations can be computationally very intensive or even intractable. Moreover, many problems in engineering are difficult to describe by a set of equations at all. ML provides promising tools for addressing both kinds of problem.
Another aspect of ML algorithms is their ability to solve very complex problems in a “creative” manner. A characteristic example of such creativity was the machine’s victory over the world champion of Go, a strategy game invented long ago in China. Unlike chess, in which the computer can evaluate the game’s evolution several moves in advance, in Go the number of possible game positions is extremely large (higher than the number of atoms in the known universe), so it is necessary to follow creative strategies. Indeed, the machine, “AlphaGo”, showed that it is able to devise novel strategies that surprised even the best human players in the world.
All these methods of ML give powerful new tools to scientists and engineers and open new perspectives in geomechanics. The aim of these two volumes is to demystify ML, to present its main methods and to show some examples of applications in mechanics and geomechanics. Most of the chapters were written to provide a pedagogical introduction to the most important methods of ML and to uncover the fundamental notions behind them.
Volume I contains the following five chapters:
The first chapter, “Overview of machine learning in geomechanics”, is the introductory chapter of this volume. In this chapter, we explain how the machine can learn, show a classification of the main methods in ML, outline some applications of ML in geomechanics and highlight its limitations.
The second chapter, “Introduction to regression methods”, focuses on regression, which is one of the fundamental pillars of supervised ML. In this chapter, we introduce the essential concepts in regression analysis and methods by providing hands-on, practical examples.
The aim of the third chapter, “Unsupervised learning: Basic concepts and application to particle dynamics”, is twofold. The first part of this chapter is devoted to the description of the basic concepts of the most popular techniques of unsupervised learning. The second part illustrates an application of unsupervised learning to the discovery of patterns in particle dynamics.
The fourth chapter, “Classification techniques in machine learning”, describes the problem of classification in ML and illustrates some of the methods used for solving it, without resorting to artificial neural networks (ANNs). Hands-on examples are given and active learning is discussed.
The fifth chapter, “Artificial neural networks: Learning the optimum statistical model from data”, provides a comprehensive introduction to ANNs. Several hands-on examples are given to help the reader grasp the main ideas.
Volume II is organized as follows:
The first chapter, “Data-driven modeling in geomechanics”, presents the theoretical framework of the so-called data-driven computational mechanics. Furthermore, it shows some of its applications for the solution of problems involving Cauchy and Cosserat continua with elastic and inelastic materials, which, naturally, represent common descriptions of geomaterials.
The second chapter, “Bayesian inference in geomechanics”, is intended to provide a concise exploration of Bayesian inference and demonstrates how recent advancements in ML can assist in efficient Bayesian inference within the realm of geomechanics applications.
The third chapter, “Physics-informed and thermodynamics-based neural networks”, shows how to inject prior knowledge into deep learning algorithms. Using various examples, we present physics-informed neural networks for the discovery of partial differential equations and thermodynamics-based ANNs for the discovery of constitutive models of complex, inelastic materials.
The fourth chapter, “Introduction to reinforcement learning with applications in geomechanics”, presents the basic concepts of reinforcement learning, which enables the development of software agents that are capable of making optimal decisions in dynamic and uncertain environments. The chapter closes with two applications of reinforcement learning in geomechanics.
The fifth chapter, “Artificial neural networks: basic architectures and training strategies”, presents further architectures of ANNs and discusses training strategies.
We warmly thank all the authors of the volumes for their comprehensive contributions and their effort to present complex notions in a pedagogical manner. We also thank ALERT Geomaterials for the organization of this doctoral school and all the students for their active participation. We hope that the chapters provide a valuable introduction to machine learning in geomechanics.
Ioannis STEFANOU
Félix DARVE
June 2024
Ioannis STEFANOU
GeM, UMR 6183, CNRS, Ecole Centrale Nantes, Nantes Université, France
One of the best methods for making a puppy learn its name is to play the name game:
1) Take your puppy to a quiet place and gain its attention by calling its name in a happy and reassuring tone (e.g. “Loupi, look at me!” or “Loupi, come here!”).
2) As soon as you have gained its attention, reward it with a treat to encourage its good behavior.
3) Repeat this process several times until your puppy has learned to respond to its name.
In the above iterative process, the puppy learns from experience that reacting when you call its name is a good thing. By repeating this process, the puppy recognizes the pattern between similar, but not identical, vocal signals of its name (training data) and associates them with a specific behavior (task).
Many cognitive processes in humans involve a similar way of associating patterns with specific reactions; consider, for instance, how we learn to drive a car, speak a foreign language or even understand how a geomaterial behaves!
Machine learning (ML) is an anthropomorphic extension of the above process of “learning” to a computer program, that is, to the machine. By learning, the machine will progressively improve its performance in executing a specific task. According to Mitchell (1997), ML can be defined as follows:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
The whole set of examples that the computer uses during learning is the training set, while each training example is called a training instance or sample. The whole process of learning is called training. The available data are usually split into training data, validation data and test data. More details about these sets and their importance will be given in section 1.5 and in Chapters 2 and 5 of volume 1.
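As a minimal sketch of such a split (using scikit-learn on synthetic data; the library calls and the toy arrays are illustrative assumptions, not an example from the book):

```python
# Hypothetical sketch: splitting data into training, validation and test sets.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 training instances, 3 features each
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)  # synthetic targets

# First split off a test set, then carve a validation set out of the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```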
A popular example showing the power of ML was AlphaGo, an ML program that beat the human world champion of Go (see also (Silver et al. 2016) and Chapter 4 of volume 2 on reinforcement learning). Go is a strategy board game for two players in which the aim is to cover more territory than the opponent (see Figure 1.1). Despite its relatively simple rules, Go is extremely complex. The state-space complexity of Go, that is, the number of legal game positions reachable from the initial position of the game (Allis 1994), is of the order of 10¹⁷⁰ (Tromp and Farnebäck 2007), which is much higher than that of chess, of the order of 10⁴⁴ (Shannon 1950), and certainly higher than the number of atoms in the universe. This makes Go a great challenge for ML algorithms.
Figure 1.1. Eighteen-time Go world champion Lee Sedol playing against the machine (courtesy: Lee Jin-man/AP, The New Yorker, https://www.newyorker.com/tech/annals-of-technology/alphago-lee-sedol-and-the-reassuring-future-of-humans-and-machines)
Applying the above-mentioned definition to an ML program for Go, the performance (P) is measured by the ability of the machine to beat its opponent, gaining experience (E) by executing the task (T), playing the game against itself, multiple times. Experience can also be obtained by processing existing games of other (human) players.
Another recent popular example of ML is the ChatGPT chatbot developed by OpenAI. ChatGPT was released in November 2022 and provoked a massive public reaction. Let us see what it answers when we ask it to describe itself and what it knows about ALERT Geomaterials.
Ioannis: Describe yourself.
ChatGPT: I am ChatGPT, an AI language model developed by OpenAI. I have been trained on a vast amount of text data from the internet, books and other sources. My purpose is to assist users in generating human-like text responses and providing information on a wide range of topics.
I don’t have a physical form or personal experiences like humans do, but I can understand and generate text in multiple languages. I strive to provide helpful and accurate responses to the best of my abilities, drawing upon the knowledge I have acquired during my training.
I can engage in conversations, answer questions, offer explanations and assist with various tasks. However, please keep in mind that while I strive for accuracy, I may occasionally generate incorrect or incomplete information. It’s always a good idea to verify important details from reliable sources.
Overall, I am here to assist and provide information to the best of my abilities. Feel free to ask me anything you’d like to know or engage in a conversation on any topic of your interest.
Ioannis: How do you work?
ChatGPT: As an AI language model, my functioning can be divided into two main phases: training and inference.
Training: During the training phase, large amounts of text data are used to train the model. In the case of GPT models like ChatGPT, this training involves a process called unsupervised learning. The model learns to predict what comes next in a given sequence of text, based on the patterns and relationships it discovers in the training data. By doing this over a vast dataset, the model gradually learns grammar, facts, reasoning abilities and the nuances of human language.
Inference: Once the training is complete, the model moves into the inference phase, where it can generate responses and interact with users. Here’s a simplified overview of how it works during inference:
Input: When you provide a prompt or a message, the model receives it as input. The input can be a question, a statement or any text you want to generate a response to.
Tokenization: The model breaks down the input text into smaller units called tokens. These tokens can represent words, subwords or characters. Tokenization helps the model understand the structure and meaning of the input.
Processing: The tokenized input is then passed through several layers of neural networks. These layers are designed to understand the relationships between the tokens and capture the context and meaning of the text.
Generation: Once the input is processed, the model generates a response. It predicts the most likely next tokens based on the context and the patterns it has learned during training. The generated tokens form the model’s response.
Iteration: The process of tokenization, processing and generation can happen in an iterative manner. The generated response can be further used as input, creating a back-and-forth conversation or a continuous interaction.
It’s important to note that while ChatGPT can generate impressive responses, it may sometimes produce incorrect or nonsensical answers. It doesn’t have real-world experiences or access to current information beyond its knowledge cutoff date. It’s always a good idea to fact-check and verify important information from reliable sources.
Ioannis: What is ALERT Geomaterials in summary?
ChatGPT: In summary, ALERT Geomaterials is an international scientific association dedicated to the study of the mechanics and behavior of geomaterials, including soil, rock and other geological materials. It brings together researchers and professionals in the field of geomechanics and geotechnical engineering to promote collaboration, knowledge exchange and scientific advancements. ALERT Geomaterials organizes workshops, conferences and training courses, providing a platform for researchers to share their findings and discuss topics related to soil and rock mechanics, slope stability, geotechnical engineering, experimental testing and numerical modeling. The association also publishes a journal focused on energy geotechnics and geoenvironmental engineering. Overall, ALERT Geomaterials plays a crucial role in fostering research and innovation in the field of geomaterials.
Impressive! At this point, it becomes quite tempting to let ChatGPT finish this chapter or even both volumes! However, I assure you, we took the necessary time to draft these volumes entirely, word by word. Read ChatGPT’s answers carefully, though. Are all of them correct?
It is straightforward to see how the above definition for ML applies to ChatGPT and to many other applications of ML, such as speech, handwriting, face and object recognition, voice-to-text and vice versa, translation, text auto-correction and auto-completion, spam filtering, computer games, self-driving cars, medicine, forecasting, banking, security, marketing, control problems, engineering and, of course, (geo-)mechanics.
ML is an evolving field of knowledge and involves a plethora of methods and combinations thereof. In the next section, we will try to categorize and classify them into groups.
Having described and defined what ML is, it is natural to ask: what is artificial intelligence (AI)? ML and AI are closely related, and the latter is considered to include the former. The exact definition of AI seems to be a bit foggy for the time being and depends on how we define the terms “artificial” and “intelligence”. In the author’s opinion, it is easier to describe the characteristics of AI than to give a unique and exact definition of the term. Another example of a hard definition, in a totally different domain, is: what is justice? It seems easier, and more important, to describe the characteristics of justice (e.g. equality for all) than to give a precise definition of the term.
There are numerous ML methods in the literature. Therefore, it is useful to classify them into different categories. Here, we follow the classification of Géron (2019), who categorizes ML methods into supervised versus unsupervised learning, batch versus online learning and instance-based versus model-based learning. Of course, this is a rough classification, and one method can combine different categories, as shown in Figure 1.2.
Figure 1.2. Classification of ML methods. Method 1 is a supervised, batch, instance-based method, while Method 2 has components belonging to different categories
In supervised learning, the training set includes the desired solutions/predictions, which are called labels.
For instance, imagine that your training set is hundreds of images, each containing either a dog or a cat, and that the training task is to distinguish the photos of dogs from those of cats. This is called a classification problem (see Chapter 4). If we give the computer the information of which photos show dogs and which show cats, in other words, if we label the training set, then the learning is called supervised. If the training is successful, the computer will have learned to tell dogs from cats even in a new set of photos that it has not processed during its training.
On the contrary, if the training set is unlabeled, then the learning is called unsupervised. In this case, the computer will eventually understand the pattern of the two different animals shown in the photos and will be able to distinguish dogs from cats in an unsupervised manner (see also clustering in Chapter 3 of volume 1). Of course, the machine will not have learned to call a cat “cat” and a dog “dog”, because we have not given it this additional information, but it will have identified their differences and separated the data into two different classes.
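The following minimal sketch contrasts the two settings on synthetic two-dimensional features standing in for the photos; scikit-learn, the toy data and the class labels are illustrative assumptions, not an example from the book:

```python
# Hypothetical sketch: the same data treated with and without labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
cats = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))  # stand-in "cat" features
dogs = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))  # stand-in "dog" features
X = np.vstack([cats, dogs])
labels = np.array([0] * 50 + [1] * 50)                      # 0 = cat, 1 = dog

# Supervised: the labels are part of the training set.
clf = LogisticRegression().fit(X, labels)
print(clf.predict([[2.8, 3.1]]))               # -> [1], i.e. "dog"

# Unsupervised: no labels; the machine only separates the data into two classes,
# without any notion of which class is "cat" and which is "dog".
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5], km.labels_[-5:])
```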
Another ML problem is regression (see Chapters 2 and 5 of volume 1 and Chapters 3 and 5 of volume 2). In regression, the training set contains one or several numerical inputs, also called features, and the task is to predict one or several numerical outputs, also called targets, that depend on the inputs. Consider as an example the prediction of the stress response of a geomaterial, which, as we know, depends on several input parameters, such as the applied strain, available information about the evolution of its microstructure (e.g. the position and the velocity of the grains of a sand obtained by a discrete element method [DEM] analysis), history and/or other features. As the training set contains both the inputs and the outputs, the training for predicting the stresses based on the above-mentioned features is supervised. However, the identification/extraction by the machine of a representation of the most important features of the microstructure related to the prediction of the stress response is unsupervised. Examples of unsupervised methods in ML are feature extraction, anomaly detection, dimensionality reduction (in which the aim is to reduce the size of the training set without losing important information with respect to a specific task or measure) and data compression, among others.
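A minimal regression sketch in this spirit (the linear stress-strain toy law, the noise level and the scikit-learn calls are assumptions for illustration only, not the book's example):

```python
# Hypothetical sketch: supervised regression of a stress response from strain.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
strain = rng.uniform(0.0, 0.01, size=(200, 1))                   # input feature
stress = 50e3 * strain[:, 0] + rng.normal(scale=5.0, size=200)   # target: toy linear law + noise

model = LinearRegression().fit(strain, stress)   # training uses both inputs and outputs
print(model.coef_)                               # learned stiffness, close to 50e3
print(model.predict([[0.005]]))                  # stress prediction for an unseen strain
```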
In Table 1.1, we provide a list of important supervised and unsupervised methods in ML. However, not all ML methods can be categorized as supervised or unsupervised. A notable example is reinforcement learning (RL), which does not require labeled data or a training set. For more details on RL, we refer to Chapter 4 of volume 2. Finally, when labels are not available for all the samples in the training set, we speak of semi-supervised learning. Most semi-supervised methods are a combination of supervised and unsupervised algorithms. An example of semi-supervised learning is active learning (see Chapter 4 of volume 1 for more details); a minimal semi-supervised sketch is given after Table 1.1.
Table 1.1. Classification of some important methods in ML. The asterisk denotes that not all variations of the method fall into this category

ML method                            | Supervised | Unsupervised | Online | Presented in these volumes
Linear regression                    |     ✓      |              |        |     ✓
Logistic regression                  |     ✓      |              |        |     ✓
Polynomial regression                |     ✓      |              |        |     ✓
Lasso, Ridge                         |     ✓      |              |        |     ✓
k-Nearest neighbors                  |     ✓      |              |        |     ✓
Support vector machines (SVM)        |     ✓      |              |        |     ✓
Decision trees                       |     ✓      |              |        |
Random forests                       |     ✓      |              |        |
Artificial neural networks (ANNs)    |     ✓      |              |   ✓    |     ✓
Autoencoders                         |            |      ✓       |   ✓    |     ✓
Clustering                           |            |      ✓       |        |     ✓
Principal component analysis (PCA)   |            |      ✓       |   ✓*   |
Locally linear embedding (LLE)       |            |      ✓       |        |
Reinforcement learning (RL)          |            |              |   ✓    |     ✓
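As a minimal sketch of the semi-supervised setting mentioned above (using scikit-learn’s LabelPropagation on synthetic data; the algorithm choice and the toy data are illustrative assumptions, not the book’s method):

```python
# Hypothetical sketch: semi-supervised learning. Unlabeled samples are marked
# with -1; the algorithm propagates the few available labels to them.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (50, 2)),
               rng.normal([3.0, 3.0], 0.5, (50, 2))])
y = np.full(100, -1)           # -1 means "label not available"
y[0], y[50] = 0, 1             # only one labeled sample per class

semi = LabelPropagation().fit(X, y)
print(semi.transduction_[:3], semi.transduction_[-3:])  # inferred labels for all samples
```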
Another way to classify ML algorithms is based on whether they can improve their predictions when provided with new data that become available after the first training.
In batch learning, the machine is trained over a fixed training data set, without the possibility of adding more data to it. Therefore, batch ML methods cannot improve their performance in a specific task when more data become available after the end of the training.
On the contrary, ML methods that support online learning allow the initial training set to be modified and extended. Consequently, they are more flexible and better suited to large training sets, as the data can be loaded into the memory of the computer in parts and used whenever needed. Old data, over which the computer has already been trained, can also be erased to save space. Online learning is also very convenient when there is a constant flow of information, contrary to batch learning, in which the machine must be retrained over the whole data set: when new data become available, the training set has to be updated and the training repeated from scratch. This can be fine for some applications, but in many others it would incur a very high computational and data storage cost.
Not all ML algorithms support online learning (see Table 1.1).
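To illustrate the difference, here is a hypothetical sketch using scikit-learn’s SGDRegressor, whose partial_fit method updates the model incrementally; the streaming setup and the toy data are assumptions, not an example from the book:

```python
# Hypothetical sketch: online learning on a stream of mini-batches.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(4)
model = SGDRegressor(random_state=0)

# Data arrive as a stream; each mini-batch updates the model in place,
# without retraining on everything seen so far (contrast with batch learning).
for _ in range(100):
    X_batch = rng.uniform(-1.0, 1.0, size=(32, 1))
    y_batch = 3.0 * X_batch[:, 0] + rng.normal(scale=0.1, size=32)
    model.partial_fit(X_batch, y_batch)

print(model.coef_)  # approaches [3.0] as more data stream in
```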
One more way to categorize ML methods is based on how the data are learned in order to make predictions for data outside the training set. Two categories can be distinguished: instance-based (also known as memory-based or lazy) learning and model-based learning (also known as physics-based in some applications).
In instance-based learning, training data are simply interpolated in a (usually high-dimensional) space. New predictions are then made based on how close or how similar new data are to those used for training. Similarity is quantified by a distance measure that depends on the data and the problem at hand. In this sense, data are “memorized” by the machine, hence the term memory-based. The quality of the predictions for data outside the training set is determined by how well the new data are represented by the data of the training set. Instance-based approaches are straightforward to apply to any data set and do not require any particular knowledge about the structure of the data or other characteristics they might have. The hope of the user of instance-based methods is that the machine will eventually identify by itself the hidden patterns in the data and give correct predictions even for data outside the training set. Examples of instance-based methods are artificial neural networks (ANNs), decision trees, random forests, k-nearest neighbors, many clustering techniques, locally linear embedding (LLE) and principal component analysis (PCA).
Despite their versatility and many advantages, instance-based ML methods have an important drawback. In physics and engineering, we know that data have to respect at least some fundamental principles, for instance, the conservation of mass, energy and linear and angular momentum. Instance-/memory-based predictions that do not respect these conservation laws are therefore unacceptable and can even be dangerous in applications. In this case, data have to be processed and learned by the machine under a model that includes the laws of physics.
Model-based ML adopts a model with some parameters, which are optimized during training so that the model best represents the data of the training set. The model is then used to make predictions for unseen input data. Of course, if the adopted model is poor, the predictions will be poor as well.
Both instance-based and model-based approaches can introduce bias: the former because of the limited data over which they were trained, the latter because of the model chosen. To fix ideas, an example of instance-based and model-based ML is given in Figure 1.3. The training set contains the shear stress at failure (target) for a given normal stress (feature) from a series of experimental tests on a frictional interface. An instance-based method can give very poor predictions for unseen data, while a model-based prediction will only be as good as the model’s ability to describe the data. In this simple example, a Coulomb model was adopted for the model-based approach.
Figure 1.3. Instance-based versus model-based ML for predicting failure (slip) of a frictional interface based on experimental data (fictitious)
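The contrast can be sketched as follows; the fictitious data, the cohesionless Coulomb law tau = mu * sigma_n and the scikit-learn calls are illustrative assumptions echoing Figure 1.3, not the book’s code:

```python
# Hypothetical sketch: instance-based (k-nearest neighbors) versus model-based
# (Coulomb friction) prediction of shear stress at failure.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
sigma_n = rng.uniform(10.0, 100.0, size=(30, 1))            # normal stress [kPa]
tau = 0.6 * sigma_n[:, 0] + rng.normal(scale=2.0, size=30)  # fictitious failure data

knn = KNeighborsRegressor(n_neighbors=3).fit(sigma_n, tau)          # instance-based: memorizes data
coulomb = LinearRegression(fit_intercept=False).fit(sigma_n, tau)   # model-based: fits mu

# Inside the training range both do well; outside it, the instance-based
# prediction saturates at the nearest memorized values, while the Coulomb
# model extrapolates with the learned friction coefficient.
print(knn.predict([[200.0]]), coulomb.predict([[200.0]]), coulomb.coef_)
```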
The applications of ML in science are nowadays numerous and increasing, and the same holds for applications of ML in geomechanics. It is beyond the scope of this chapter to provide a comprehensive literature review, but it is worth mentioning some research directions in constitutive modeling, geotechnics, geophysics and image correlation that can be a starting point for the interested reader.
Geomaterials are among the most complex materials to study and model. The main reason is that their mechanical response is governed by multiphysics couplings at multiple spatial and temporal scales, which result in a strongly nonlinear and dissipative macroscopic response.
Empirical constitutive models are often used to describe the mechanical behavior of geomaterials. Classification and regression methods can be used for choosing the appropriate constitutive model that best fits the experimental data (see Marques et al. 2022; Morand and Helm 2019; Guo et al. 2021; Schulte et al. 2023; Pouragha et al. 2020; Zhang et al. 2021a and references therein). Another challenging task is to capture the macroscopic behavior of geomaterials based on the behavior of their microstructure and its evolution due to loading. Multiscale approaches can be employed for this purpose, requiring the solution of a boundary value problem (BVP) on an elementary cell of the microstructure at each point and time increment of the macroscopic analysis. Then, based on homogenization, the intrinsic constitutive behavior of the microstructure can be upscaled to the macroscopic level. For this purpose, mixed numerical schemes, such as the FE² method (Feyel 2003; Lloberas Valls et al. 2019; Eijnden et al. 2017) and FEM-DEM (Nitka et al. 2011; Nguyen et al. 2014