Supervised Machine Learning for Science

How to stop worrying and love your black box

Christoph Molnar & Timo Freiesleben

Summary

Machine learning has revolutionized science, from folding proteins and predicting tornadoes to studying human nature. While science has always had an intimate relationship with prediction, machine learning amplified this focus. But can this hyper-focus on prediction be justified? Can a machine learning model be part of a scientific model? Or are we on the wrong track?

In this book, we explore and justify supervised machine learning in science. However, a naive application of supervised learning won’t get you far because machine learning in raw form is unsuitable for science. After all, it lacks interpretability, causality, uncertainty quantification, and many more desirable attributes. Yet, we already have all the puzzle pieces needed to improve machine learning, from incorporating domain knowledge to creating robust, interpretable, and causal models. The problem is that the solutions are scattered everywhere.

In this book, we bring together the philosophical justification and the solutions that make supervised machine learning a powerful tool for science.

The book consists of two parts:

Part 1 justifies the use of machine learning in science.

Part 2 discusses how to integrate machine learning into science.

Preface

We wrote this book because we are passionate about science and machine learning, particularly their interaction.

Our collaboration began during our Ph.D. studies, where we co-authored papers on interpretable machine learning. However, we soon realized that interpretability was but one piece of a puzzle. Exploring causality, uncertainty quantification, robustness, and other tools proved essential to convert ‘raw’ machine learning into a proper scientific tool. The deeper we dug, the clearer it became that an even bigger piece was missing: A strong justification for using machine learning in scientific modeling was needed. All our insights led us to this book.

Prepare for a journey through the philosophy of science, machine learning theory, pragmatic modeling advice, and short stories of our raven scientists.

1 Introduction

Researchers make millions of protons collide every second inside the Large Hadron Collider (LHC) deep below the French-Swiss border near Geneva to understand what our universe is ultimately made of. Each collision creates subatomic particles which often decay quickly, creating even more particles. Sensors surround the collision zone and record the passage of these particles, turning each collision event into data – a lot of data. CERN, the European Organization for Nuclear Research, is the world’s largest particle physics laboratory. CERN’s data center processes up to a petabyte of data per day. That’s a million gigabytes. These amounts of data are far beyond anything a human can look at.

To call the analysis of this data “challenging” is a complete understatement: It’s difficult to reconstruct the collisions because multiple collisions occur simultaneously; sensor data is generated faster than it can be written to disk; and some particles decay in less than a trillionth of a trillionth of a second. To make the data processing and analysis manageable, CERN researchers rely on machine learning. Machine learning systems scan the incoming event data and decide which to write to disk and which to ignore. Machine learning models also estimate the energy and timing of signals. Other machine learning models help reconstruct events and remove noise. Today, particle science at CERN is unthinkable without machine learning.

But it’s not just particle science. If you look closely, you’ll find machine learning in almost every field (at least, we couldn’t find any without it). There are fields where everyone expects machine learning applications, such as geoscience, materials science, or neuroscience [1], [2], [3]. But you might be surprised to see machine learning applied in other fields, such as anthropology, history, or theoretical physics [4], [5], [6].

Let’s look at a few applications to get a better sense of how scientists use machine learning. In each example, machine learning plays a different scientific role.

Raven Science was stuck. The Ravens were drowning in complex data and had no idea how to compress it into a scientific model. A peculiar bunch of neuro-stats-computer Ravens had recently developed a new modeling approach called supervised machine learning, which learns predictive models directly from data. But is this kind of modeling still science, or is it alchemy?

1.1 Labeling wildlife in the Serengeti

Wildlife ecologists want to understand and control the complexities of natural ecosystems. They ask questions like: What animals live in the ecosystem? How many animals of each species are there? What do the animals do?

For a long time, it was difficult to analyze these questions quantitatively because there was simply no data. But today, motion sensor cameras, often referred to as camera traps, make huge amounts of data available. Camera traps combine cameras with motion sensors. Once something moves in front of the camera, it takes pictures. The Serengeti, home to zebras, buffalos, and lions, is one of the ecosystems where ecologists placed various camera traps. But to turn these images into insights, people had to label them: they had to decide whether an animal was in the picture, what species it belonged to, and describe what it was doing. A tedious task!

For this reason, Norouzzadeh et al. [7] used the already labeled data to train a convolutional neural network that performs this task automatically. Figure 1.1 shows the output of this labeling task. The model achieves a prediction performance of 94.9% accuracy on the Snapshot Serengeti dataset – the same performance as crowdsourced teams of volunteers who typically label the data.

Figure 1.1: The image was taken by motion sensor cameras in the Serengeti and shows wildlife animals. The machine learning model by Norouzzadeh et al. [7] correctly identifies, counts, and describes the animals in the image. Used with permission from [7].

The primary goal of the paper is to provide a reliable labeling tool for researchers. The labels for the data should be accurate enough to draw scientific conclusions. Therefore, Norouzzadeh et al. [7] are concerned with the predictive accuracy of the model under changing natural conditions and the uncertainty associated with the predictions.

1.2 Forecasting tornadoes to inform actions

In the case of severe weather events like tornadoes, every minute counts. You need to find shelter for yourself and your loved ones quickly. But tornadoes are hard to predict. They form rapidly and the exact conditions for a tornado to form are not fully understood.

Lagerquist et al. [8] therefore use machine learning to predict the occurrence of a tornado within the next hour. They trained a convolutional neural network on data from two different sources: storm-centered radar images and short-range soundings. Their model achieves similar performance to ProbSevere, an advanced machine learning system for severe weather forecasting that is already in operation.

The work aims to provide a system that could be used in deployment to help warn the population of tornadoes. Therefore, the paper centers on the methodology and the assessment of the model’s predictive performance in relevant deployment settings. In particular, the authors analyzed in which regions the model worked best and worst in terms of prediction error for different types of tornadoes. The use case differs from the Serengeti animal classifier, where the goal was labeling, not case-by-case decision-making (although summary data from the Serengeti might eventually inform decisions as well).

1.3 Predicting almond yield to gain insights

California is the almond producer of the world: it produces 80% of all the almonds on Earth, probably in the entire Universe. Nitrogen, a fertilizer, plays a key role in growing the nuts.1 Fertilizer use is regulated, meaning there’s an upper limit on how much may be applied. Before you can develop a good fertilizer strategy, you first have to quantify its effect. Zhang et al. [9] used machine learning to predict the almond yield of orchard fields based on weather, orchard characteristics derived from remote sensing (satellites), and, of course, fertilizer use. The authors examined the model not only for its predictive accuracy but also for how individual features affect the prediction and how important they are for the model’s performance.

The goal of this research is twofold: prediction and insight, so it goes beyond just making a decision. The model may help almond growers make better decisions about fertilizer use, but it may also contribute to the scientific knowledge base. In general, the scientific fields of ecology and agriculture are at the forefront when it comes to adopting machine learning [10], [11].

1.4 Inferring protein structures for scientific hypotheses

One of the big goals in bioinformatics is to understand how protein structures are determined by their amino acid sequences. Proteins do a lot of heavy lifting in the body, from building muscles and repairing the body to breaking down food and sending messages. A protein’s structure is largely determined by its sequence of amino acids, the building blocks of protein. The protein’s structure, however, ultimately determines its function. The problem: It is difficult to predict how a protein will be structured if you only have the amino acid sequence. If scientists could do that reliably, it would help them with drug discovery, understanding disease mechanisms, and designing new proteins. Meet AlphaFold [12], a deep neural network that can predict protein structures from their amino acid sequence reasonably well. See Figure 1.2 for one example.

Figure 1.2: Protein structure, predicted in blue, determined by experiment in green. Figure by [12], CC-BY (https://creativecommons.org/licenses/by/4.0/)

AlphaFold was trained on a dataset of 100,000 protein sequences. Beyond pure prediction, the algorithm aids with molecular replacement [13] and interpreting images of proteins taken with special microscopes [14]. There’s now even an entire database, called AlphaFold DB, which stores the predicted protein structures. On its website [15] it says:

“AlphaFold DB provides open access to over 200 million protein structure predictions to accelerate scientific research.”

Not only has the prediction model AlphaFold become part of science, but its predictions serve as a building block for further research. To justify this sensitive role of machine learning, the model needs to be doubly scrutinized.

1.5 What role does machine learning play in science?

You may look at these examples and argue that machine learning is only one of many tools in the scientific process, like Excel or Python, or that the scientific goal was only to evaluate the model’s performance.

If this were the case, the value of machine learning in science would be very limited – machine learning could create ‘models’, but these bear little resemblance to scientific models that represent phenomena. The most important methodological steps in science – such as creating scientific models or theories, testing them, and finding interesting new research problems – would remain unaffected by machine learning.

Indeed, there are good reasons why everyone is cautious about machine learning in science – in its raw form, machine learning reduces every scientific question to a prediction problem. This caution is echoed by prominent critical voices on machine learning.

Take Judea Pearl, a proponent of causal inference, who said that

“I view machine learning as a tool to get us from data to probabilities. But then we still have to make two extra steps to go from probabilities into real understanding — two big steps. One is to predict the effect of actions, and the second is counterfactual imagination.” [16]

Pearl refers here to his ladder of causation that distinguishes three ranks: 1. association, 2. intervention, and 3. counterfactual reasoning [17]. He highlights that machine learning remains on rank one of this ladder and is only suitable for static prediction.

Gary Marcus, one of the most well-known critics of current deep learning, claimed that

“In my judgement, we are unlikely to resolve any of our greatest immediate concerns about AI if we don’t change course. The current paradigm – long on data, but short on knowledge, reasoning and cognitive models – simply isn’t getting us to AI we can trust.” [18]

In the paper, Marcus criticizes the lack of robustness, the ignorance of background knowledge, and the opacity in the current machine learning approach.

Noam Chomsky, one of the founding fathers of modern linguistics, argued:

“Perversely, some machine learning enthusiasts seem to be proud that their creations can generate correct “scientific” predictions (say, about the motion of physical bodies) without making use of explanations (involving, say, Newton’s laws of motion and universal gravitation). But this kind of prediction, even when successful, is pseudoscience. While scientists certainly seek theories that have a high degree of empirical corroboration, as the philosopher Karl Popper noted, ‘we do not seek highly probable theories but explanations; that is to say, powerful and highly improbable theories.’ ” [19]

Chomsky criticizes machine learning for its exclusive focus on prediction rather than developing theories and explanations. In the article, he is particularly doubtful that current large language models provide deep insight into human language.

1.6 Machine learning can be more than a tool

To a certain degree, we share these critical views. Machine learning algorithms can indeed be dumb curve-fitters that lack any notion of causality. Purely predictive models may make poor explanatory models. And machine learning doesn’t produce scientific theories in the form we are used to.

Nevertheless, there are reasons for optimism that these issues can be addressed and that machine learning can play a big role in science. The roles we see machine learning play were already shining through some of the initial examples:

Inform actions:

Tornado forecasting is not just an intellectual exercise; it is crucial for taking the right measures to prevent harm.

Gain insights:

In the case of almond yield prediction, the goal wasn’t only to build a prediction machine, but the researchers also extracted insights from the model.

Exploration:

AlphaFold isn’t a mere proof of concept. For example, protein structure predictions are used by researchers to explore and test new drugs.

However, machine learning cannot fulfill these roles in its raw form. It must be equipped with "upgrades" such as (causal) domain knowledge, interpretability, uncertainty quantification, and so on, to become a new approach for informing people’s actions, extracting insights from data, or generating new hypotheses. We believe that fully upgraded supervised machine learning models have the potential to become a full-blown scientific methodology that helps scientists better understand phenomena.

The upgrades we need are readily available, and new sub-fields emerge every year. If we think of machine learning not in isolation but in conjunction with these upgrades, it can provide great scientific value. Like a puzzle, the pieces have to be put together, starting with a justification of why a machine learning approach focused on prediction is a good core idea for research.

But before we can justify machine learning or start solving the puzzle, let’s look at supervised machine learning in its raw form!

1. Technically, almonds are the seeds of a drupe rather than true nuts, but we’re not biologists and we eat them as if they were nuts.

2 Bare-Bones Supervised Machine Learning

This chapter looks at supervised machine learning, stripped down to its bare bones. Why supervised machine learning? Of course, the applications of unsupervised learning and reinforcement learning in science are no less interesting. However, after looking at many scientific applications, we realized that the primary goal was often to solve a supervised learning problem. Unsupervised and reinforcement learning techniques have often been used to support this goal, e.g. by providing powerful representations or fine-tuning models.

What is supervised machine learning about? Think back to the tornado prediction example from the introduction. Supervised machine learning produces models that provide output predictions, for example, concerning the occurrence of a tornado in the next hour. To obtain these predictions, models must be fed with so-called input feature values, e.g. storm-centered radar images and short-range soundings. The search for a model is carried out by a learning algorithm using a labeled dataset of input-output pairs, also called training data.

A young Raven called Rattle was the first to adopt supervised machine learning. At first, the other Raven scientists were skeptical. Too new. Unproven. Risky. Not the way of Raven Science. Nevertheless, Rattle began to explain to the first interested Ravens what machine learning was all about.

2.1 Describe the prediction task

To use supervised machine learning, you first have to translate your problem into a prediction task with the following ingredients:

Pick a target variable to predict.

For example, the occurrence of tornadoes in a 1-hour time window, coded as 0 (no tornado) and 1 (tornado).

Define the task.

The task is related to the target and can range from classification and regression to survival analysis and recommendation. Depending on how you frame the tornado prediction problem, you end up with different task types:

Will a tornado occur within the next hour? classification (Tornado Y/N)

How many tornadoes will occur this year? regression (0, 1, 2, …)

How long until the next tornado occurs? survival analysis (e.g. 2h)

Decide on the evaluation metric.

This metric defines what counts as a good or bad prediction. To classify tornadoes in 1-hour windows, you could use metrics such as the F1 score1 or accuracy.

Choose features from which to predict the target.

Features could be hourly measurements from radar stations, like wind speeds and precipitation.
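To make these ingredients concrete, here is a minimal sketch in Python. The arrays are made-up placeholders standing in for the tornado labels and radar features, not data from [8].

```python
# The four ingredients of a prediction task, with made-up placeholder data.
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# Target: tornado within the next hour, coded as 0 (no) / 1 (yes)
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])

# Task: binary classification, so a model outputs 0/1 predictions
y_pred = np.array([0, 0, 1, 0, 0, 0, 1, 1])

# Evaluation metric: what counts as a good or bad prediction
print("F1:      ", f1_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))

# Features: e.g. hourly radar measurements per 1-hour window
# (here just two random placeholder columns: wind speed and precipitation)
X = np.random.default_rng(0).normal(size=(8, 2))
```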

Once the task is defined, you need data.

2.2 Get the data

Machine learning requires data. Data points are represented as pairs of feature and target values $(x, y)$. In the tornado case, $x$ might be radar measurements from 10 AM to 11 AM on May 25th, 2021, in a specific 3km by 3km patch in the United States. Accordingly, $y$ would indicate whether a tornado occurred in this time and place.

After cleaning and pre-processing the data, it is typically randomly split into three subsets for machine learning:

A training dataset, $\mathcal{D}_{\text{train}}$, used to train the model.

A validation dataset, $\mathcal{D}_{\text{val}}$, used to validate modeling choices and for model selection.

A testing dataset, $\mathcal{D}_{\text{test}}$, used for the final performance evaluation.

The simplest way is to randomly split your dataset into these three buckets. In reality, you have to adapt the splitting mechanism to your data and task. For example, time series and clustered data require splitting schemes that respect the data structure. For classification with imbalanced data, you might want to use mechanisms that ensure that the minority class occurs often enough in each split. Anyway, in a completely random 3-way split, our data point $(x, y)$ would fall into exactly one of these buckets.
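As a minimal sketch, here is a completely random 60/20/20 split on synthetic placeholder data; the split fractions and the use of scikit-learn’s train_test_split are illustrative choices, not a prescription.

```python
# A completely random 60/20/20 split into training, validation, and test data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))        # placeholder features
y = rng.integers(0, 2, size=1000)     # placeholder 0/1 target

# First set aside 20% as test data ...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# ... then split the rest into 75% training / 25% validation (60/20 overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```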

2.3 Train the model

Training a machine learning model means running an algorithm that takes as input the training data and outputs the model. The model describes a function that outputs predictions based on input features.

The training requires making some choices:

Select a class of models $\mathcal{F}$. For example, decision trees or neural networks. Usually, you have to specify this class further, e.g., by choosing the maximal depth of a tree or the specific architecture of the neural network. Radar readings have a spatial structure, so you might pick a convolutional neural network such as ResNet for the tornado prediction task.

Choose a training algorithm $I$. The training algorithm takes the training data $\mathcal{D}_{\text{train}}$ and produces a prediction model $\hat{f}$. For example, neural networks typically use stochastic gradient descent with backpropagation as the training algorithm. But you could also train a neural network using genetic algorithms.

Set hyperparameters. Hyperparameters control the training algorithm and affect the models that the training produces. Some hyperparameters are related to the model class, like the number of layers in your neural net or the number of neighbors in the k-nearest-neighbors algorithm. Others are connected to the training algorithm, such as the learning rate or the batch size in stochastic gradient descent.

For example, when you train a convolutional neural network to predict tornadoes, you use stochastic gradient descent (the training algorithm $I$) with the training data $\mathcal{D}_{\text{train}}$. To do this, you have to set the hyperparameters. After the training process, you get a trained CNN $\hat{f}$ from the model class $\mathcal{F}$ of CNNs.

The training process seeks to produce a model $\hat{f}$ that makes a minimal error on the training data:

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \sum_{(x, y) \in \mathcal{D}_{\text{train}}} L(y, f(x)),$$

where $L$ is the loss function the model optimizes for. Some training algorithms can optimize arbitrary (well, some constraints remain) loss functions directly, like neural networks using gradient descent, while other training algorithms have built-in and sometimes less explicit losses they optimize for, like the greedy splits in CART decision trees [20].
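A minimal training sketch, assuming synthetic tabular data and a small feed-forward network as a stand-in for the CNN-on-radar-images setup described above; the model class, training algorithm, and hyperparameter values are all illustrative choices.

```python
# Model class: a small feed-forward neural network (stand-in for a CNN).
# Training algorithm: stochastic gradient descent, minimizing the log loss.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 5))                         # placeholder features
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)   # placeholder target

model = MLPClassifier(
    hidden_layer_sizes=(16, 16),  # hyperparameter: architecture of the network
    solver="sgd",                 # training algorithm: stochastic gradient descent
    learning_rate_init=0.05,      # hyperparameter of the training algorithm
    batch_size=32,                # hyperparameter of the training algorithm
    max_iter=500,
    random_state=0,
)
model.fit(X_train, y_train)       # minimizes the loss on the training data
print("training accuracy:", model.score(X_train, y_train))
```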

2.4 Validate modeling choices

Training doesn’t guarantee the best-performing model. For example, you might not have picked the best model class: you used linear regression, but the best model for the task might be a tree ensemble. And even if you picked the best model class, you might have set the hyperparameters non-optimally. You therefore need a procedure to pick both the model class and the hyperparameters. A naive approach would be to compute the evaluation metric on the training data, but this is a bad idea since some models overfit the training data.

Overfitting

We say a model overfits when it performs well on the training data but poorly on new, unseen data from the same distribution. A model that overfits is good at reproducing the random errors in the training data but fails to capture the general patterns.

You typically compute the evaluation metric using a separate validation dataset $\mathcal{D}_{\text{val}}$. Since the model wasn’t trained using the validation data, you get a fair assessment of its performance. With the validation data, you can compare multiple models and hyperparameter configurations and pick the best-performing one. This allows you to detect underfitting and overfitting and to guard against these problems by properly regularizing and tuning the model.

2.5 Evaluate the model on test data

How well would your final model perform? Unfortunately, you can’t use the evaluation metrics you have computed from training and validation data. Both will be too optimistic since you already used the training data to train the model and the validation data to make modeling choices. Instead, you have to evaluate the model performance on test data $\mathcal{D}_{\text{test}}$. This gives you a realistic estimate of the model performance.
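A minimal sketch of the select-then-evaluate workflow from the last two sections, assuming synthetic data; the candidate models, split sizes, and metric are illustrative assumptions.

```python
# Compare candidates on the validation data, then report the performance
# of the selected model on the untouched test data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)  # nonlinear placeholder target

X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest (depth 3)": RandomForestClassifier(max_depth=3, random_state=0),
    "random forest (depth 10)": RandomForestClassifier(max_depth=10, random_state=0),
}

# Validation: train each candidate and compare on the validation data.
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = f1_score(y_val, model.predict(X_val))
best_name = max(val_scores, key=val_scores.get)
print("validation F1 scores:", val_scores)

# Test: evaluate only the selected model on the held-out test data.
best_model = candidates[best_name]
print("selected:", best_name, "| test F1:", f1_score(y_test, best_model.predict(X_test)))
```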

2.6 And repeat: the role of resampling data

Having just one split into training, validation, and testing is not very data efficient. Typically, the data is split multiple times. A common technique is cross-validation, which splits the data into $k$ different parts. Let’s say you use $k = 10$. Nine of the ten folds might be used for training and validation, and the tenth as test data. You cycle through the folds so that each fold is used as test data once. This way, you always use "fresh" data for evaluating the model. Other sampling methods such as bootstrapping and subsampling can be used here as well. But even having multiple folds may not give stable results – if you generated the fold split again, you might get different estimates. So another established method is to repeat the entire sampling procedure.

You have another choice to make: how to split the remaining nine folds into training and validation data. You could either do a single split or run cross-validation again. Like in the movie Inception, which is about dreams within dreams, you go one level deeper and do cross-validation within cross-validation, a procedure called nested cross-validation [21]. The advantage of (nested) cross-validation is better estimates of the model performance, and better models, since you use the data more efficiently.
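A minimal sketch of nested cross-validation using scikit-learn, assuming synthetic data and an arbitrary hyperparameter grid: the inner loop tunes hyperparameters, the outer loop estimates how well the whole tuning procedure performs.

```python
# Nested cross-validation: inner loop for tuning, outer loop for evaluation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)  # placeholder target

# Inner cross-validation: choose max_depth on the training portion of each outer fold.
inner = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [2, 5, 10]},
    scoring="f1",
    cv=5,
)

# Outer cross-validation: each of the 10 folds serves as "fresh" test data once.
outer_scores = cross_val_score(inner, X, y, scoring="f1", cv=10)
print("estimated F1: %.3f +/- %.3f" % (outer_scores.mean(), outer_scores.std()))
```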

2.7 Bare-bones machine learning can be automated

Once you have defined the prediction task and your data, the entire training process can be completely automated. The subfield of machine learning called AutoML aims to completely automate the machine learning training process and make machine learning engineers redundant [22]. Upload your data, pick a column as your target, and pick an evaluation metric. Click a button. And the machine does everything for you. Data splitting, hyperparameter tuning, model selection, model evaluation. We call this automatable practice of machine learning “bare-bones machine learning”. The big question is: How does such an optimization-focused approach mix with a complex practice like science?
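To illustrate the idea (though this is only a crude stand-in for a full AutoML system), the sketch below fixes the target, data, and metric, and then lets a single search over model classes and hyperparameters run without further human intervention. The candidate models and grids are arbitrary assumptions.

```python
# Bare-bones automation in miniature: fix task, data, and metric, then let a
# search over model classes and hyperparameters do the rest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(800, 5))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)  # placeholder target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The "model" step itself is a searchable parameter, so model selection and
# hyperparameter tuning happen in one automated pass.
pipe = Pipeline([("model", LogisticRegression(max_iter=1000))])
search_space = [
    {"model": [LogisticRegression(max_iter=1000)], "model__C": [0.1, 1.0, 10.0]},
    {"model": [RandomForestClassifier(random_state=0)], "model__max_depth": [3, 10, None]},
]

auto = GridSearchCV(pipe, search_space, scoring="f1", cv=5)
auto.fit(X_train, y_train)          # the "click a button" step
print("best configuration:", auto.best_params_)
print("test F1:", f1_score(y_test, auto.predict(X_test)))
```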

1. The F1 score is the harmonic mean of precision and recall, F1 = 2 · (precision · recall) / (precision + recall), and is particularly useful for evaluating performance on imbalanced datasets.

3 The Role of Prediction in Science

Supervised machine learning is the discipline of producing high-performing prediction models. While the complex dance between prediction and science is age-old, supervised machine learning is relatively young. Let’s find out how this algorithmic approach to learning from data fits into science.

Prediction

A prediction is an educated guess about an unknown outcome based on current information, such as predicting tomorrow’s weather or the 3D structure of a protein from its amino acid sequences.

Rattle and a group of machine learning advocates presented a petition to the Elder Council of Raven Science. It contained a series of prediction problems, all of which could be solved by machine learning, such as predicting tornadoes or distinguishing poisonous berries from healthy ones. The Council began a difficult philosophical debate: What role should prediction play in Raven Science?

3.1 Scientific theories produce predictions

Since this book has “science” in the title, we are obliged to mention Einstein and his theory of general relativity, one of the biggest success stories of science. Inspired by theoretical and philosophical reflections such as the principle of relativity, Mach’s principle, and the equivalence principle1, Einstein described a theory according to which gravity is a geometric property of space and time. When Einstein developed his theory between 1907 and 1915, it was, unlike his theory of special relativity, not based on novel empirical observations that the old theory did not account for. Instead, Einstein was driven by intuition, a desire to generalize, and mathematical beauty. Still, Einstein’s theory provided what elevated it beyond purely philosophical speculation: It made predictions.

Einstein’s theory of general relativity predicted that:

Black holes exist.

Time slows down near massive objects.

Gravity bends light, produces gravitational waves, and redshifts light.

Massive rotating objects drag spacetime along with them, an effect that gives rise to the Lense-Thirring precession.

All of these predictions have been confirmed empirically. Some predictions were confirmed soon after Einstein formulated his theory, such as that gravity bends light, which Eddington and Dyson observed during the solar eclipse of 1919. Others, like the existence of gravitational waves, took until 2015 to be confirmed because they required complex new measurement technologies. It is remarkable how well Einstein’s theory has held up and how accurate its predictions remain to this day – the general theory of relativity is among the best-tested theories in science.

Scientific Theories

Scientific theories are abstract and universal explanations for phenomena that have strong and diverse evidential support.2 They are rigorously tested and cohere with other theories. Theories are thus widely accepted in their fields. Furthermore, theories are not necessarily quantitative. Some of the most important scientific theories are qualitative, such as the germ theory of disease, Darwin’s theory of evolution, and cell theory.

3.2 Prediction can be the goal of science

Less well-known than Einstein’s story is the story of Lewis Fry Richardson, the first person to predict the weather using numerical methods. Beyond meteorology, Richardson was an interesting and widely skilled character: he developed an early sonar device after the Titanic tragedy, he found inconsistencies in the measurement of coastline lengths that later inspired Mandelbrot’s theory of fractals, and, as a convinced pacifist, he brought quantitative methods to the study of peace and conflict between nations.

When Richardson started to work on meteorology, weather forecasting was largely based on experience with similar weather conditions. He wasn’t the first to propose the use of physical theories in weather forecasting, but he was the first to test his numerical model in practice. He developed a mathematical model aimed at predicting the weather:

He divided a weather map into squares of 200km side length.

He subdivided each of the squares into layers according to their respective heights.

He listed all the physical equations he deemed important to model the dynamics.

He solved the respective nonlinear partial differential equations to predict air pressure, wind speed, and temperature for two of the squares.

Scientific Models

Scientific models are typically mathematical representations of aspects of phenomena that allow scientists to predict, explain, or reason about these phenomena. Models can be grounded in theory or even be applications of theories to a specific context. For instance, Richardson used theories from physics to build his weather forecasting model. However, models don’t have to be closely tied to theory. They can also rely on data or common sense. Contrary to theories, models are often more concrete with a smaller scope and more practical. Evidence incompatible with the model just limits the scope of the model, but an otherwise useful model might still survive. Different models for similar aspects of reality can coexist in a field even if they are in conflict.

Unfortunately, his first attempts were unsuccessful. It took him six weeks to produce a forecast for a six-hour time frame. Even worse, his predictions were pretty far off! Nevertheless, methodologically he was on the right track; although initially ignored after its publication in 1922, his book "Weather Prediction by Numerical Process" is now a modern classic in meteorology. To this day, scientists use numerical methods, physical measurements, and physical theories to forecast the weather [24]. Fortunately, numerical weather forecasts have substantially improved since Richardson’s first attempt. The availability of atmospheric data, e.g., from airplanes and weather stations, and, even more importantly, access to electronic computers that can solve differential equations efficiently in real time make it possible to predict the weather reliably, at least within 24 hours. Weather forecasts have become so reliable that when you ask a friend about the current weather, she may check her weather app instead of looking outside …

3.3 Prediction and science are inseparable

Prediction was central to both Einstein’s general theory of relativity and Richardson’s weather forecasting. If the predictions of general relativity had been incorrect, the theory would have been thrown on the scrap heap of science, no matter how beautiful it was. Similarly, if numerical weather forecasting did not work, it would be ignored; in fact, this is exactly what initially happened to Richardson when he proposed his method.

Predictions played different roles in both cases: In Einstein’s case, the predictions guided new experiments such as the search for black holes, confirmed the theory, and at first had very few practical implications.3 In weather science, on the other hand, accurate prediction is ultimately the goal, not least because controlling the weather is, at least so far, out of reach. Weather predictions are useful: They help people decide when to leave the house, when to harvest plants, how to schedule flights, when to expect catastrophic weather events, and tell your trusted ice cream shop when to open. Prediction is not a nice byproduct of weather science – it is the goal.

It is widely agreed that prediction is one of the core aims of science [25]. Prediction connects scientific models and theories with events in the world. Science without successful predictions would look a lot like Greek mythology. Pointing out that Zeus is very angry may sound like a nice explanation for the tornado in your yard, but how would you find out whether it is true without predictions? More importantly, will your insurance cover angry gods or only angry birds…

Scientific Predictions

Scientific predictions are statements about data, past events, or the future based on experience or knowledge4. This means that prediction is not just about forecasting the future, like tomorrow’s weather. Predictions can also concern the past. The astronomer Johannes Kepler, for instance, predicted in 1614 that the Star of Bethlehem was a planetary conjunction that appeared in the year 7 BC. The term "prediction" also covers cases in which you already know the correct answer. For example, if a medical diagnosis tool correctly predicts past cases, it is coherent with the observed data. Nevertheless, the gold standard for validating scientific models and theories is accurate predictions of unknown (future) events or unseen data.

3.4 Prediction serves many purposes

Taking a birds-eye view of science, you find that predictions are everywhere and serve many purposes [26]:

Falsifying and confirming hypotheses:

Predictions enable falsification. Predictions that are incompatible with experimental data allow you to reject hypotheses, or weaken scientific theories or scientific models. Conversely, predictions that are compatible with experimental data and future observations boost the confirmation of theories and models.

Providing a standard for comparison:

The predictive accuracy of models provides a standard for comparing scientific models that is grounded in reality and independent of how the models work.

Checking predictability:

If you can consistently predict a phenomenon based on a set of information, it means that this set contains the relevant information to determine the state of the phenomenon. However, if you cannot, it may be because the available information is insufficient. Knowing what can and cannot be inferred from certain information is immensely important in science [27].

Guiding experiments:

Predictions can tell you which aspect of a phenomenon to focus on and run experiments on. If models or theories make interesting or extraordinary predictions for borderline cases, you can check them by running experiments.

Enabling practical applications of science:

Science must always be seen as embedded in a larger economic and social process. Predictions can be used for planning, for making processes more efficient, and for automating them. Predictions are one way in which science pays back society’s investment.

Scientific Hypothesis

Scientific hypotheses are statements about the world that can be tested experimentally and be falsified. These statements can concern predictions or explanations of phenomena. General hypotheses that have been rigorously tested can become scientific theories. That viruses cause certain diseases was a hypothesis before rigorous testing turned it into a theory. In some contexts, entire models are referred to as hypotheses, the hypothesis being that the model is the true or best description of the phenomenon within a given class of models.

3.5 Scientific models may interpolate and extrapolate

Weather forecasting and Einstein’s relativity theory are great examples of two different modes of prediction:

Interpolation

describes the prediction of known data that has been part of the model construction, or of data highly similar to it.5

Given a reasonable notion of similarity, interpolating works by a simple principle – treat similar data similarly. Data for which you know the true state forms the basis for predicting similar data for which you don’t. For instance, in weather forecasting, you might have a track record of past cloud formations and their corresponding amount of precipitation. Interpolation means to predict the precipitation of new clouds solely based on the precipitation of similar past cloud formations.

Extrapolation

is complementary to interpolation. It concerns predicting data for which you have no similar data available. Extrapolation requires more than just treating similar data similarly. For extrapolation, domain-specific background knowledge must be leveraged. The prediction that very heavy stars may collapse into black holes was one consequence of Einstein’s theory of relativity and was first derived by Karl Schwarzschild. "Observations" of black holes didn’t exist when Einstein developed his theory. Instead, the prediction was entirely driven by theory, not by reference to similar events in the past.

Figure 3.1: For the green point, the target value is unknown, but the point lies within the range of the data (blue): you have to interpolate. For the red point, the target value is also unknown, but the point lies outside the range of the data: you have to extrapolate. Figure by [28], CC-BY (https://creativecommons.org/licenses/by/4.0/)

Interpolation and extrapolation play central yet different roles in scientific prediction. Successful interpolation is often taken for granted, as it expresses that the model is consistent with the given data. Extrapolation, or what Popper [29] calls ‘bold hypotheses’, on the other hand, takes a more prominent role:

Successful extrapolation can boost your confidence in the background knowledge that informed the inductive bias you equipped your model with.

It can spark the belief that the prediction model captured relevant aspects of reality [30].

Also, extrapolation can guide the experimental process toward trying to falsify surprising predictions.

3.6 Machine learning is the ultimate interpolation tool

Supervised machine learning’s main strength is the interpolation type of prediction. It is strongly data-driven rather than relying on background knowledge to predict unknown cases. Extrapolation is therefore not considered a strength of complex machine learning models such as deep neural networks [31], [32], [33]. Some see the current rise of adversarial examples6 as a direct consequence of such failures of extrapolation [34]. But things are not that simple, for these three reasons:

In machine learning, it is possible to use background knowledge to improve extrapolation, as we discuss in Chapter 8.

Interpolation and extrapolation can’t be separated as cleanly as Figure 3.1 seems to suggest. The separation depends on the similarity of available and new data, and there is no magic cut-off value where a prediction switches from interpolation to extrapolation. A common formal definition of interpolation is the convex hull of the data; however, by this definition almost every prediction in high-dimensional spaces counts as an extrapolation [35].
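This last point can be illustrated with a small simulation (our own sketch, not from [35]): sample training data, sample new points from the same distribution, and check how many of the new points fall inside the convex hull of the training data as the dimension grows. The Gaussian data and sample sizes are arbitrary choices.

```python
# Fraction of new points inside the convex hull of a training sample,
# checked via a linear-programming feasibility test.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, points):
    """True if `point` is a convex combination of the rows of `points`."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones(n)])  # sum_i w_i x_i = point, sum_i w_i = 1
    b_eq = np.append(point, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
n_train, n_new = 200, 100
for d in [2, 5, 10, 20]:
    train = rng.normal(size=(n_train, d))
    new = rng.normal(size=(n_new, d))
    inside = sum(in_convex_hull(p, train) for p in new)
    print(f"d={d:2d}: {inside}/{n_new} new points inside the convex hull")
```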