This book is about constructing models from experimental data. It covers a range of topics, from statistical data prediction to Kalman filtering, from black-box model identification to parameter estimation, from spectral analysis to predictive control. Written for graduate students, this textbook offers an approach that has proven successful throughout the many years during which its author has taught these topics at his university. The book:
* Contains accessible methods explained step-by-step in simple terms
* Offers an essential tool useful in a variety of fields, especially engineering, statistics, and mathematics
* Includes an overview of random variables and stationary processes, as well as an introduction to discrete time models and matrix analysis
* Incorporates historical commentaries to put into perspective the developments that have brought the discipline to its current state
* Provides many examples and solved problems to complement the presentation and facilitate comprehension of the techniques presented
Cover
Introduction
Acknowledgments
1 Stationary Processes and Time Series
1.1 Introduction
1.2 The Prediction Problem
1.3 Random Variable
1.4 Random Vector
1.5 Stationary Process
1.6 White Process
1.7 MA Process
1.8 AR Process
1.9 Yule–Walker Equations
1.10 ARMA Process
1.11 Spectrum of a Stationary Process
1.12 ARMA Model: Stability Test and Variance Computation
1.13 Fundamental Theorem of Spectral Analysis
1.14 Spectrum Drawing
1.15 Proof of the Fundamental Theorem of Spectral Analysis
1.16 Representations of a Stationary Process
2 Estimation of Process Characteristics
2.1 Introduction
2.2 General Properties of the Covariance Function
2.3 Covariance Function of ARMA Processes
2.4 Estimation of the Mean
2.5 Estimation of the Covariance Function
2.6 Estimation of the Spectrum
2.7 Whiteness Test
3 Prediction
3.1 Introduction
3.2 Fake Predictor
3.3 Spectral Factorization
3.4 Whitening Filter
3.5 Optimal Predictor from Data
3.6 Prediction of an ARMA Process
3.7 ARMAX Process
3.8 Prediction of an ARMAX Process
4 Model Identification
4.1 Introduction
4.2 Setting the Identification Problem
4.3 Static Modeling
4.4 Dynamic Modeling
4.5 External Representation Models
4.6 Internal Representation Models
4.7 The Model Identification Process
4.8 The Predictive Approach
4.9 Models in Predictive Form
5 Identification of Input–Output Models
5.1 Introduction
5.2 Estimating AR and ARX Models: The Least Squares Method
5.3 Identifiability
5.4 Estimating ARMA and ARMAX Models
5.5 Asymptotic Analysis
5.6 Recursive Identification
5.7 Robustness of Identification Methods
5.8 Parameter Tracking
6 Model Complexity Selection
6.1 Introduction
6.2 Cross‐validation
6.3 FPE Criterion
6.4 AIC Criterion
6.5 MDL Criterion
6.6 Durbin–Levinson Algorithm
7 Identification of State Space Models
7.1 Introduction
7.2 Hankel Matrix
7.3 Order Determination
7.4 Determination of Matrices
7.5 Determination of Matrix
7.6 Mid Summary: An Ideal Procedure
7.7 Order Determination with SVD
7.8 Reliable Identification of a State Space Model
8 Predictive Control
8.1 Introduction
8.2 Minimum Variance Control
8.3 Generalized Minimum Variance Control
8.4 Model‐Based Predictive Control
8.5 Data‐Driven Control Synthesis
9 Kalman Filtering and Prediction
9.1 Introduction
9.2 Kalman Approach to Prediction and Filtering Problems
9.3 The Bayes Estimation Problem
9.4 One‐step‐ahead Kalman Predictor
9.5 Multistep Optimal Predictor
9.6 Optimal Filter
9.7 Steady‐State Predictor
9.8 Innovation Representation
9.9 Innovation Representation Versus Canonical Representation
9.10 K‐Theory Versus K–W Theory
9.11 Extended Kalman Filter – EKF
9.12 The Robust Approach to Filtering
10 Parameter Identification in a Given Model
10.1 Introduction
10.2 Kalman Filter‐Based Approaches
10.3 Two‐Stage Method
11 Case Studies
11.1 Introduction
11.2 Kobe Earthquake Data Analysis
11.3 Estimation of a Sinusoid in Noise
Appendix A: Linear Dynamical Systems
A.1 State Space and Input–Output Models
A.2 Lagrange Formula
A.3 Stability
A.4 Impulse Response
A.5 Frequency Response
A.6 Multiplicity of State Space Models
A.7 Reachability and Observability
A.8 System Decomposition
A.9 Stabilizability and Detectability
Appendix B: Matrices
B.1 Basics
B.2 Eigenvalues
B.3 Determinant and Inverse
B.4 Rank
B.5 Annihilating Polynomial
B.6 Algebraic and Geometric Multiplicity
B.7 Range and Null Space
B.8 Quadratic Forms
B.9 Derivative of a Scalar Function with Respect to a Vector
B.10 Matrix Diagonalization via Similarity
B.11 Matrix Diagonalization via Singular Value Decomposition
B.12 Matrix Norm and Condition Number
Appendix C: Problems and Solutions
Bibliography
Further reading
Index
End User License Agreement
Chapter 5
Table 5.1 Iterative ML identification of an ARMAX(1, 1, 1) model.
Chapter 6
Table 6.1 Identification of model ARX for data generated by the system of Examp...
Table 6.2 Choice of the optimal order (Example 6.3).
Table 6.3 Choice of the optimal order via the cross‐validation method (Example 6...
Chapter 10
Table 10.1 The simulated data chart as the starting point of the two‐stage metho...
Table 10.2 The compressed artificial data chart.
Chapter 1
Figure 1.1 Possible diagrams of the prediction error.
Figure 1.2 Interpreting a sequence of data as the output of a dynamic model fed...
Figure 1.3 Stability region for polynomial.
Figure 1.4 Stability region for polynomial.
Figure 1.5 The vector.
Figure 1.6 MA(1) process features (left) and (right). (a) Covariance f...
Figure 1.7 Complex spectrum in a 3D representation – Example 1.11.
Figure 1.8 The real spectrum – Example 1.11.
Chapter 2
Figure 2.1 Spectrum and periodogram of the AR(3) process of Example 2.3.
Figure 2.2 Spectrum estimate for Example 2.3: (a) the periodogram, (b)–(d) the ...
Figure 2.3 Whiteness test in the standard unit variance ...
Chapter 3
Figure 3.1 Optimal predictor.
Figure 3.2 Effect of an all‐pass filter.
Figure 3.3 Canonical representation (a) and whitening filter (b).
Figure 3.4 (a) Canonical representation. (b) Fake optimal predictor. (c) Optima...
Figure 3.5 Process MA(1) (continuous line) and its prediction (dotted line) wit...
Figure 3.6 Process MA(1) (continuous line) and its prediction (dotted line) obt...
Figure 3.7 Block diagram of the ARMAX model (3.16).
Chapter 4
Figure 4.1 James Clerk Maxwell.
Figure 4.2 Performance for: invertible (a); singular (b).
Figure 4.3 Hubble law: The recession velocity of galaxies is proportional to th...
Figure 4.4 Modeling a time series (a) or a cause–effect system (b).
Figure 4.5 Block scheme of the ARX model (4.7).
Figure 4.6 The prediction error identification rationale.
Chapter 5
Figure 5.1 Newton method.
Figure 5.2 Newton method – zoom.
Figure 5.3 Newton method at another point of the curve (a) and the correspondi...
Figure 5.4 Data filtering for the estimation of ARMAX models.
Figure 5.5 Performance index convergence.
Figure 5.6 Asymptotic behavior of prediction error identification methods.
Figure 5.7 Estimate of parameter in Example 5.12: (a) standard RLS algorithm;...
Figure 5.8 The estimate of parameter in Example 5.13 exhibits a bursting phen...
Chapter 7
Figure 7.1 Approximating the Hankel matrix.
Figure 7.2 Impulse response of the system of Example 7.5.
Figure 7.3 Singular values of the Hankel matrix of Example 7.5.
Chapter 8
Figure 8.1 Minimum variance control system.
Figure 8.2 A typical feedback control system.
Figure 8.3 Generalized output signal.
Figure 8.4 Generalized minimum variance control system.
Figure 8.5 Model predictive control essentials.
Figure 8.6 A feedback system with parametrized controller.
Chapter 9
Figure 9.1 R.E. Kalman, picture taken by Sergio Bittanti in Bologna, Italy, du...
Figure 9.2 Geometric interpretation of the Bayes formula.
Figure 9.3 Probability and geometry – table of correspondences.
Figure 9.4 Geometric interpretation of the recursive Bayes formula.
Figure 9.5 Innovation and state prediction error for a sequence of data.
Figure 9.6 Count Jacopo Francesco Riccati.
Figure 9.7 Kalman predictor block scheme.
Figure 9.8 Graphical determination of the solutions of the ARE – Example 9.5.
Figure 9.9 Graphical determination of the solutions of the DRE – Example 9.5 wi...
Figure 9.10 Graphical determination of the solutions of the ARE – Example 9.5 w...
Figure 9.11 Canonical decomposition of system 9.70.
Chapter 10
Figure 10.1 The parameter estimation problem.
Figure 10.2 Estimates of the unknown parameter with the EKF method. Example 10....
Figure 10.3 Estimates of the unknown parameter with the EKF method. Example 10....
Figure 10.4 Estimates of the unknown parameter of system (10.1) with the two‐st...
Chapter 11
Figure 11.1 The time series of the Kobe earthquake.
Figure 11.2 Partition of the time series into three segments.
Figure 11.3 Nonparametric properties of the series over the normal seismic activity segm...
Figure 11.4.
Figure 11.5 Spectrum and poles and zeros of the identified stochastic model.
Figure 11.6 Prediction error and Anderson's whiteness test.
Figure 11.7 The earthquake phase segment (a) along with the corresponding perio...
Figure 11.8 Whiteness test for the earthquake phase.
Figure 11.9 The transition phase segment and its further partition.
Figure 11.10 Whiteness test for each time window in the transition segment.
Figure 11.11 Poles and zeros of a notch filter.
Figure 11.12 Frequency response of notch filter.
Figure 11.13 Tracking performance of the notch filter: (a) and (b).
Sergio Bittanti
Politecnico di Milano
Milan, Italy
This edition first published 2019
© 2019 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Sergio Bittanti to be identified as the author of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Bittanti, Sergio, author.
Title: Model identification and data analysis / Sergio Bittanti, Politecnico di Milano, Milan, Italy.
Description: Hoboken, NJ, USA : Wiley, [2019] | Includes bibliographical references and index.
Identifiers: LCCN 2018046965 (print) | LCCN 2018047956 (ebook) | ISBN 9781119546412 (Adobe PDF) | ISBN 9781119546313 (ePub) | ISBN 9781119546368 (hardcover)
Subjects: LCSH: Mathematical models. | Quantitative research. | System identification.
Classification: LCC TA342 (ebook) | LCC TA342 .B58 2019 (print) | DDC 511/.8–dc23
LC record available at https://lccn.loc.gov/2018046965
Cover design: Wiley
Cover image: © Oleksii Lishchyshyn/Shutterstock, © gremlin/Getty Images
Today, a deluge of information is available in a variety of formats. Industrial plants are equipped with distributed sensors and smart metering; huge data repositories are preserved in public and private institutions; computer networks spread bits to every corner of the world at unprecedented speed. No doubt, we live in the age of data.
This new scenario in the history of humanity has made it possible to use new paradigms to deal with old problems and, at the same time, has led to challenging questions never addressed before. To reveal the information content hidden in observations, models have to be constructed and analyzed.
The purpose of this book is to present the first principles of model construction from data in a simple form, so as to make the treatment accessible to a wide audience. As R.E. Kalman (1930–2016) used to say, "Let the data speak"; this is precisely our objective.
Our path is organized as follows.
We begin by studying signals with stationary characteristics (Chapter 1). After a brief presentation of the basic notions of random variable and random vector, we come to the definition of white noise, a peculiar process through which one can construct a fairly general family of models suitable for describing random signals. Then we move on to the realm of the frequency domain by introducing a spectral characterization of data. The final goal of this chapter is to identify a wise representation of a stationary process suitable for developing prediction theory.
In our presentation of random notions, we rely on elementary concepts: the mean, the covariance function, and the spectrum, without any assumption about the probability distribution of data. In Chapter 2, we briefly see how these features can be computed from data.
For the simple dynamic models introduced in Chapter 1, we present the corresponding prediction theory. Given the model, this theory, explained in Chapter 3, enables one to determine the predictor with elementary computations. Since it was developed mainly by Andrey N. Kolmogorov and Norbert Wiener, we shall refer to it as the Kolmogorov–Wiener theory, or simply K–W theory.
Then, in Chapter 4, we start studying the techniques for the construction of a model from data. This transcription of long sequences of apparently confusing numbers into a concise formula that can be scribbled into our notebook is the essence and the magic of identification science.
The methods for the parameter estimation of input–output models are the subject of Chapter 5. The features of the identified models when the number of snapshots tends to infinity are also investigated (asymptotic analysis). Next, the recursive versions of the various methods, suitable for real-time implementation, are introduced.
In system modeling, one of the major topics that has attracted the attention of many scholars from different disciplines is the selection of the appropriate complexity. Here, the problem is that an overcomplex model, while offering better data fitting, may also fit the noise affecting measurements. So, one has to find a trade‐off between accuracy and complexity. This is discussed in Chapter 6.
Considering that prediction theory is model‐based, our readers might conclude that the identification methods should be presented prior to the prediction methods. The reason why we have done the opposite is that the concept of prediction is very much used in identification.
In Chapter 7, the problem of identifying a model in a state space form is dealt with. Here, the data are organized into certain arrays from the factorization of which the system matrices are eventually identified.
The use of the identified models for control is concisely outlined in Chapter 8. Again prediction is at the core of such techniques, since their basic principle is to ensure that the prediction supplied by the model is close to the desired target. This is why these techniques are known as predictive control methods.
Chapter 9 is devoted to Kalman theory (or simply K theory) for filtering and prediction. Here, the problem is to estimate the temporal evolution of the state of a system. In other words, instead of parameter estimation, we deal with signal estimation. A typical situation where such a problem is encountered is deep space navigation, where the position of a spacecraft has to be found in real time from available observations.
At the end of this chapter, we compare the two prediction theories introduced in the book, namely we compare K theory with K–W theory of Chapter 3.
We pass then to Chapter 10, where the problem of the estimation of an unknown parameter in a given model is treated.
Identification methods have had and continue to have a huge number of applications, in engineering, physics, biology, and economics, to mention only the main disciplines. To illustrate their applicability, a couple of case studies are discussed in Chapter 11. The first deals with the analysis of the Kobe earthquake of 1995; this study involves most facets of the estimation procedure for input–output models, including parameter identification and model complexity selection. The second considers the problem of estimating the unknown frequency of a periodic signal corrupted by noise, resorting both to the input–output approach and to the state space approach via nonlinear Kalman techniques.
There are, moreover, many numerical examples to accompany and complement the presentation and development of the various methods.
In this book, we focus on the discrete time case. The basic pillars on which we rely are random notions, dynamic systems, and matrix theory.
Random variables and stationary processes are gradually introduced in the first sections of Chapter 1. As already said, our concise treatment hinges on simple notions, culminating in the concept of the white process, the elementary brick for the construction of the class of models we deal with. Going through these pages, the readers will become progressively familiar with stationary processes and the associated ideas as tools for the description of uncertain data.
The main concepts concerning linear discrete‐time dynamical systems are outlined in Appendix A. They range from state space to transfer functions, including their interplay via realization theory.
In Appendix B, the readers who are not familiar with matrix analysis will find a comprehensive overview not only of eigenvalues and eigenvectors, determinant and basis, but also of the notion of rank and the basic tool for its practical determination, singular value decomposition.
Finally, a set of problems with their solution is proposed in Appendix C.
Most simulations presented in this volume have been performed with the aid of the MATLAB® package; see https://it.mathworks.com/help/ident/.
The single guiding principle in writing this book has been to introduce and explain the subject to readers as clearly as possible.
The birth of a new book is an emotional moment, especially when it comes after years of research and teaching.
This text is indeed the outcome of my years of lecturing on model identification and data analysis (MIDA) at the Politecnico di Milano, Italy. In its first years of existence, the course had a very limited number of students. Nowadays, there are various MIDA courses, offered to master's students of automation and control engineering, electronic engineering, bio-engineering, computer engineering, aerospace engineering, and mathematical engineering.
In my decades of scientific activity, I have had the privilege of meeting and working with many scholars. Among them, focusing on the Italian community, are Paolo Bolzern, Claudio Bonivento, Marco Claudio Campi, Patrizio Colaneri, Antonio De Marco, Giuseppe De Nicolao, Marcello Farina, Simone Formentin, Giorgio Fronza, Simone Garatti, Roberto Guidorzi, Alberto Isidori, Antonio Lepschy, Diego Liberati, Arturo Locatelli, Marco Lovera, Claudio Maffezzoni, Gianantonio Magnani, Edoardo Mosca, Giorgio Picci, Luigi Piroddi, Maria Prandini, Fabio Previdi, Paolo Rocco, Sergio Matteo Savaresi, Riccardo Scattolini, Nicola Schiavoni, Silvia Carla Strada, Mara Tanelli, Roberto Tempo, and Antonio Vicino.
I am greatly indebted to Silvia Maria Canevese for her generous help in the manuscript editing, thank you Silvia. Joshua Burkholder, Luigi Folcini, Chiara Pasqualini, Grace Paulin Jeeva S, Marco Rapizza, Matteo Zovadelli, and Fausto Vezzaro also helped out with the editing in various phases of the work.
I also express my gratitude to Guido Guardabassi for all our exchanges of ideas on this or that topic and for his encouragement to move toward the subject of data analysis in my early university days.
Some of these persons, as well as other colleagues from around the world, are featured in the picture at the end of the book (taken at a workshop held in 2017 at Lake Como, Italy).
A last note of thanks goes to the multitude of students I have met over the years in my classes. Their interest has been an irreplaceable stimulus for my never-ending struggle to explain the subject as clearly and intelligibly as possible.
Sergio Bittanti
e‐mail: [email protected]
website: home.deib.polimi.it/bittanti/
The support of the Politecnico di Milano and the National Research Council of Italy (Consiglio Nazionale delle Ricerche–CNR) is gratefully acknowledged.
Forecasting the evolution of a man-made system or a natural phenomenon is one of the most ancient problems of humankind. We develop here a prediction theory under the assumption that the variable under study can be considered as a stationary process. The theory is easy to understand and simple to apply. Moreover, it lends itself to various generalizations, making it possible to deal with nonstationary signals.
The organization is as follows. After an introduction to the prediction problem (Section 1.2), we concisely review the notions of random variable, random vector, and random (or stochastic) process in Sections 1.3–1.5, respectively. This leads to the definition of white process (Section 1.6), a key notion in the subsequent developments. The readers who are familiar with random concepts can skip Sections 1.3–1.5.
Then we introduce the moving average (MA) process and the autoregressive (AR) process (Sections 1.7 and 1.8). By combining them, we come to the family of autoregressive and moving average (ARMA) processes (Section 1.10). This is the family of stationary processes we focus on in this volume.
For such processes, in Chapter 3, we develop a prediction theory, thanks to which we can easily work out the optimal forecast given the model.
In our presentation, we make use of elementary concepts of linear dynamical systems such as transfer functions, poles, and zeros; the readers who are not familiar with such topics are cordially invited to first study Appendix A.
Consider a real variable y depending on discrete time t. The variable is observed over the interval t = 1, 2, …, N. The problem is to predict the value that the variable will take at the subsequent time point, namely y(N + 1).
Various prediction rules may be conceived, providing a guess for y(N + 1) based on the observations y(1), y(2), …, y(N). A generic predictor is denoted with the symbol ŷ(N + 1):
ŷ(N + 1) = f(y(N), y(N − 1), …, y(1)).
The question is how to choose the function f(·).
A possibility is to consider only a bunch of recent data, say y(N), y(N − 1), …, y(N − n + 1), and to construct the prediction as a linear combination of them with real coefficients a1, a2, …, an:
ŷ(N + 1) = a1 y(N) + a2 y(N − 1) + … + an y(N − n + 1).
The problem then becomes that of selecting the integer n and the most appropriate values for the parameters a1, a2, …, an.
Suppose for a moment that n and a1, a2, …, an were selected. Then the prediction rule is fully specified, and it can be applied to the past time points for which data are available to evaluate the prediction error:
ε(t) = y(t) − ŷ(t), t = n + 1, n + 2, …, N.
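To make the procedure concrete, here is a small Python sketch (not taken from the book; the data-generating mechanism and the coefficients a1, a2 are invented for illustration) that applies a linear prediction rule to a recorded sequence and evaluates the resulting prediction errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: any recorded sequence y(1), ..., y(N) would do here.
N = 500
y = 0.1 * np.cumsum(rng.normal(size=N)) + rng.normal(size=N)

# A candidate linear prediction rule with n = 2 and illustrative coefficients a1, a2.
a = np.array([0.6, 0.2])
n = len(a)

# Apply the rule to the past time points for which data are available:
# y_hat(t) = a1*y(t-1) + a2*y(t-2)
y_hat = np.array([a @ y[t - n:t][::-1] for t in range(n, N)])
eps = y[n:] - y_hat  # prediction error eps(t) = y(t) - y_hat(t)

print("mean of the prediction error:", eps.mean())
print("variance of the prediction error:", eps.var())
```

The mean and the "regularity" of these errors are precisely what is examined in the discussion that follows.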
Let's now consider this fundamental question: Which characteristics should the prediction error exhibit in order to conclude that we have constructed a "good predictor"? In principle, the best one can hope for is that the prediction error be null at any time point. However, in practice, this is utopian. Hence, we have to investigate the properties that a non-null ε(t) should exhibit in order to conclude that the prediction is fair.
For the sake of illustration, consider the case when ε(t) has the time evolution shown in Figure 1.1a. As can be seen, the mean value of ε(t) is nonzero. Correspondingly, the rule
ŷnew(t + 1) = ŷ(t + 1) + m̄,
where m̄ denotes the nonzero mean value of the error, would be better than the original one. Indeed, with the new rule of prediction, one can get rid of the systematic error.
Figure 1.1 Possible diagrams of the prediction error.
As a second option, consider the case when the prediction error is given by the diagram of Figure 1.1b. Then the mean value is zero. However, the sign of ε(t) changes at each instant; precisely, ε(t) > 0 for t even and ε(t) < 0 for t odd. Hence, even in such a case, a better prediction rule than the initial one can be conceived. Indeed, one can formulate a new rule that corrects the original prediction upward at the even time points and downward at the odd ones, so as to compensate for the alternating error.
From these simple remarks, one can conclude that the best predictor should have the following property: besides a zero mean value, the prediction error should have no regularity; rather, it should be fully unpredictable. In this way, the model captures the whole dynamics hidden in the data, no useful information remains buried in the residual error, and no better predictor can be conceived. The intuitive concept of an "unpredictable signal" was formalized in the twentieth century, leading to the notion of white noise (WN) or white process, a concept we precisely introduce later in this chapter. For the moment, it is important to bear in mind the following conclusion: A prediction rule is appropriate if the corresponding prediction error is a white process.
In this connection, we make the following interesting observation. Assume that ε(t) is indeed a white noise; then
y(t) = a1 y(t − 1) + a2 y(t − 2) + … + an y(t − n) + ε(t).
Rewrite this difference equation by means of the delay operator z⁻¹, namely the operator such that
z⁻¹ y(t) = y(t − 1).
Then
y(t) = a1 z⁻¹ y(t) + a2 z⁻² y(t) + … + an z⁻ⁿ y(t) + ε(t),
from which
(1 − a1 z⁻¹ − a2 z⁻² − … − an z⁻ⁿ) y(t) = ε(t),
or
y(t) = W(z) ε(t),
with
W(z) = 1 / (1 − a1 z⁻¹ − a2 z⁻² − … − an z⁻ⁿ).
By reinterpreting z as a complex variable, this relationship becomes the expression of a dynamical system with transfer function (from ε to y) given by W(z).
Summing up, finding a good predictor is equivalent to determining a model supplying the given sequence of data as the output of a dynamical system fed by white noise (Figure 1.2).
Figure 1.2 Interpreting a sequence of data as the output of a dynamic model fed by white noise.
This is why studying dynamical systems having a white noise at the input is a main preliminary step toward the study of prediction theory.
The road we follow toward this objective relies first on the definition of white noise, which we pursue in four stages: random variable → random vector → stochastic process → white noise.
A random (or stochastic) variable is a real variable that depends upon the outcome of a random experiment. For example, the variable taking one of two prescribed values depending on the result of the toss of a coin is a random variable.
The outcome of the random experiment is denoted by s; hence, a random variable v is a function of s: v = v(s).
For our purposes, a random variable is described by means of its mean value (or expected value) and its variance, which we will denote by E[v] and Var[v], respectively.
The mean value is the real number around which the values taken by the variable fluctuate. Note that, given two random variables v1 and v2 with mean values E[v1] and E[v2], the random variable
v = α1 v1 + α2 v2,
obtained as a linear combination of v1 and v2 via the real numbers α1 and α2, has mean value
E[v] = α1 E[v1] + α2 E[v2].
The variance captures the intensity of the fluctuations around the mean value. To be precise, it is defined as
Var[v] = E[(v − E[v])²],
where E[v] denotes the mean value of v. Obviously, being the expected value of a non-negative quantity, the variance is a real non-negative number.
Often, the variance is denoted with symbols such as λ² or σ². When one deals with various random variables, the variance of the ith variable may be denoted as λi² or σi².
The square root of the variance is called the standard deviation, denoted by λ or σ. If the random variable has a Gaussian distribution, then the mean value and the variance define completely the probability distribution of the variable. In particular, if a random variable is Gaussian, the probability that it takes values in the interval between E[v] − 2σ and E[v] + 2σ is about 95%. So if v is Gaussian with mean value 10 and variance 100, then, in about 95% of cases, the values taken by v range from −10 to +30.
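As a quick numerical check of this statement (a sketch, not from the book; the sample size is arbitrary), one can draw samples from such a Gaussian variable and count how many fall in the stated interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian random variable with mean 10 and variance 100 (standard deviation 10).
v = rng.normal(loc=10.0, scale=10.0, size=100_000)

# Fraction of samples within two standard deviations of the mean, i.e. in [-10, 30].
inside = np.mean((v >= -10.0) & (v <= 30.0))
print(f"fraction inside [-10, 30]: {inside:.3f}")  # close to 0.95
```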
A random (or stochastic) vector is a vector whose elements are random variables. We focus for simplicity on the bi-dimensional case, namely, given two random variables v1 and v2,
v = [v1 v2]′
is a random vector (of dimension 2). The mean value of a random vector is defined as the vector of real numbers constituted by the mean values of the elements of the vector. Thus,
E[v] = [E[v1] E[v2]]′,
where E[v1] and E[v2] are the mean values of v1 and v2, respectively. The variance is a matrix given by
Var[v] = [ λ11 λ12 ; λ21 λ22 ],
where
λij = E[(vi − E[vi])(vj − E[vj])], i, j = 1, 2.
Here, besides the variances λ11 and λ22 of the single random variables, the so-called "cross-variance" λ12 between v1 and v2 and "cross-variance" λ21 between v2 and v1 appear. Obviously, λ12 = λ21, so that the variance matrix is a symmetric matrix.
It is easy to verify that the variance matrix can also be written in the form
Var[v] = E[(v − E[v])(v − E[v])′],
where ′ denotes transpose.
In general, for a vector v of any dimension, the variance matrix is given by
Var[v] = E[(v − E[v])(v − E[v])′],
where E[v] is the vector whose elements are the mean values of the random variables entering v.
If v is a vector with n entries, Var[v] is an n × n matrix. In any case, Var[v] is a symmetric matrix having the variances of the single variables composing vector v along the diagonal and all cross-variances as off-diagonal terms.
A remarkable feature of a variance matrix is that it is a positive semi‐definite matrix.
The notions of positive semi-definite and positive definite matrix are explained in Appendix B. In a very concise way, given a real symmetric n × n matrix M, associate to it the scalar function defined as q(x) = x′Mx, where x is an n-dimensional real vector. For example, if
M = [ m11 m12 ; m12 m22 ]
and we take
x = [ x1 x2 ]′,
then
q(x) = m11 x1² + 2 m12 x1 x2 + m22 x2².
Hence, q(x) is quadratic in the entries of vector x. Matrix M is said to be
positive semi-definite if
x′Mx ≥ 0 for every x,
positive definite if it is positive semi-definite and
x′Mx = 0 only for x = 0.
We write M ≥ 0 and M > 0 to denote a positive semi-definite and a positive definite matrix, respectively.
We can now verify that, for any random vector v, Var[v] is positive semi-definite. Indeed, consider, for an arbitrary real vector x, the quantity
x′ Var[v] x = x′ E[(v − E[v])(v − E[v])′] x.
Then
x′ Var[v] x = E[ x′(v − E[v]) (v − E[v])′x ].
Here, we have used the linearity of the expected value operator E[·]. Observe now that x′(v − E[v]), being the product of a row vector times a column vector, is a scalar. As such, it coincides with its transpose: x′(v − E[v]) = (v − E[v])′x. Therefore,
x′ Var[v] x = E[ (x′(v − E[v]))² ].
This is the expected value of a square, namely a non-negative real number. Therefore, this quantity is non-negative for any x. Hence, we come to the conclusion that any variance matrix is positive semi-definite. We simply write
Var[v] ≥ 0.
Among the remarkable properties of positive semi-definite matrices, there is the fact that their determinant is non-negative (see Appendix B). Hence, referring to the two-dimensional case,
λ11 λ22 − λ12² ≥ 0.
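As an illustration (a sketch with made-up data; the way v1 and v2 are generated below is an arbitrary choice for the example), one can estimate a variance matrix from samples of a random vector and verify numerically that its eigenvalues and determinant are non-negative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two correlated random variables (the construction is invented for the example).
v1 = rng.normal(size=50_000)
v2 = 0.8 * v1 + 0.6 * rng.normal(size=50_000)

Sigma = np.cov(np.vstack([v1, v2]))    # sample variance (covariance) matrix
eigvals = np.linalg.eigvalsh(Sigma)    # real eigenvalues of a symmetric matrix

print("variance matrix:\n", Sigma)
print("eigenvalues (non-negative up to sampling error):", eigvals)
print("determinant (non-negative):", np.linalg.det(Sigma))
```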
Under the assumption that λ11 ≠ 0 and λ22 ≠ 0, this inequality suggests defining
ρ = λ12 / √(λ11 λ22).
ρ is known as the covariance coefficient between the random variables v1 and v2. When v1 and v2 have zero mean value, ρ is also known as the correlation coefficient. The previous inequality on the determinant of the variance matrix can be restated as follows:
−1 ≤ ρ ≤ 1.
One says that v1 and v2 are uncorrelated when ρ = 0. If instead ρ = 1 or ρ = −1, one says that they have maximal correlation.
Given a random variable v1, with Var[v1] > 0, consider the variable
v2 = α v1,
where α is a real number. To determine the covariance coefficient between v1 and v2, we compute the mean value and the variance of v2 as well as the cross-variance λ12. The mean value of v2 is
E[v2] = α E[v1].
Its variance is easily computed as follows:
Var[v2] = E[(α v1 − α E[v1])²] = α² Var[v1].
As for the cross-variance, we have
λ12 = E[(v1 − E[v1])(v2 − E[v2])] = α Var[v1].
Therefore,
ρ = α Var[v1] / √(Var[v1] · α² Var[v1]) = α / |α|.
Finally, if α < 0, then ρ = −1. In conclusion,
ρ = +1 if α > 0, ρ = −1 if α < 0.
In particular, we see that, if α ≠ 0, the correlation is maximal in absolute value. This is expected since, being v2 = α v1, knowing the value taken by v1, one can evaluate v2 without any error.
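The conclusion of this example is easy to check numerically; in the sketch below the value of α and the distribution of v1 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = -2.5                     # any nonzero real number
v1 = rng.normal(size=10_000)     # the distribution is an arbitrary choice
v2 = alpha * v1

rho = np.corrcoef(v1, v2)[0, 1]
print(rho)                       # equals the sign of alpha (here -1), up to rounding
```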
A random or stochastic process is a sequence of random variables ordered with an index t, referred to as time. We consider t as a discrete index (t = 1, 2, 3, …). The random variable associated with time t is denoted by v(t). It is advisable to recall that a random variable is not a number; it is a real function of the outcome s of a random experiment. In other words,
v(t) = v(t, s).
Thus, a stochastic process is an infinite sequence of real variables, each of which depends upon two variables, time t and outcome s. Often, for simplicity in notation, the dependence upon s is omitted and one simply writes v(t) to denote the process. However, one should always keep in mind that v(t) depends also upon the outcome of an underlying random experiment, s.
Once a particular outcome s̄ is fixed, the set {v(t, s̄), t = 1, 2, 3, …} defines a real function of time t. Such a function is named a process realization. To each outcome, a realization is associated. Hence, the set of realizations is the set of possible signals that the process can exhibit depending on the specific outcome of the random experiment. If, on the contrary, time is fixed at a point t̄, then one obtains v(t̄, s), the random variable at time t̄ extracted from the process.
Consider the following process: toss a coin; if the outcome is heads, we associate to it one given sinusoidal function of time, and if the outcome is tails, we associate another one. The random process so defined has two sinusoidal signals as realizations. At a given time point t̄, the process is a random variable v(t̄, s), which can take two values, namely the values of the two sinusoids at t̄.
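A sketch of this coin-toss process, assuming for illustration that the two sinusoidal realizations are sin(ωt) and cos(ωt) with an arbitrary ω (the specific sinusoids are not fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(4)

t = np.arange(50)
omega = 0.2  # illustrative angular frequency; an assumption for this sketch

def realization(outcome):
    # one deterministic function of time is attached to each outcome of the coin toss
    return np.sin(omega * t) if outcome == "heads" else np.cos(omega * t)

# Fixing the outcome yields a realization (an ordinary signal) ...
print(realization("heads")[:5])

# ... while fixing a time point t_bar yields a random variable with two possible values.
t_bar = 7
outcomes = rng.choice(["heads", "tails"], size=10)
print([round(float(realization(s)[t_bar]), 3) for s in outcomes])
```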
The simplest way to describe a stochastic process is to specify its mean function and its covariance function.
Mean function m(t):
The mean function is defined as
m(t) = E[v(t, s)].
The operator E[·] performs the average over all possible outcomes s of the underlying random experiment. Hence, we also write
m(t) = E[v(t)].
In such averaging, t is a fixed parameter. Therefore, m(t) does not depend upon s anymore; it depends on t only. m(t) is the function of time around which the samples of the random variable v(t) fluctuate.
Variance function Var[v(t)]:
The variance function of the process is
Var[v(t)] = E[(v(t) − m(t))²].
It provides the variances of the random variables v(t) at each time point.
Covariance function γ(t1, t2):
The covariance function captures the mutual dependence of two random variables extracted from the process at different time points, say at times t1 and t2. It is defined as
γ(t1, t2) = E[(v(t1) − m(t1))(v(t2) − m(t2))].
It characterizes the interdependence between the deviation of v(t1) around its mean value m(t1) and the deviation of v(t2) around its mean value m(t2). Note that, if we consider the same function with exchanged indexes, i.e. γ(t2, t1), we have
γ(t2, t1) = E[(v(t2) − m(t2))(v(t1) − m(t1))].
Since the two factors inside the expectation commute, it follows that
γ(t1, t2) = γ(t2, t1).
Furthermore, by setting t1 = t2 = t we obtain
γ(t, t) = E[(v(t) − m(t))²] = Var[v(t)].
This is the variance of the random variable v(t). Hence, when the two time indexes coincide, the covariance function supplies the process variance at the given time point.
We are now in a position to introduce the concept of stationary process.
A stochastic process is said to be stationary when
m(t) is constant,
Var[v(t)] is constant,
γ(t1, t2) depends upon the difference t1 − t2 only.
Therefore, the mean value of a stationary process is simply indicated as
m = E[v(t)],
and the covariance function can be denoted with the symbol γ(τ), where τ = t1 − t2:
γ(τ) = E[(v(t) − m)(v(t − τ) − m)].
Note that, for τ = 0, from this expression, we have γ(0) = E[(v(t) − m)²]. In other words, γ(0) is the variance of the process.
Summing up, a stationary stochastic process is described by its mean value (a real number) and its covariance function (a real function). The variance of the process is implicitly given by the covariance function at τ = 0.
We now review the main properties of the covariance function of a stationary process.
1. γ(0) ≥ 0.
Indeed, γ(0) is a variance.
2. γ(τ) = γ(−τ).
This is a consequence of the symmetry property γ(t1, t2) = γ(t2, t1) (taking t1 = t and t2 = t − τ).
3. |γ(τ)| ≤ γ(0).
Indeed, consider any pair of random variables drawn from the process, say v(t) and v(t − τ), with different time points. The covariance coefficient between such variables is
ρ = γ(τ) / γ(0).
On the other hand, we know that |ρ| ≤ 1, so that |γ(τ)| cannot exceed γ(0).
This last property suggests the definition of the normalized covariance function as
ρ(τ) = γ(τ) / γ(0).
Obviously, ρ(0) = 1, while |ρ(τ)| ≤ 1 for every τ. Note that, for τ ≠ 0, γ(τ) and ρ(τ) may be either positive or negative.
Further properties of the covariance function are discussed in Section 2.2.
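These properties can be observed on simulated data. The sketch below estimates γ(τ) and the normalized covariance ρ(τ) from one realization of an illustrative stationary process (the process used is an arbitrary choice; the systematic estimation of the covariance function is the subject of Chapter 2):

```python
import numpy as np

rng = np.random.default_rng(5)

# An illustrative stationary process (arbitrary choice for the example).
N = 5_000
xi = rng.normal(size=N + 1)
v = xi[1:] + 0.5 * xi[:-1]

def gamma_hat(v, tau):
    # sample covariance at lag tau (mean removed, division by N)
    v0 = v - v.mean()
    return float(np.dot(v0[:len(v0) - tau], v0[tau:]) / len(v0))

g0 = gamma_hat(v, 0)
for tau in range(4):
    g = gamma_hat(v, tau)
    print(f"tau={tau}  gamma={g:.3f}  rho={g / g0:.3f}")
```

The printout reflects the properties above: γ(0) is positive, |γ(τ)| never exceeds γ(0), and ρ(0) = 1.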
A white process is defined as a stationary stochastic process having the following covariance function:
γ(0) = λ², γ(τ) = 0 for every τ ≠ 0.
This means that, if we take any pair of time points t1 and t2 with t1 ≠ t2, the deviations of the process values at t1 and at t2 from the process mean value are uncorrelated, whatever t1 and t2 be. Thus, the knowledge of the value of the process at time t1 is of no use to predict the value of the process at time t2, t2 ≠ t1. The only prediction that can be formulated is the trivial one, the mean value. This is why the white process is a way to formalize the concept of a fully unpredictable signal.
The white process is also named white noise (WN).
We will often use the compact notation
ξ(·) ∼ WN(μ, λ²)
to mean that ξ(·) is a white process with
E[ξ(t)] = μ,
Var[ξ(t)] = λ²,
γ(τ) = 0 for every τ ≠ 0.
The white noise is the basic brick to construct the family of stationary stochastic processes that we work with.
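For instance, a Gaussian white noise can be simulated as a sequence of independent draws, and its sample covariance function turns out to be negligible at all nonzero lags (a sketch; the values of μ, λ² and the Gaussian distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)

mu, lam2 = 0.0, 2.0                              # arbitrary mean and variance
xi = rng.normal(mu, np.sqrt(lam2), size=20_000)  # Gaussian white noise, WN(mu, lam2)

xi0 = xi - xi.mean()
gamma = [float(np.dot(xi0[:len(xi0) - k], xi0[k:]) / len(xi0)) for k in range(4)]
print(gamma)  # gamma(0) close to lam2, gamma(tau) close to 0 for tau != 0
```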
An MA process is a stochastic process y(·) generated as a linear combination of the current and past values of a white process ξ(·) ∼ WN(0, λ²):
y(t) = c0 ξ(t) + c1 ξ(t − 1) + … + cn ξ(t − n),
where c0, c1, …, cn are real numbers (such a process is said to be an MA process of order n, MA(n)).
We now determine the main features of y(·). We start with the computation of the mean value and the variance of y(t). As for the mean,
E[y(t)] = c0 E[ξ(t)] + c1 E[ξ(t − 1)] + … + cn E[ξ(t − n)].
Since E[ξ(t)] = 0 for every t, it follows that
E[y(t)] = 0.
Passing to the variance, we have
Var[y(t)] = E[(c0 ξ(t) + c1 ξ(t − 1) + … + cn ξ(t − n))²].
ξ(·) being white, all mean values of the cross-products of the type E[ξ(t − i) ξ(t − j)] with i ≠ j are equal to zero. Hence,
Var[y(t)] = (c0² + c1² + … + cn²) λ².
Turn now to the covariance function γ(t1, t2). First, we consider the case when t1 and t2 are one time unit apart, and for simplicity, we set t1 = t and t2 = t − 1. Then
γ(t, t − 1) = E[y(t) y(t − 1)] = (c0 c1 + c1 c2 + … + c_{n−1} cn) λ².
It is easy to see that the same conclusion holds true whatever the value of t, so that this covariance is the same at all time points.
Analogous computations can be performed for time points that are two or more units apart, obtaining, for instance,
γ(t, t − 2) = (c0 c2 + c1 c3 + … + c_{n−2} cn) λ².
We see that Var[y(t)] and γ(t, t − 1) do not depend on time t.
In general, we come to the conclusion that γ(t1, t2) does not depend upon t1 and t2 separately; it depends upon the difference τ = t1 − t2 only. Precisely,
γ(τ) = (c0 c_{|τ|} + c1 c_{|τ|+1} + … + c_{n−|τ|} cn) λ², for |τ| ≤ n,
γ(τ) = 0, for |τ| > n.
Summing up, any MA process has
constant mean value,
constant variance,
covariance function depending upon the distance between the two considered time points.
Therefore, it is a stationary process, whatever values the parameters c0, c1, …, cn may take.
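As a numerical check (a sketch with arbitrary coefficients c0, c1, c2 and λ² = 1, not taken from the book), one can simulate an MA(2) process and compare the sample covariance function with the theoretical expression derived above:

```python
import numpy as np

rng = np.random.default_rng(7)

c = np.array([1.0, 0.5, -0.3])   # c0, c1, c2: arbitrary MA(2) coefficients
lam2 = 1.0
N = 200_000

xi = rng.normal(0.0, np.sqrt(lam2), size=N + len(c) - 1)
# y(t) = c0*xi(t) + c1*xi(t-1) + c2*xi(t-2)
y = np.convolve(xi, c, mode="valid")

y0 = y - y.mean()
for tau in range(4):
    theo = lam2 * float(np.dot(c[:len(c) - tau], c[tau:])) if tau < len(c) else 0.0
    emp = float(np.dot(y0[:len(y0) - tau], y0[tau:]) / len(y0))
    print(f"tau={tau}  theoretical={theo:.3f}  empirical={emp:.3f}")
```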
Observe that the expression of an MA process
y(t) = c0 ξ(t) + c1 ξ(t − 1) + … + cn ξ(t − n)
can be restated by means of the delay operator z⁻¹ as
y(t) = (c0 + c1 z⁻¹ + … + cn z⁻ⁿ) ξ(t).
Then by introducing the operator
C(z) = c0 + c1 z⁻¹ + … + cn z⁻ⁿ,
one can write
y(t) = C(z) ξ(t).
From this expression, the transfer function from ξ to y can be worked out:
W(z) = C(z) = (c0 zⁿ + c1 zⁿ⁻¹ + … + cn) / zⁿ.
Note that this transfer function has n poles in the origin of the complex plane, whereas the zeros, the roots of the polynomial c0 zⁿ + c1 zⁿ⁻¹ + … + cn, may be located in various positions, depending on the values of the parameters.
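With numerical coefficients, the zeros and poles of this transfer function can be computed directly; the coefficients below are the same arbitrary ones used in the previous sketch:

```python
import numpy as np

c = [1.0, 0.5, -0.3]      # c0, c1, c2 (illustrative values)
n = len(c) - 1

# W(z) = (c0*z^n + c1*z^(n-1) + ... + cn) / z^n
zeros = np.roots(c)       # roots of the numerator polynomial
poles = np.zeros(n)       # n poles at the origin of the complex plane

print("zeros:", zeros)
print("poles:", poles)
```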
We extrapolate the above notion of the MA(n) process and consider the MA(∞) case too:
y(t) = c0 ξ(t) + c1 ξ(t − 1) + c2 ξ(t − 2) + ⋯
Of course, this definition requires some caution, as in any series with infinitely many terms. If the white process ξ(·) has zero mean value, then y(t) also has a zero mean value. The variance can be obtained by extrapolating the expression of the variance for the MA(n) case, namely,
Var[y(t)] = (c0² + c1² + c2² + ⋯) λ²,
provided that this series converges.
