This book is about constructing models from experimental data. It covers a range of topics, from statistical data prediction to Kalman filtering, from black-box model identification to parameter estimation, from spectral analysis to predictive control. Written for graduate students, this textbook offers an approach that has proven successful throughout the many years during which its author has taught these topics at his university. The book:
* Contains accessible methods explained step-by-step in simple terms
* Offers an essential tool useful in a variety of fields, especially engineering, statistics, and mathematics
* Includes an overview of random variables and stationary processes, as well as an introduction to discrete time models and matrix analysis
* Incorporates historical commentaries to put into perspective the developments that have brought the discipline to its current state
* Provides many examples and solved problems to complement the presentation and facilitate comprehension of the techniques presented
Cover
Introduction
Acknowledgments
1 Stationary Processes and Time Series
1.1 Introduction
1.2 The Prediction Problem
1.3 Random Variable
1.4 Random Vector
1.5 Stationary Process
1.6 White Process
1.7 MA Process
1.8 AR Process
1.9 Yule–Walker Equations
1.10 ARMA Process
1.11 Spectrum of a Stationary Process
1.12 ARMA Model: Stability Test and Variance Computation
1.13 Fundamental Theorem of Spectral Analysis
1.14 Spectrum Drawing
1.15 Proof of the Fundamental Theorem of Spectral Analysis
1.16 Representations of a Stationary Process
2 Estimation of Process Characteristics
2.1 Introduction
2.2 General Properties of the Covariance Function
2.3 Covariance Function of ARMA Processes
2.4 Estimation of the Mean
2.5 Estimation of the Covariance Function
2.6 Estimation of the Spectrum
2.7 Whiteness Test
3 Prediction
3.1 Introduction
3.2 Fake Predictor
3.3 Spectral Factorization
3.4 Whitening Filter
3.5 Optimal Predictor from Data
3.6 Prediction of an ARMA Process
3.7 ARMAX Process
3.8 Prediction of an ARMAX Process
4 Model Identification
4.1 Introduction
4.2 Setting the Identification Problem
4.3 Static Modeling
4.4 Dynamic Modeling
4.5 External Representation Models
4.6 Internal Representation Models
4.7 The Model Identification Process
4.8 The Predictive Approach
4.9 Models in Predictive Form
5 Identification of Input–Output Models
5.1 Introduction
5.2 Estimating AR and ARX Models: The Least Squares Method
5.3 Identifiability
5.4 Estimating ARMA and ARMAX Models
5.5 Asymptotic Analysis
5.6 Recursive Identification
5.7 Robustness of Identification Methods
5.8 Parameter Tracking
6 Model Complexity Selection
6.1 Introduction
6.2 Cross‐validation
6.3 FPE Criterion
6.4 AIC Criterion
6.5 MDL Criterion
6.6 Durbin–Levinson Algorithm
7 Identification of State Space Models
7.1 Introduction
7.2 Hankel Matrix
7.3 Order Determination
7.4 Determination of Matrices
7.5 Determination of Matrix
7.6 Mid Summary: An Ideal Procedure
7.7 Order Determination with SVD
7.8 Reliable Identification of a State Space Model
8 Predictive Control
8.1 Introduction
8.2 Minimum Variance Control
8.3 Generalized Minimum Variance Control
8.4 Model‐Based Predictive Control
8.5 Data‐Driven Control Synthesis
9 Kalman Filtering and Prediction
9.1 Introduction
9.2 Kalman Approach to Prediction and Filtering Problems
9.3 The Bayes Estimation Problem
9.4 One‐step‐ahead Kalman Predictor
9.5 Multistep Optimal Predictor
9.6 Optimal Filter
9.7 Steady‐State Predictor
9.8 Innovation Representation
9.9 Innovation Representation Versus Canonical Representation
9.10 K‐Theory Versus K–W Theory
9.11 Extended Kalman Filter – EKF
9.12 The Robust Approach to Filtering
10 Parameter Identification in a Given Model
10.1 Introduction
10.2 Kalman Filter‐Based Approaches
10.3 Two‐Stage Method
11 Case Studies
11.1 Introduction
11.2 Kobe Earthquake Data Analysis
11.3 Estimation of a Sinusoid in Noise
Appendix A: Linear Dynamical Systems
A.1 State Space and Input–Output Models
A.2 Lagrange Formula
A.3 Stability
A.4 Impulse Response
A.5 Frequency Response
A.6 Multiplicity of State Space Models
A.7 Reachability and Observability
A.8 System Decomposition
A.9 Stabilizability and Detectability
Appendix B: Matrices
B.1 Basics
B.2 Eigenvalues
B.3 Determinant and Inverse
B.4 Rank
B.5 Annihilating Polynomial
B.6 Algebraic and Geometric Multiplicity
B.7 Range and Null Space
B.8 Quadratic Forms
B.9 Derivative of a Scalar Function with Respect to a Vector
B.10 Matrix Diagonalization via Similarity
B.11 Matrix Diagonalization via Singular Value Decomposition
B.12 Matrix Norm and Condition Number
Appendix C: Problems and Solutions
Bibliography
Further reading
Index
End User License Agreement
Chapter 5
Table 5.1 Iterative ML identification of an ARMAX(1, 1, 1) model.
Chapter 6
Table 6.1 Identification of model ARX for data generated by the system of Examp...
Table 6.2 Choice of the optimal order (Example 6.3).
Table 6.3 Choice of the optimal order via the cross‐validation method (Example 6...
Chapter 10
Table 10.1 The simulated data chart as the starting point of the two‐stage metho...
Table 10.2 The compressed artificial data chart.
Chapter 1
Figure 1.1 Possible diagrams of the prediction error.
Figure 1.2 Interpreting a sequence of data as the output of a dynamic model fed...
Figure 1.3 Stability region for polynomial.
Figure 1.4 Stability region for polynomial.
Figure 1.5 The vector.
Figure 1.6 MA(1) process features (left) and (right). (a) Covariance f...
Figure 1.7 Complex spectrum in a 3D representation – Example 1.11.
Figure 1.8 The real spectrum – Example 1.11.
Chapter 2
Figure 2.1 Spectrum and periodogram of the AR(3) process of Example 2.3.
Figure 2.2 Spectrum estimate for Example 2.3: (a) the periodogram, (b)–(d) the ...
Figure 2.3 Whiteness test in the standard unit variance ...
Chapter 3
Figure 3.1 Optimal predictor.
Figure 3.2 Effect of an all‐pass filter.
Figure 3.3 Canonical representation (a) and whitening filter (b).
Figure 3.4 (a) Canonical representation. (b) Fake optimal predictor. (c) Optima...
Figure 3.5 Process MA(1) (continuous line) and its prediction (dotted line) wit...
Figure 3.6 Process MA(1) (continuous line) and its prediction (dotted line) obt...
Figure 3.7 Block diagram of the ARMAX model (3.16).
Chapter 4
Figure 4.1 James Clerk Maxwell.
Figure 4.2 Performance for: invertible (a); singular (b).
Figure 4.3 Hubble law: The recession velocity of galaxies is proportional to th...
Figure 4.4 Modeling a time series (a) or a cause–effect system (b).
Figure 4.5 Block scheme of the ARX model (4.7).
Figure 4.6 The prediction error identification rationale.
Chapter 5
Figure 5.1 Newton method.
Figure 5.2 Newton method – zoom.
Figure 5.3 Newton method at another point of the curve (a) and the correspondi...
Figure 5.4 Data filtering for the estimation of ARMAX models.
Figure 5.5 Performance index convergence.
Figure 5.6 Asymptotic behavior of prediction error identification methods.
Figure 5.7 Estimate of parameter in Example 5.12: (a) standard RLS algorithm;...
Figure 5.8 The estimate of parameter in Example 5.13 exhibits a bursting phen...
Chapter 7
Figure 7.1 Approximating the Hankel matrix.
Figure 7.2 Impulse response of the system of Example 7.5.
Figure 7.3 Singular values of the Hankel matrix of Example 7.5.
Chapter 8
Figure 8.1 Minimum variance control system.
Figure 8.2 A typical feedback control system.
Figure 8.3 Generalized output signal.
Figure 8.4 Generalized minimum variance control system.
Figure 8.5 Model predictive control essentials.
Figure 8.6 A feedback system with parametrized controller.
Chapter 9
Figure 9.1 R.E. Kalman, picture taken by Sergio Bittanti in Bologna, Italy, du...
Figure 9.2 Geometric interpretation of the Bayes formula.
Figure 9.3 Probability and geometry – table of correspondences.
Figure 9.4 Geometric interpretation of the recursive Bayes formula.
Figure 9.5 Innovation and state prediction error for a sequence of data.
Figure 9.6 Count Jacopo Francesco Riccati.
Figure 9.7 Kalman predictor block scheme.
Figure 9.8 Graphical determination of the solutions of the ARE – Example 9.5.
Figure 9.9 Graphical determination of the solutions of the DRE – Example 9.5 wi...
Figure 9.10 Graphical determination of the solutions of the ARE – Example 9.5 w...
Figure 9.11 Canonical decomposition of system 9.70.
Chapter 10
Figure 10.1 The parameter estimation problem.
Figure 10.2 Estimates of the unknown parameter with the EKF method. Example 10....
Figure 10.3 Estimates of the unknown parameter with the EKF method. Example 10....
Figure 10.4 Estimates of the unknown parameter of system (10.1) with the two‐st...
Chapter 11
Figure 11.1 The time series of the Kobe earthquake.
Figure 11.2 Partition of the time series into three segments.
Figure 11.3 Nonparametric properties of the series over the normal seismic activity segm...
Figure 11.4.
Figure 11.5 Spectrum and poles and zeros of the identified stochastic model.
Figure 11.6 Prediction error and Anderson's whiteness test.
Figure 11.7 The earthquake phase segment (a) along with the corresponding perio...
Figure 11.8 Whiteness test for the earthquake phase.
Figure 11.9 The transition phase segment and its further partition.
Figure 11.10 Whiteness test for each time window in the transition segment.
Figure 11.11 Poles and zeros of a notch filter.
Figure 11.12 Frequency response of notch filter.
Figure 11.13 Tracking performance of the notch filter: (a) and (b).
Sergio Bittanti
Politecnico di Milano
Milan, Italy
This edition first published 2019
© 2019 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Sergio Bittanti to be identified as the author of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Bittanti, Sergio, author.
Title: Model identification and data analysis / Sergio Bittanti, Politecnico di Milano, Milan, Italy.
Description: Hoboken, NJ, USA : Wiley, [2019] | Includes bibliographical references and index.
Identifiers: LCCN 2018046965 (print) | LCCN 2018047956 (ebook) | ISBN 9781119546412 (Adobe PDF) | ISBN 9781119546313 (ePub) | ISBN 9781119546368 (hardcover)
Subjects: LCSH: Mathematical models. | Quantitative research. | System identification.
Classification: LCC TA342 (ebook) | LCC TA342 .B58 2019 (print) | DDC 511/.8–dc23
LC record available at https://lccn.loc.gov/2018046965
Cover design: Wiley
Cover image: © Oleksii Lishchyshyn/Shutterstock, © gremlin/Getty Images
Today, a deluge of information is available in a variety of formats. Industrial plants are equipped with distributed sensors and smart metering; huge data repositories are preserved in public and private institutions; computer networks spread bits to every corner of the world at unprecedented speed. No doubt, we live in the age of data.
This new scenario in the history of humanity has made it possible to use new paradigms to deal with old problems and, at the same time, has led to challenging questions never addressed before. To reveal the information content hidden in observations, models have to be constructed and analyzed.
The purpose of this book is to present the first principles of model construction from data in a simple form, so as to make the treatment accessible to a wide audience. As R.E. Kalman (1930–2016) used to say, "Let the data speak"; this is precisely our objective.
Our path is organized as follows.
We begin by studying signals with stationary characteristics (Chapter 1). After a brief presentation of the basic notions of random variable and random vector, we come to the definition of white noise, a peculiar process through which one can construct a fairly general family of models suitable for describing random signals. Then we move on to the realm of the frequency domain by introducing a spectral characterization of data. The final goal of this chapter is to identify a wise representation of a stationary process suitable for developing prediction theory.
In our presentation of random notions, we rely on elementary concepts: the mean, the covariance function, and the spectrum, without any assumption about the probability distribution of data. In Chapter 2, we briefly see how these features can be computed from data.
For the simple dynamic models introduced in Chapter 1, we present the corresponding prediction theory. Given the model, this theory, explained in Chapter 3, enables one to determine the predictor with elementary computations. Since it was developed mainly by Andrey N. Kolmogorov and Norbert Wiener, we shall refer to it as the Kolmogorov–Wiener theory, or simply K–W theory.
Then, in Chapter 4, we start studying the techniques for the construction of a model from data. This transcription of long sequences of apparently confusing numbers into a concise formula that can be scribbled into our notebook is the essence and the magic of identification science.
The methods for the parameter estimation of input–output models are the subject of Chapter 5. The features of the identified models when the number of snapshots tends to infinity are also investigated (asymptotic analysis). Next, the recursive versions of the various methods, suitable for real-time implementation, are introduced.
In system modeling, one of the major topics that has attracted the attention of many scholars from different disciplines is the selection of the appropriate complexity. Here, the problem is that an overcomplex model, while offering better data fitting, may also fit the noise affecting measurements. So, one has to find a trade‐off between accuracy and complexity. This is discussed in Chapter 6.
Considering that prediction theory is model‐based, our readers might conclude that the identification methods should be presented prior to the prediction methods. The reason why we have done the opposite is that the concept of prediction is very much used in identification.
In Chapter 7, the problem of identifying a model in a state space form is dealt with. Here, the data are organized into certain arrays from the factorization of which the system matrices are eventually identified.
The use of the identified models for control is concisely outlined in Chapter 8. Again prediction is at the core of such techniques, since their basic principle is to ensure that the prediction supplied by the model is close to the desired target. This is why these techniques are known as predictive control methods.
Chapter 9 is devoted to Kalman theory (or simply K theory) for filtering and prediction. Here, the problem is to estimate the temporal evolution of the state of a system. In other words, instead of parameter estimation, we deal with signal estimation. A typical situation where such a problem is encountered is deep space navigation, where the position of a spacecraft has to be found in real time from available observations.
At the end of this chapter, we compare the two prediction theories introduced in the book, namely we compare K theory with K–W theory of Chapter 3.
We pass then to Chapter 10, where the problem of the estimation of an unknown parameter in a given model is treated.
Identification methods have had and continue to have a huge number of applications, in engineering, physics, biology, and economics, to mention only the main disciplines. To illustrate their applicability, a couple of case studies are discussed in Chapter 11. The first deals with the analysis of the Kobe earthquake of 1995; this study involves most facets of the estimation procedure for input–output models, including parameter identification and model complexity selection. The second considers the problem of estimating the unknown frequency of a periodic signal corrupted by noise, resorting both to the input–output approach and to the state space approach via nonlinear Kalman techniques.
There are, moreover, many numerical examples to accompany and complement the presentation and development of the various methods.
In this book, we focus on the discrete time case. The basic pillars on which we rely are random notions, dynamic systems, and matrix theory.
Random variables and stationary processes are gradually introduced in the first sections of Chapter 1. As already said, our concise treatment hinges on simple notions, culminating in the concept of the white process, the elementary brick for the construction of the class of models we deal with. Going through these pages, the readers will become progressively familiar with stationary processes and the associated ideas as tools for the description of uncertain data.
The main concepts concerning linear discrete‐time dynamical systems are outlined in Appendix A. They range from state space to transfer functions, including their interplay via realization theory.
In Appendix B, the readers who are not familiar with matrix analysis will find a comprehensive overview not only of eigenvalues and eigenvectors, determinant and basis, but also of the notion of rank and the basic tool for its practical determination, singular value decomposition.
Finally, a set of problems with their solution is proposed in Appendix C.
Most simulations presented in this volume have been performed with the aid of the MATLAB® package; see https://it.mathworks.com/help/ident/.
The single guiding principle in writing this book has been to introduce and explain the subject to readers as clearly as possible.
The birth of a new book is an emotional moment, especially when it comes after years of research and teaching.
This text is indeed the outcome of my years of lecturing on model identification and data analysis (MIDA) at the Politecnico di Milano, Italy. In its first years of existence, the course had a very limited number of students. Nowadays, there are various MIDA courses, offered to master's students of automation and control engineering, electronic engineering, bio-engineering, computer engineering, aerospace engineering, and mathematical engineering.
In my decades of scientific activity, I have had the privilege of meeting and working with many scholars. Among them, focusing on the Italian community, are Paolo Bolzern, Claudio Bonivento, Marco Claudio Campi, Patrizio Colaneri, Antonio De Marco, Giuseppe De Nicolao, Marcello Farina, Simone Formentin, Giorgio Fronza, Simone Garatti, Roberto Guidorzi, Alberto Isidori, Antonio Lepschy, Diego Liberati, Arturo Locatelli, Marco Lovera, Claudio Maffezzoni, Gianantonio Magnani, Edoardo Mosca, Giorgio Picci, Luigi Piroddi, Maria Prandini, Fabio Previdi, Paolo Rocco, Sergio Matteo Savaresi, Riccardo Scattolini, Nicola Schiavoni, Silvia Carla Strada, Mara Tanelli, Roberto Tempo, and Antonio Vicino.
I am greatly indebted to Silvia Maria Canevese for her generous help in the manuscript editing, thank you Silvia. Joshua Burkholder, Luigi Folcini, Chiara Pasqualini, Grace Paulin Jeeva S, Marco Rapizza, Matteo Zovadelli, and Fausto Vezzaro also helped out with the editing in various phases of the work.
I also express my gratitude to Guido Guardabassi for all our exchanges of ideas on this or that topic and for his encouragement to move toward the subject of data analysis in my early university days.
Some of these persons, as well as other colleagues from around the world, are featured in the picture at the end of the book (taken at a workshop held in 2017 at Lake Como, Italy).
A last note of thanks goes to the multitude of students I have met over the years in my classes. Their interest has been an irreplaceable stimulus for my never-ending struggle to explain the subject as clearly and intelligibly as possible.
Sergio Bittanti
e‐mail: [email protected]
website: home.deib.polimi.it/bittanti/
The support of the Politecnico di Milano and the National Research Council of Italy (Consiglio Nazionale delle Ricerche–CNR) is gratefully acknowledged.
Forecasting the evolution of a man-made system or a natural phenomenon is one of the most ancient problems of humankind. We develop here a prediction theory under the assumption that the variable under study can be considered as a stationary process. The theory is easy to understand and simple to apply. Moreover, it lends itself to various generalizations, making it possible to deal with nonstationary signals.
The organization is as follows. After an introduction to the prediction problem (Section 1.2), we concisely review the notions of random variable, random vector, and random (or stochastic) process in Sections 1.3–1.5, respectively. This leads to the definition of white process (Section 1.6), a key notion in the subsequent developments. The readers who are familiar with random concepts can skip Sections 1.3–1.5.
Then we introduce the moving average (MA) process and the autoregressive (AR) process (Sections 1.7 and 1.8). By combining them, we come to the family of autoregressive and moving average (ARMA) processes (Section 1.10). This is the family of stationary processes we focus on in this volume.
For such processes, in Chapter 3, we develop a prediction theory, thanks to which we can easily work out the optimal forecast given the model.
In our presentation, we make use of elementary concepts of linear dynamical systems such as transfer functions, poles, and zeros; the readers who are not familiar with such topics are cordially invited to first study Appendix A.
Consider a real variable y depending on discrete time t. The variable is observed over the interval t = 1, 2, …, N. The problem is to predict the value that the variable will take at the subsequent time point, namely y(N + 1).
Various prediction rules may be conceived, providing a guess for y(N + 1) based on the observations y(1), y(2), …, y(N). A generic predictor is denoted with the symbol ŷ(N + 1):
ŷ(N + 1) = f(y(N), y(N − 1), …, y(1)).
The question is how to choose the function f(·).
A possibility is to consider only a bunch of recent data, say y(N), y(N − 1), …, y(N − n + 1), and to construct the prediction as a linear combination of them with real coefficients a1, a2, …, an:
ŷ(N + 1) = a1 y(N) + a2 y(N − 1) + … + an y(N − n + 1).
The problem then becomes that of selecting the integer n and the most appropriate values for the parameters a1, a2, …, an.
Suppose for a moment that n and a1, a2, …, an were selected. Then the prediction rule is fully specified, and it can be applied to the past time points for which data are available to evaluate the prediction error:
ε(t) = y(t) − ŷ(t), t = n + 1, n + 2, …, N.
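To make the procedure concrete, here is a small Python sketch (not taken from the book; the data-generating mechanism and the coefficients a1, a2 are invented for illustration) that applies a linear prediction rule to a recorded sequence and evaluates the resulting prediction errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: any recorded sequence y(1), ..., y(N) would do here.
N = 500
y = 0.1 * np.cumsum(rng.normal(size=N)) + rng.normal(size=N)

# A candidate linear prediction rule with n = 2 and illustrative coefficients a1, a2.
a = np.array([0.6, 0.2])
n = len(a)

# Apply the rule to the past time points for which data are available:
# y_hat(t) = a1*y(t-1) + a2*y(t-2)
y_hat = np.array([a @ y[t - n:t][::-1] for t in range(n, N)])
eps = y[n:] - y_hat  # prediction error eps(t) = y(t) - y_hat(t)

print("mean of the prediction error:", eps.mean())
print("variance of the prediction error:", eps.var())
```

The mean and the "regularity" of these errors are precisely what is examined in the discussion that follows.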
Let's now consider this fundamental question: Which characteristics should the prediction error exhibit in order to conclude that we have constructed a "good predictor"? In principle, the best one can hope for is that the prediction error be null at any time point. However, in practice, this is utopian. Hence, we have to investigate the properties that a non-null ε(t) should exhibit in order to conclude that the prediction is fair.
For the sake of illustration, consider the case when ε(t) has the time evolution shown in Figure 1.1a. As can be seen, the mean value of ε(t) is nonzero. Correspondingly, the rule
ŷnew(t + 1) = ŷ(t + 1) + m̄,
where m̄ denotes the nonzero mean value of the error, would be better than the original one. Indeed, with the new rule of prediction, one can get rid of the systematic error.
Figure 1.1 Possible diagrams of the prediction error.
As a second option, consider the case when the prediction error is given by the diagram of Figure 1.1b. Then the mean value is zero. However, the sign of ε(t) changes at each instant; precisely, ε(t) > 0 for t even and ε(t) < 0 for t odd. Hence, even in such a case, a better prediction rule than the initial one can be conceived. Indeed, one can formulate a new rule that corrects the original prediction upward at the even time points and downward at the odd ones, so as to compensate for the alternating error.
From these simple remarks, one can conclude that the best predictor should have the following property: besides a zero mean value, the prediction error should have no regularity; rather, it should be fully unpredictable. In this way, the model captures the whole dynamics hidden in the data, no useful information remains buried in the residual error, and no better predictor can be conceived. The intuitive concept of an "unpredictable signal" was formalized in the twentieth century, leading to the notion of white noise (WN) or white process, a concept we precisely introduce later in this chapter. For the moment, it is important to bear in mind the following conclusion: A prediction rule is appropriate if the corresponding prediction error is a white process.
In this connection, we make the following interesting observation. Assume that ε(t) is indeed a white noise; then
y(t) = a1 y(t − 1) + a2 y(t − 2) + … + an y(t − n) + ε(t).
Rewrite this difference equation by means of the delay operator z⁻¹, namely the operator such that
z⁻¹ y(t) = y(t − 1).
Then
y(t) = a1 z⁻¹ y(t) + a2 z⁻² y(t) + … + an z⁻ⁿ y(t) + ε(t),
from which
(1 − a1 z⁻¹ − a2 z⁻² − … − an z⁻ⁿ) y(t) = ε(t),
or
y(t) = W(z) ε(t),
with
W(z) = 1 / (1 − a1 z⁻¹ − a2 z⁻² − … − an z⁻ⁿ).
By reinterpreting z as a complex variable, this relationship becomes the expression of a dynamical system with transfer function (from ε to y) given by W(z).
Summing up, finding a good predictor is equivalent to determining a model supplying the given sequence of data as the output of a dynamical system fed by white noise (Figure 1.2).
Figure 1.2 Interpreting a sequence of data as the output of a dynamic model fed by white noise.
This is why studying dynamical systems having a white noise at the input is a main preliminary step toward the study of prediction theory.
The road we follow toward this objective relies first on the definition of white noise, which we pursue in four stages: random variable → random vector → stochastic process → white noise.
A random (or stochastic) variable is a real variable that depends upon the outcome of a random experiment. For example, the variable taking one of two prescribed values depending on the result of the toss of a coin is a random variable.
The outcome of the random experiment is denoted by s; hence, a random variable v is a function of s: v = v(s).
For our purposes, a random variable is described by means of its mean value (or expected value) and its variance, which we will denote by E[v] and Var[v], respectively.
The mean value is the real number around which the values taken by the variable fluctuate. Note that, given two random variables v1 and v2 with mean values E[v1] and E[v2], the random variable
v = α1 v1 + α2 v2,
obtained as a linear combination of v1 and v2 via the real numbers α1 and α2, has mean value
E[v] = α1 E[v1] + α2 E[v2].
The variance captures the intensity of the fluctuations around the mean value. To be precise, it is defined as
Var[v] = E[(v − E[v])²],
where E[v] denotes the mean value of v. Obviously, being the expected value of a non-negative quantity, the variance is a real non-negative number.
Often, the variance is denoted with symbols such as λ² or σ². When one deals with various random variables, the variance of the ith variable may be denoted as λi² or σi².
The square root of the variance is called the standard deviation, denoted by λ or σ. If the random variable has a Gaussian distribution, then the mean value and the variance define completely the probability distribution of the variable. In particular, if a random variable is Gaussian, the probability that it takes values in the interval between E[v] − 2σ and E[v] + 2σ is about 95%. So if v is Gaussian with mean value 10 and variance 100, then, in about 95% of cases, the values taken by v range from −10 to +30.
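As a quick numerical check of this statement (a sketch, not from the book; the sample size is arbitrary), one can draw samples from such a Gaussian variable and count how many fall in the stated interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian random variable with mean 10 and variance 100 (standard deviation 10).
v = rng.normal(loc=10.0, scale=10.0, size=100_000)

# Fraction of samples within two standard deviations of the mean, i.e. in [-10, 30].
inside = np.mean((v >= -10.0) & (v <= 30.0))
print(f"fraction inside [-10, 30]: {inside:.3f}")  # close to 0.95
```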
A random (or stochastic) vector is a vector whose elements are random variables. We focus for simplicity on the bi-dimensional case, namely, given two random variables v1 and v2,
v = [v1 v2]′
is a random vector (of dimension 2). The mean value of a random vector is defined as the vector of real numbers constituted by the mean values of the elements of the vector. Thus,
E[v] = [E[v1] E[v2]]′,
where E[v1] and E[v2] are the mean values of v1 and v2, respectively. The variance is a matrix given by
Var[v] = [ λ11 λ12 ; λ21 λ22 ],
where
λij = E[(vi − E[vi])(vj − E[vj])], i, j = 1, 2.
Here, besides the variances λ11 and λ22 of the single random variables, the so-called "cross-variance" λ12 between v1 and v2 and "cross-variance" λ21 between v2 and v1 appear. Obviously, λ12 = λ21, so that the variance matrix is a symmetric matrix.
It is easy to verify that the variance matrix can also be written in the form
Var[v] = E[(v − E[v])(v − E[v])′],
where ′ denotes transpose.
In general, for a vector v of any dimension, the variance matrix is given by
Var[v] = E[(v − E[v])(v − E[v])′],
where E[v] is the vector whose elements are the mean values of the random variables entering v.
If v is a vector with n entries, Var[v] is an n × n matrix. In any case, Var[v] is a symmetric matrix having the variances of the single variables composing vector v along the diagonal and all cross-variances as off-diagonal terms.
A remarkable feature of a variance matrix is that it is a positive semi‐definite matrix.
The notions of positive semi-definite and positive definite matrix are explained in Appendix B. In a very concise way, given a real symmetric n × n matrix M, associate to it the scalar function defined as q(x) = x′Mx, where x is an n-dimensional real vector. For example, if
M = [ m11 m12 ; m12 m22 ]
and we take
x = [ x1 x2 ]′,
then
q(x) = m11 x1² + 2 m12 x1 x2 + m22 x2².
Hence, q(x) is quadratic in the entries of vector x. Matrix M is said to be
positive semi-definite if
x′Mx ≥ 0 for every x,
positive definite if it is positive semi-definite and
x′Mx = 0 only for x = 0.
We write M ≥ 0 and M > 0 to denote a positive semi-definite and a positive definite matrix, respectively.
We can now verify that, for any random vector v, Var[v] is positive semi-definite. Indeed, consider, for an arbitrary real vector x, the quantity
x′ Var[v] x = x′ E[(v − E[v])(v − E[v])′] x.
Then
x′ Var[v] x = E[ x′(v − E[v]) (v − E[v])′x ].
Here, we have used the linearity of the expected value operator E[·]. Observe now that x′(v − E[v]), being the product of a row vector times a column vector, is a scalar. As such, it coincides with its transpose: x′(v − E[v]) = (v − E[v])′x. Therefore,
x′ Var[v] x = E[ (x′(v − E[v]))² ].
This is the expected value of a square, namely a non-negative real number. Therefore, this quantity is non-negative for any x. Hence, we come to the conclusion that any variance matrix is positive semi-definite. We simply write
Var[v] ≥ 0.
Among the remarkable properties of positive semi-definite matrices, there is the fact that their determinant is non-negative (see Appendix B). Hence, referring to the two-dimensional case,
λ11 λ22 − λ12² ≥ 0.
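As an illustration (a sketch with made-up data; the way v1 and v2 are generated below is an arbitrary choice for the example), one can estimate a variance matrix from samples of a random vector and verify numerically that its eigenvalues and determinant are non-negative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two correlated random variables (the construction is invented for the example).
v1 = rng.normal(size=50_000)
v2 = 0.8 * v1 + 0.6 * rng.normal(size=50_000)

Sigma = np.cov(np.vstack([v1, v2]))    # sample variance (covariance) matrix
eigvals = np.linalg.eigvalsh(Sigma)    # real eigenvalues of a symmetric matrix

print("variance matrix:\n", Sigma)
print("eigenvalues (non-negative up to sampling error):", eigvals)
print("determinant (non-negative):", np.linalg.det(Sigma))
```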
Under the assumption that λ11 ≠ 0 and λ22 ≠ 0, this inequality suggests defining
ρ = λ12 / √(λ11 λ22).
ρ is known as the covariance coefficient between the random variables v1 and v2. When v1 and v2 have zero mean value, ρ is also known as the correlation coefficient. The previous inequality on the determinant of the variance matrix can be restated as follows:
−1 ≤ ρ ≤ 1.
One says that v1 and v2 are uncorrelated when ρ = 0. If instead ρ = 1 or ρ = −1, one says that they have maximal correlation.
Given a random variable v1, with Var[v1] > 0, consider the variable
v2 = α v1,
where α is a real number. To determine the covariance coefficient between v1 and v2, we compute the mean value and the variance of v2 as well as the cross-variance λ12. The mean value of v2 is
E[v2] = α E[v1].
Its variance is easily computed as follows:
Var[v2] = E[(α v1 − α E[v1])²] = α² Var[v1].
As for the cross-variance, we have
λ12 = E[(v1 − E[v1])(v2 − E[v2])] = α Var[v1].
Therefore,
ρ = α Var[v1] / √(Var[v1] · α² Var[v1]) = α / |α|.
Finally, if α < 0, then ρ = −1. In conclusion,
ρ = +1 if α > 0, ρ = −1 if α < 0.
In particular, we see that, if α ≠ 0, the correlation is maximal in absolute value. This is expected since, being v2 = α v1, knowing the value taken by v1, one can evaluate v2 without any error.
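The conclusion of this example is easy to check numerically; in the sketch below the value of α and the distribution of v1 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = -2.5                     # any nonzero real number
v1 = rng.normal(size=10_000)     # the distribution is an arbitrary choice
v2 = alpha * v1

rho = np.corrcoef(v1, v2)[0, 1]
print(rho)                       # equals the sign of alpha (here -1), up to rounding
```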
A random or stochastic process is a sequence of random variables ordered with an index t, referred to as time. We consider t as a discrete index (t = 1, 2, 3, …). The random variable associated with time t is denoted by v(t). It is advisable to recall that a random variable is not a number; it is a real function of the outcome s of a random experiment. In other words,
v(t) = v(t, s).
Thus, a stochastic process is an infinite sequence of real variables, each of which depends upon two variables, time t and outcome s. Often, for simplicity in notation, the dependence upon s is omitted and one simply writes v(t) to denote the process. However, one should always keep in mind that v(t) depends also upon the outcome of an underlying random experiment, s.
Once a particular outcome s̄ is fixed, the set {v(t, s̄), t = 1, 2, 3, …} defines a real function of time t. Such a function is named a process realization. To each outcome, a realization is associated. Hence, the set of realizations is the set of possible signals that the process can exhibit depending on the specific outcome of the random experiment. If, on the contrary, time is fixed at a point t̄, then one obtains v(t̄, s), the random variable at time t̄ extracted from the process.
Consider the following process: toss a coin; if the outcome is heads, we associate to it one given sinusoidal function of time, and if the outcome is tails, we associate another one. The random process so defined has two sinusoidal signals as realizations. At a given time point t̄, the process is a random variable v(t̄, s), which can take two values, namely the values of the two sinusoids at t̄.
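A sketch of this coin-toss process, assuming for illustration that the two sinusoidal realizations are sin(ωt) and cos(ωt) with an arbitrary ω (the specific sinusoids are not fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(4)

t = np.arange(50)
omega = 0.2  # illustrative angular frequency; an assumption for this sketch

def realization(outcome):
    # one deterministic function of time is attached to each outcome of the coin toss
    return np.sin(omega * t) if outcome == "heads" else np.cos(omega * t)

# Fixing the outcome yields a realization (an ordinary signal) ...
print(realization("heads")[:5])

# ... while fixing a time point t_bar yields a random variable with two possible values.
t_bar = 7
outcomes = rng.choice(["heads", "tails"], size=10)
print([round(float(realization(s)[t_bar]), 3) for s in outcomes])
```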
The simplest way to describe a stochastic process is to specify its mean function and its covariance function.
Mean function m(t):
The mean function is defined as
m(t) = E[v(t, s)].
The operator E[·] performs the average over all possible outcomes s of the underlying random experiment. Hence, we also write
m(t) = E[v(t)].
In such averaging, t is a fixed parameter. Therefore, m(t) does not depend upon s anymore; it depends on t only. m(t) is the function of time around which the samples of the random variable v(t) fluctuate.
Variance function Var[v(t)]:
The variance function of the process is
Var[v(t)] = E[(v(t) − m(t))²].
It provides the variances of the random variables v(t) at each time point.
Covariance function γ(t1, t2):
The covariance function captures the mutual dependence of two random variables extracted from the process at different time points, say at times t1 and t2. It is defined as
γ(t1, t2) = E[(v(t1) − m(t1))(v(t2) − m(t2))].
It characterizes the interdependence between the deviation of v(t1) around its mean value m(t1) and the deviation of v(t2) around its mean value m(t2). Note that, if we consider the same function with exchanged indexes, i.e. γ(t2, t1), we have
γ(t2, t1) = E[(v(t2) − m(t2))(v(t1) − m(t1))].
Since the two factors inside the expectation commute, it follows that
γ(t1, t2) = γ(t2, t1).
Furthermore, by setting t1 = t2 = t we obtain
γ(t, t) = E[(v(t) − m(t))²] = Var[v(t)].
This is the variance of the random variable v(t). Hence, when the two time indexes coincide, the covariance function supplies the process variance at the given time point.
We are now in a position to introduce the concept of stationary process.
A stochastic process is said to be stationary when
m(t) is constant,
Var[v(t)] is constant,
γ(t1, t2) depends upon the difference t1 − t2 only.
Therefore, the mean value of a stationary process is simply indicated as
m = E[v(t)],
and the covariance function can be denoted with the symbol γ(τ), where τ = t1 − t2:
γ(τ) = E[(v(t) − m)(v(t − τ) − m)].
Note that, for τ = 0, from this expression, we have γ(0) = E[(v(t) − m)²]. In other words, γ(0) is the variance of the process.
Summing up, a stationary stochastic process is described by its mean value (a real number) and its covariance function (a real function). The variance of the process is implicitly given by the covariance function at τ = 0.
We now review the main properties of the covariance function of a stationary process.
1. γ(0) ≥ 0.
Indeed, γ(0) is a variance.
2. γ(τ) = γ(−τ).
This is a consequence of the symmetry property γ(t1, t2) = γ(t2, t1) (taking t1 = t and t2 = t − τ).
3. |γ(τ)| ≤ γ(0).
Indeed, consider any pair of random variables drawn from the process, say v(t) and v(t − τ), with different time points. The covariance coefficient between such variables is
ρ = γ(τ) / γ(0).
On the other hand, we know that |ρ| ≤ 1, so that |γ(τ)| cannot exceed γ(0).
This last property suggests the definition of the normalized covariance function as
ρ(τ) = γ(τ) / γ(0).
Obviously, ρ(0) = 1, while |ρ(τ)| ≤ 1 for every τ. Note that, for τ ≠ 0, γ(τ) and ρ(τ) may be either positive or negative.
Further properties of the covariance function are discussed in Section 2.2.
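These properties can be observed on simulated data. The sketch below estimates γ(τ) and the normalized covariance ρ(τ) from one realization of an illustrative stationary process (the process used is an arbitrary choice; the systematic estimation of the covariance function is the subject of Chapter 2):

```python
import numpy as np

rng = np.random.default_rng(5)

# An illustrative stationary process (arbitrary choice for the example).
N = 5_000
xi = rng.normal(size=N + 1)
v = xi[1:] + 0.5 * xi[:-1]

def gamma_hat(v, tau):
    # sample covariance at lag tau (mean removed, division by N)
    v0 = v - v.mean()
    return float(np.dot(v0[:len(v0) - tau], v0[tau:]) / len(v0))

g0 = gamma_hat(v, 0)
for tau in range(4):
    g = gamma_hat(v, tau)
    print(f"tau={tau}  gamma={g:.3f}  rho={g / g0:.3f}")
```

The printout reflects the properties above: γ(0) is positive, |γ(τ)| never exceeds γ(0), and ρ(0) = 1.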
A white process is defined as a stationary stochastic process having the following covariance function:
γ(0) = λ², γ(τ) = 0 for every τ ≠ 0.
This means that, if we take any pair of time points t1 and t2 with t1 ≠ t2, the deviations of the process values at t1 and at t2 from the process mean value are uncorrelated, whatever t1 and t2 be. Thus, the knowledge of the value of the process at time t1 is of no use to predict the value of the process at time t2, t2 ≠ t1. The only prediction that can be formulated is the trivial one, the mean value. This is why the white process is a way to formalize the concept of a fully unpredictable signal.
The white process is also named white noise (WN).
We will often use the compact notation
ξ(·) ∼ WN(μ, λ²)
to mean that ξ(·) is a white process with
E[ξ(t)] = μ,
Var[ξ(t)] = λ²,
γ(τ) = 0 for every τ ≠ 0.
The white noise is the basic brick to construct the family of stationary stochastic processes that we work with.
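For instance, a Gaussian white noise can be simulated as a sequence of independent draws, and its sample covariance function turns out to be negligible at all nonzero lags (a sketch; the values of μ, λ² and the Gaussian distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)

mu, lam2 = 0.0, 2.0                              # arbitrary mean and variance
xi = rng.normal(mu, np.sqrt(lam2), size=20_000)  # Gaussian white noise, WN(mu, lam2)

xi0 = xi - xi.mean()
gamma = [float(np.dot(xi0[:len(xi0) - k], xi0[k:]) / len(xi0)) for k in range(4)]
print(gamma)  # gamma(0) close to lam2, gamma(tau) close to 0 for tau != 0
```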
An MA process is a stochastic process y(·) generated as a linear combination of the current and past values of a white process ξ(·) ∼ WN(0, λ²):
y(t) = c0 ξ(t) + c1 ξ(t − 1) + … + cn ξ(t − n),
where c0, c1, …, cn are real numbers (such a process is said to be an MA process of order n, MA(n)).
We now determine the main features of y(·). We start with the computation of the mean value and the variance of y(t). As for the mean,
E[y(t)] = c0 E[ξ(t)] + c1 E[ξ(t − 1)] + … + cn E[ξ(t − n)].
Since E[ξ(t)] = 0 for every t, it follows that
E[y(t)] = 0.
Passing to the variance, we have
Var[y(t)] = E[(c0 ξ(t) + c1 ξ(t − 1) + … + cn ξ(t − n))²].
ξ(·) being white, all mean values of the cross-products of the type E[ξ(t − i) ξ(t − j)] with i ≠ j are equal to zero. Hence,
Var[y(t)] = (c0² + c1² + … + cn²) λ².
Turn now to the covariance function γ(t1, t2). First, we consider the case when t1 and t2 are one time unit apart, and for simplicity, we set t1 = t and t2 = t − 1. Then
γ(t, t − 1) = E[y(t) y(t − 1)] = (c0 c1 + c1 c2 + … + c_{n−1} cn) λ².
It is easy to see that the same conclusion holds true whatever the value of t, so that this covariance is the same at all time points.
Analogous computations can be performed for time points that are two or more units apart, obtaining, for instance,
γ(t, t − 2) = (c0 c2 + c1 c3 + … + c_{n−2} cn) λ².
We see that Var[y(t)] and γ(t, t − 1) do not depend on time t.
In general, we come to the conclusion that γ(t1, t2) does not depend upon t1 and t2 separately; it depends upon the difference τ = t1 − t2 only. Precisely,
γ(τ) = (c0 c_{|τ|} + c1 c_{|τ|+1} + … + c_{n−|τ|} cn) λ², for |τ| ≤ n,
γ(τ) = 0, for |τ| > n.
Summing up, any MA process has
constant mean value,
constant variance,
covariance function depending upon the distance between the two considered time points.
Therefore, it is a stationary process, whatever values the parameters c0, c1, …, cn may take.
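As a numerical check (a sketch with arbitrary coefficients c0, c1, c2 and λ² = 1, not taken from the book), one can simulate an MA(2) process and compare the sample covariance function with the theoretical expression derived above:

```python
import numpy as np

rng = np.random.default_rng(7)

c = np.array([1.0, 0.5, -0.3])   # c0, c1, c2: arbitrary MA(2) coefficients
lam2 = 1.0
N = 200_000

xi = rng.normal(0.0, np.sqrt(lam2), size=N + len(c) - 1)
# y(t) = c0*xi(t) + c1*xi(t-1) + c2*xi(t-2)
y = np.convolve(xi, c, mode="valid")

y0 = y - y.mean()
for tau in range(4):
    theo = lam2 * float(np.dot(c[:len(c) - tau], c[tau:])) if tau < len(c) else 0.0
    emp = float(np.dot(y0[:len(y0) - tau], y0[tau:]) / len(y0))
    print(f"tau={tau}  theoretical={theo:.3f}  empirical={emp:.3f}")
```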
Observe that the expression of an MA process
y(t) = c0 ξ(t) + c1 ξ(t − 1) + … + cn ξ(t − n)
can be restated by means of the delay operator z⁻¹ as
y(t) = (c0 + c1 z⁻¹ + … + cn z⁻ⁿ) ξ(t).
Then by introducing the operator
C(z) = c0 + c1 z⁻¹ + … + cn z⁻ⁿ,
one can write
y(t) = C(z) ξ(t).
From this expression, the transfer function from ξ to y can be worked out:
W(z) = C(z) = (c0 zⁿ + c1 zⁿ⁻¹ + … + cn) / zⁿ.
Note that this transfer function has n poles in the origin of the complex plane, whereas the zeros, the roots of the polynomial c0 zⁿ + c1 zⁿ⁻¹ + … + cn, may be located in various positions, depending on the values of the parameters.
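With numerical coefficients, the zeros and poles of this transfer function can be computed directly; the coefficients below are the same arbitrary ones used in the previous sketch:

```python
import numpy as np

c = [1.0, 0.5, -0.3]      # c0, c1, c2 (illustrative values)
n = len(c) - 1

# W(z) = (c0*z^n + c1*z^(n-1) + ... + cn) / z^n
zeros = np.roots(c)       # roots of the numerator polynomial
poles = np.zeros(n)       # n poles at the origin of the complex plane

print("zeros:", zeros)
print("poles:", poles)
```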
We extrapolate the above notion of the MA(n) process and consider the MA(∞) case too:
y(t) = c0 ξ(t) + c1 ξ(t − 1) + c2 ξ(t − 2) + ⋯
Of course, this definition requires some caution, as in any series with infinitely many terms. If the white process ξ(·) has zero mean value, then y(t) also has a zero mean value. The variance can be obtained by extrapolating the expression of the variance for the MA(n) case, namely,
Var[y(t)] = (c0² + c1² + c2² + ⋯) λ²,
provided that this series converges.
