Nonlinear Filters - Peyman Setoodeh - E-Book

Description

NONLINEAR FILTERS

Discover the utility of using deep learning and (deep) reinforcement learning in deriving filtering algorithms with this insightful and powerful new resource

Nonlinear Filters: Theory and Applications delivers an insightful view on state and parameter estimation by merging ideas from control theory, statistical signal processing, and machine learning. Taking an algorithmic approach, the book covers both classic and machine learning-based filtering algorithms.

Readers of Nonlinear Filters will greatly benefit from the wide spectrum of presented topics including stability, robustness, computability, and algorithmic sufficiency. Readers will also enjoy:

  • Organization that allows the book to act as a stand-alone, self-contained reference
  • A thorough exploration of the notion of observability, nonlinear observers, and the theory of optimal nonlinear filtering that bridges the gap between different science and engineering disciplines
  • A profound account of Bayesian filters including Kalman filter and its variants as well as particle filter
  • A rigorous derivation of the smooth variable structure filter as a predictor-corrector estimator formulated based on a stability theorem, used to confine the estimated states within a neighborhood of their true values
  • A concise tutorial on deep learning and reinforcement learning
  • A detailed presentation of the expectation maximization algorithm and its machine learning-based variants, used for joint state and parameter estimation
  • Guidelines for constructing nonparametric Bayesian models from parametric ones

Perfect for researchers, professors, and graduate students in engineering, computer science, applied mathematics, and artificial intelligence, Nonlinear Filters: Theory and Applications will also earn a place in the libraries of those studying or practicing in fields involving pandemic diseases, cybersecurity, information fusion, augmented reality, autonomous driving, urban traffic network, navigation and tracking, robotics, power systems, hybrid technologies, and finance.


Page count: 386

Publication year: 2022




Table of Contents

Cover

Title Page

Copyright

Dedication

List of Figures

List of Tables

Preface

Acknowledgments

Acronyms

1 Introduction

1.1 State of a Dynamic System

1.2 State Estimation

1.3 Construals of Computing

1.4 Statistical Modeling

1.5 Vision for the Book

2 Observability

2.1 Introduction

2.2 State‐Space Model

2.3 The Concept of Observability

2.4 Observability of Linear Time‐Invariant Systems

2.5 Observability of Linear Time‐Varying Systems

2.6 Observability of Nonlinear Systems

2.7 Observability of Stochastic Systems

2.8 Degree of Observability

2.9 Invertibility

2.10 Concluding Remarks

3 Observers

3.1 Introduction

3.2 Luenberger Observer

3.3 Extended Luenberger‐Type Observer

3.4 Sliding‐Mode Observer

3.5 Unknown‐Input Observer

3.6 Concluding Remarks

4 Bayesian Paradigm and Optimal Nonlinear Filtering

4.1 Introduction

4.2 Bayes' Rule

4.3 Optimal Nonlinear Filtering

4.4 Fisher Information

4.5 Posterior Cramér–Rao Lower Bound

4.6 Concluding Remarks

5 Kalman Filter

5.1 Introduction

5.2 Kalman Filter

5.3 Kalman Smoother

5.4 Information Filter

5.5 Extended Kalman Filter

5.6 Extended Information Filter

5.7 Divided‐Difference Filter

5.8 Unscented Kalman Filter

5.9 Cubature Kalman Filter

5.10 Generalized PID Filter

5.11 Gaussian‐Sum Filter

5.12 Applications

5.13 Concluding Remarks

6 Particle Filter

6.1 Introduction

6.2 Monte Carlo Method

6.3 Importance Sampling

6.4 Sequential Importance Sampling

6.5 Resampling

6.6 Sample Impoverishment

6.7 Choosing the Proposal Distribution

6.8 Generic Particle Filter

6.9 Applications

6.10 Concluding Remarks

7 Smooth Variable‐Structure Filter

7.1 Introduction

7.2 The Switching Gain

7.3 Stability Analysis

7.4 Smoothing Subspace

7.5 Filter Corrective Term for Linear Systems

7.6 Filter Corrective Term for Nonlinear Systems

7.7 Bias Compensation

7.8 The Secondary Performance Indicator

7.9 Second‐Order Smooth Variable Structure Filter

7.10 Optimal Smoothing Boundary Design

7.11 Combination of SVSF with Other Filters

7.12 Applications

7.13 Concluding Remarks

8 Deep Learning

8.1 Introduction

8.2 Gradient Descent

8.3 Stochastic Gradient Descent

8.4 Natural Gradient Descent

8.5 Neural Networks

8.6 Backpropagation

8.7 Backpropagation Through Time

8.8 Regularization

8.9 Initialization

8.10 Convolutional Neural Network

8.11 Long Short‐Term Memory

8.12 Hebbian Learning

8.13 Gibbs Sampling

8.14 Boltzmann Machine

8.15 Autoencoder

8.16 Generative Adversarial Network

8.17 Transformer

8.18 Concluding Remarks

9 Deep Learning‐Based Filters

9.1 Introduction

9.2 Variational Inference

9.3 Amortized Variational Inference

9.4 Deep Kalman Filter

9.5 Backpropagation Kalman Filter

9.6 Differentiable Particle Filter

9.7 Deep Rao–Blackwellized Particle Filter

9.8 Deep Variational Bayes Filter

9.9 Kalman Variational Autoencoder

9.10 Deep Variational Information Bottleneck

9.11 Wasserstein Distributionally Robust Kalman Filter

9.12 Hierarchical Invertible Neural Transport

9.13 Applications

9.14 Concluding Remarks

10 Expectation Maximization

10.1 Introduction

10.2 Expectation Maximization Algorithm

10.3 Particle Expectation Maximization

10.4 Expectation Maximization for Gaussian Mixture Models

10.5 Neural Expectation Maximization

10.6 Relational Neural Expectation Maximization

10.7 Variational Filtering Expectation Maximization

10.8 Amortized Variational Filtering Expectation Maximization

10.9 Applications

10.10 Concluding Remarks

11 Reinforcement Learning‐Based Filter

11.1 Introduction

11.2 Reinforcement Learning

11.3 Variational Inference as Reinforcement Learning

11.4 Application

11.5 Concluding Remarks

12 Nonparametric Bayesian Models

12.1 Introduction

12.2 Parametric vs Nonparametric Models

12.3 Measure‐Theoretic Probability

12.4 Exchangeability

12.5 Kolmogorov Extension Theorem

12.6 Extension of Bayesian Models

12.7 Conjugacy

12.8 Construction of Nonparametric Bayesian Models

12.9 Posterior Computability

12.10 Algorithmic Sufficiency

12.11 Applications

12.12 Concluding Remarks

References

Index

Wiley End User License Agreement


Nonlinear Filters

Theory and Applications

 

 

Peyman Setoodeh

McMaster University, Ontario, Canada

Saeid Habibi

McMaster University, Ontario, Canada

Simon Haykin

McMaster University, Ontario, Canada

 

 

 

 

 

This edition first published 2022

© 2022 by John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Peyman Setoodeh, Saeid Habibi, and Simon Haykin to be identified as the authors of this work has been asserted in accordance with law.

Registered Office: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office: 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty: The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data Applied for:

ISBN: 9781118835814

Cover Design: Wiley
Cover Image: © Emrah Turudu/Getty Images

 

 

 

 

To the memory of Rudolf Emil Kalman

List of Figures

 

Figure 1.1 The encoder of an asymmetric autoencoder plays the role of a nonlinear filter.

Figure 6.1 Typical posterior estimate trajectories for: (a) sampling importance resampling (SIR) particle filter, (b) SIR particle filter with MCMC step, (c) likelihood particle filter, and (d) extended Kalman particle filter.

Figure 7.1 The SVSF state estimation concept.

Figure 7.2 Effect of the smoothing subspace on chattering: cases (a) and (b).

Figure 7.3 Combining the SVSF with Bayesian filters.

List of Tables

 

Table 11.1 Reinforcement learning and variational inference viewed as expectation maximization.

Preface

Taking an algorithmic approach, this book provides a step towards bridging the gap between control theory, statistical signal processing, and machine learning regarding the state/parameter estimation problem. State estimation is an important concept that has profoundly influenced different branches of science and engineering. The state of a system is a minimal record of its past history, which is required for predicting its future behavior. In this regard, a dynamic system can be described from the state perspective by selecting a set of independent variables as state variables. It is often desirable to know the state variables and, in control applications, to force them to follow desired trajectories in the state space. State estimation refers to the process of reconstructing the hidden or latent state variables, which cannot be directly measured, from system inputs and outputs in the minimum possible length of time. Filtering algorithms, which are deployed for state estimation, aim at minimizing the error between the estimated and the true values of the state variables.

The first part of the book is dedicated to classic estimation algorithms. A thorough presentation of the notion of observability, which refers to the ability to reconstruct the state variables from measurements, is followed by coverage of a number of observers as state estimators for deterministic systems. For stochastic systems, optimal Bayesian filtering is presented, which provides a conceptual solution for the general state estimation problem. Different Bayesian filtering algorithms have been developed based on computationally tractable approximations of this conceptual Bayesian solution. For the special case of linear systems with Gaussian noise, the Kalman filter provides the optimal Bayesian solution. To extend the application of the Kalman filter to nonlinear systems, two main approaches have been proposed for obtaining suboptimal solutions: using power series to approximate the nonlinear functions and approximating the probability distributions. While the extended Kalman filter, the extended information filter, and the divided‐difference filter approximate the nonlinear functions, the unscented Kalman filter, the cubature Kalman filter, and the particle filter approximate the probability distributions. Other Kalman filter variants include the Gaussian‐sum filter, which handles non‐Gaussianity, and the generalized PID filter. Among the mentioned filters, the particle filter is capable of handling nonlinear and non‐Gaussian systems. The smooth variable‐structure filter, which has been derived based on a stability theorem, is able to handle model uncertainties. Moreover, it benefits from using a secondary set of performance indicators in addition to the innovation vector.

The second part of the book is dedicated to machine learning‐based filtering algorithms. Basic learning algorithms, deep learning architectures, and variational inference are reviewed to lay the groundwork for such algorithms. Different deep learning‐based filters have been developed, which deploy supervised or unsupervised learning. These filters include the deep Kalman filter, the backpropagation Kalman filter, the differentiable particle filter, the deep Rao–Blackwellized particle filter, the deep variational Bayes filter, the Kalman variational autoencoder, and the deep variational information bottleneck. The Wasserstein distributionally robust Kalman filter and hierarchical invertible neural transport are presented in addition to the mentioned filtering algorithms. Expectation maximization allows for joint state and parameter estimation. Different variants of the expectation maximization algorithm are implemented using particles, Gaussian mixture models, deep neural networks, relational deep neural networks, variational filters, and amortized variational filters. Variational inference and reinforcement learning can be viewed as instances of a generic expectation maximization problem. As a result, (deep) reinforcement learning methods can be used to develop novel filtering algorithms. Finally, the book covers nonparametric Bayesian models. In addition to reviewing measure‐theoretic probability concepts and the notions of exchangeability, posterior computability, and algorithmic sufficiency, guidelines are provided for constructing nonparametric Bayesian models from parametric ones.

This book reviews a wide range of applications of classic and machine learning‐based filtering algorithms, including the COVID‐19 pandemic, influenza incidence, prediction of drug effect, robotics, information fusion, augmented reality, battery state‐of‐charge estimation for electric vehicles, autonomous driving, target tracking, urban traffic networks, cybersecurity and optimal power flow in power systems, single‐molecule fluorescence microscopy, and finance.

 

P. Setoodeh, S. Habibi, and S. Haykin

Hamilton, Ontario, Canada
January 2022

Acknowledgments

We would like to express our deepest gratitude to several colleagues, who helped us in one form or another while writing this book: Dr. Mehdi Fatemi, Dr. Pouya Dehghani Tafti, Dr. Ehsan Taghavi, Dr. Andrew Gadsden, Dr. Hamed Hossein Afshari, Dr. Mina Attari, Dr. Dhafar Al‐Ani, Dr. Ulaş Güntürkün, Dr. Yanbo Xue, Dr. Ienkaran Arasaratnam, Dr. Mohammad Al‐Shabi, Dr. Alireza Khayatian, Dr. Ali Akbar Safavi, Dr. Ebrahim Farjah, Dr. Paknoosh Karimaghaee, Dr. Mohammad Ali Masnadi‐Shirazi, Dr. Mohammad Eghtesad, Dr. Majid Rostami‐Shahrbabaki, Dr. Zahra Kazemi, Dr. Farshid Naseri, Dr. Zahra Khatami, Dr. Mohsen Mohammadi, Dr. Thiagalingam Kirubarajan, Dr. Stergios Roumeliotis, Dr. Magnus Norgaard, Dr. Eric Foxlin, Dr. Maryam Dehghani, Dr. Mohammad Mehdi Arefi, Dr. Mohammad Hassan Asemani, Dr. Mohammad Mohammadi, Dr. Mehdi Allahbakhshi, Dr. Haidar Samet, Dr. Mohammad Rastegar, Dr. Behrooz Zaker, Dr. Ali Reza Seifi, Dr. Mahdi Raoofat, Dr. Jun Luo, and Dr. Steven Hockema.

Last but by no means least, we would like to thank our families. Their endless support, encouragement, and love have always been a source of energy for us.

 

P. Setoodeh, S. Habibi, and S. Haykin

Acronyms

 

Backprop KF: backpropagation Kalman filter
BMS: battery management systems
CKF: cubature Kalman filter
CNN: convolutional neural network
CRLB: Cramér–Rao lower bound
DDF: divided‐difference filter
DKF: deep Kalman filter
DRBPF: deep Rao–Blackwellized particle filter
DVBF: deep variational Bayes filter
DVIB: deep variational information bottleneck
EKF: extended Kalman filter
ELBO: evidence lower bound
EM: expectation maximization
FATE: fairness, accountability, transparency, and ethics
GAN: generative adversarial network
GRU: gated recurrent unit
HINT: hierarchical invertible neural transport
IB: information bottleneck
IMM: interacting multiple model
IS: importance sampling
KF: Kalman filter
KLD: Kullback–Leibler divergence
KVAE: Kalman variational autoencoder
LSTM: long short‐term memory
LTI: linear time‐invariant
LTV: linear time‐varying
MAP: maximum a posteriori
MCMC: Markov chain Monte Carlo
MDP: Markov decision process
ML: maximum likelihood
MMSE: minimum mean square error
MPF: marginalized particle filter
N‐EM: neural expectation maximization
NIB: nonlinear information bottleneck
NLL: negative log likelihood
PCRLB: posterior Cramér–Rao lower bound
PDF: probability distribution function
P‐EM: particle expectation maximization
PF: particle filter
PID: proportional‐integral‐derivative
POMDP: partially‐observable Markov decision process
RBPF: Rao–Blackwellized particle filter
ReLU: rectified linear unit
R‐NEM: relational neural expectation maximization
RNN: recurrent neural network
SGVB: stochastic gradient variational Bayes
SIR: sampling importance resampling
SLAM: simultaneous localization and mapping
SMAUG: single molecule analysis by unsupervised Gibbs sampling
SMC: sequential Monte Carlo
SoC: state of charge
SoH: state of health
SVSF: smooth variable‐structure filter
TD learning: temporal‐difference learning
UIO: unknown‐input observer
UKF: unscented Kalman filter
VAE: variational autoencoder
VFEM: variational filtering expectation maximization
VSC: variable‐structure control
wILI: weighted influenza‐like illness

1 Introduction

1.1 State of a Dynamic System

In many branches of science and engineering, deriving a probabilistic model for sequential data plays a key role. System theory provides guidelines for studying the underlying dynamics of sequential data (time series). In describing a dynamic system, the notion of state is a key concept [1]:

Definition 1.1 (State of a dynamic system) The state of a dynamic system is the smallest collection of variables that must be specified at a time instant $t_0$ in order to be able to predict the behavior of the system for any time instant $t \ge t_0$. To be more precise, the state is the minimal record of the past history, which is required to predict the future behavior.

According to the principle of causality, any dynamic system may be described from the state perspective. Deploying a state‐transition model allows for determining the future state of a system, $\mathbf{x}(t)$, at any time instant $t \ge t_0$, given its initial state, $\mathbf{x}(t_0)$, at time instant $t_0$, as well as the inputs to the system, $\mathbf{u}(t)$, for $t \ge t_0$. The output of the system, $\mathbf{y}(t)$, is a function of the state, which can be computed using a measurement model. In this regard, state‐space models are powerful tools for analysis and control of dynamic systems.

1.2 State Estimation

Observability is a key concept in system theory. It refers to the ability to reconstruct the hidden or latent state variables, which cannot be directly measured, from the measured variables in the minimum possible length of time [1]. In building state‐space models, two key questions deserve special attention [2]:

(i) Is it possible to identify the governing dynamics from data?

(ii) Is it possible to perform inference from observables to the latent state variables?

At time instant $k$, the inference problem to be solved is to find $\hat{\mathbf{x}}_{k|l}$, the estimate of the state $\mathbf{x}_k$ in the presence of noise, given measurements up to time instant $l$. Depending on the value of $l$, estimation algorithms are categorized into three groups [3]:

(i) Prediction: $l < k$,

(ii) Filtering: $l = k$,

(iii) Smoothing: $l > k$.

Regarding the two aforementioned questions, performance can be improved by deploying sophisticated representations of the system under study. However, the corresponding inference algorithms may become computationally demanding. Hence, when designing efficient data‐driven inference algorithms, the following points must be taken into account [2]:

(i) The underlying assumptions for building a state‐space model must allow for reliable system identification and plausible long‐term prediction of the system behavior.

(ii) The inference mechanism must be able to capture rich dependencies.

(iii) The algorithm must be able to inherit the merit of learning machines of being trainable on raw data such as sensory inputs in a control system.

(iv) The algorithm must be scalable to big data regarding the optimization of model parameters based on the stochastic gradient descent method.

Regarding the important role of computation in inference problems, Section 1.3 provides a brief account of the foundations of computing.

1.3 Construals of Computing

According to [4], a comprehensive theory of computing must meet three criteria:

(i) Empirical criterion: Doing justice to practice by keeping the analysis grounded in real‐world examples.

(ii) Conceptual criterion: Being understandable in terms of what it says, where it comes from, and what it costs.

(iii) Cognitive criterion: Providing an intelligible foundation for the computational theory of mind that underlies both artificial intelligence and cognitive science.

Following this line of thinking, it was proposed in [4] to distinguish the following construals of computation:

Formal symbol manipulation is rooted in formal logic and metamathematics. The idea is to build machines that are capable of manipulating symbolic or meaningful expressions regardless of their interpretation or semantic content.

Effective computability deals with the question of what can be done, and how hard it is to do it mechanically.

Execution of an algorithm or rule following focuses on what is involved in following a set of rules or instructions, and what behavior would be produced.

Calculation of a function considers the behavior of producing the value of a mathematical function as output, when a set of arguments is given as input.

Digital state machine is based on the idea of a finite‐state automaton.

Information processing focuses on what is involved in storing, manipulating, displaying, and trafficking information.

Physical symbol systems is based on the idea that the way computers interact with symbols depends on their mutual physical embodiment. In this regard, computers may be assumed to be made of symbols.

Dynamics must be taken into account in terms of the roles that nonlinear elements, attractors, criticality, and emergence play in computing.

Interactive agents are capable of interacting and communicating with other agents and even people.

Self‐organizing or complex adaptive systems are capable of adjusting their organization or structure in response to changes in their environment in order to survive and improve their performance.

Physical implementation emphasizes the occurrence of computational practice in real‐world systems.

1.4 Statistical Modeling

Statistical modeling aims at extracting information about the underlying data mechanism that allows for making predictions. Then, such predictions can be used to make decisions. There are two cultures in deploying statistical models for data analysis [5]:

Data modeling culture is based on the idea that a given stochastic model generates the data.

Algorithmic modeling culture uses algorithmic models to deal with an unknown data mechanism.

An algorithmic approach has the advantage of being able to handle large complex datasets. Moreover, it can avoid irrelevant theories or questionable conclusions.

Figure 1.1 The encoder of an asymmetric autoencoder plays the role of a nonlinear filter.

Taking an algorithmic approach, in machine learning, statistical models can be classified as [6]:

(i) Generative models, which predict visible effects from hidden causes, $p(\mathbf{y} \mid \mathbf{x})$.

(ii) Discriminative models, which infer hidden causes from visible effects, $p(\mathbf{x} \mid \mathbf{y})$.

While the former is associated with the measurement process in a state‐space model, the latter is associated with the state estimation or filtering problem. Deploying machine learning, a wide range of filtering algorithms can be developed that are able to learn the corresponding state‐space models. For instance, an asymmetric autoencoder can be designed by combining a generative model and a discriminative model, as shown in Figure 1.1 [7]. Deep neural networks can be used to implement both the encoder and the decoder. Then, the resulting autoencoder can be trained in an unsupervised manner. After training, the encoder can be used as a filter, which estimates the latent state variables.
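To make the encoder‐as‐filter idea concrete, here is a minimal sketch, assuming PyTorch is available; the layer sizes, dimensions, and synthetic training data are hypothetical placeholders rather than an example from this book.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 8 measurements, 2 latent state variables.
obs_dim, state_dim = 8, 2

# Asymmetric autoencoder: a deep encoder (the nonlinear filter)
# paired with a deliberately simpler decoder (the measurement model).
encoder = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, state_dim),
)
decoder = nn.Sequential(nn.Linear(state_dim, obs_dim))

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

y = torch.randn(256, obs_dim)  # stand-in for measured outputs

# Unsupervised training: reconstruct the measurements from the latent state.
for _ in range(100):
    optimizer.zero_grad()
    x_hat = encoder(y)                 # estimated latent state
    loss = loss_fn(decoder(x_hat), y)  # reconstruction error
    loss.backward()
    optimizer.step()

# After training, encoder(y) acts as a nonlinear filter: y -> x_hat.
```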

1.5 Vision for the Book

This book provides an algorithmic perspective on the nonlinear state/parameter estimation problem for discrete‐time systems, where measurements are available at discrete sampling times and estimators are implemented using digital processors. In Chapter 2, guidelines are provided for discretizing continuous‐time linear and nonlinear state‐space models. The rest of the book is organized as follows:

Chapter 2 presents the notion of observability for deterministic and stochastic systems.

Chapters 3–7 cover classic estimation algorithms:

Chapter 3 is dedicated to observers as state estimators for deterministic systems.

Chapter 4 presents the general formulation of optimal Bayesian filtering for stochastic systems.

Chapter 5 covers the Kalman filter as the optimal Bayesian filter in the sense of minimizing the mean‐square estimation error for linear systems with Gaussian noise. Moreover, Kalman filter variants are presented that extend its applicability to nonlinear or non‐Gaussian cases.

Chapter 6 covers the particle filter, which handles severe nonlinearity and non‐Gaussianity by approximating the corresponding distributions using a set of particles (random samples).

Chapter 7 covers the smooth variable‐structure filter, which provides robustness against bounded uncertainties and noise. In addition to the innovation vector, this filter benefits from a secondary set of performance indicators.

Chapters 8–11 cover learning‐based estimation algorithms:

Chapter 8 covers the basics of deep learning.

Chapter 9 covers deep‐learning‐based filtering algorithms using supervised and unsupervised learning.

Chapter 10 presents the expectation maximization algorithm and its variants, which are used for joint state and parameter estimation.

Chapter 11 presents the reinforcement learning‐based filter, which is built on viewing variational inference and reinforcement learning as instances of a generic expectation maximization problem.

The last chapter is dedicated to nonparametric Bayesian models:

Chapter 12 covers measure‐theoretic probability concepts as well as the notions of exchangeability, posterior computability, and algorithmic sufficiency. Furthermore, it provides guidelines for constructing nonparametric Bayesian models from finite parametric Bayesian models.

In each chapter, selected applications of the presented filtering algorithms are reviewed, which cover a wide range of problems. Moreover, the last section of each chapter usually refers to a few topics for further study.

2 Observability

2.1 Introduction

In many branches of science and engineering, it is common to deal with sequential data generated by dynamic systems. In different applications, it is often desirable to predict future observations based on the data collected up to a certain time instant. Since the future is always uncertain, it is preferable to have a measure that shows our confidence about the predictions. A probability distribution over possible future outcomes can provide this information [8]. A great deal of what we know about a system cannot be presented in terms of quantities that can be directly measured. In such cases, we try to build a model for the system that helps to explain the cause behind what we observe via the measurement process. This leads to the notions of state and state‐space model of a dynamic system. Chapters 3–7 and 9–11 are dedicated to different methods for reconstructing (estimating) the state of dynamic systems from inputs and measurements. Each estimation algorithm has its own advantages and limitations that should be taken into account when choosing an estimator for a specific application. However, before trying to choose a proper estimation algorithm among different candidates, we need to know whether, for a given model of the dynamic system under study, it is possible to estimate the state of the system from inputs and measurements [9]. This critical question leads to the concept of observability, which is the focus of this chapter.

2.2 State‐Space Model

The behavioral approach for studying dynamic systems is based on an abstract model of the system of interest, which determines the relationship between its input and output. In this abstract model, the input of the system, denoted by $\mathbf{u}$, represents the effect of the external events on the system, and its output, denoted by $\mathbf{y}$, represents any change it causes to the surrounding environment. The output can be directly measured [10]. The state of the system, denoted by $\mathbf{x}$, is defined as the minimum amount of information required at each time instant to uniquely determine the future behavior of the system, provided that we know the inputs to the system as well as the system's parameters. Parameter values reflect the underlying physical characteristics based on which the model of the system was built [9]. State variables may not be directly accessible for measurement; hence the reason for calling them hidden or latent variables. Owing to the abstract nature of the state variables, they may not even represent physical quantities. However, these variables help us to improve a model's ability to capture the causal structure of the system under study [11].

A state‐space model includes the corresponding mappings from input to state and from state to output. This model also describes the evolution of the system's state over time [12]. In other words, any state‐space model has three constituents [8]:

A prior, $p(\mathbf{x}_0)$, which is associated with the initial state $\mathbf{x}_0$.

A state‐transition function, $p(\mathbf{x}_k \mid \mathbf{x}_{k-1})$.

An observation function, $p(\mathbf{y}_k \mid \mathbf{x}_k)$.

For controlled systems, the state‐transition function depends on control inputs as well. To be able to model active perception (sensing), the observation function must be allowed to depend on inputs too.
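As a concrete illustration of these three constituents, the following minimal Python sketch simulates a hypothetical scalar state‐space model; the random‐walk dynamics and noise levels are illustrative assumptions only, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior p(x_0): initial state drawn from a Gaussian.
x = rng.normal(loc=0.0, scale=1.0)

T = 50
states, observations = [], []
for k in range(T):
    # State-transition function p(x_k | x_{k-1}): a stable noisy dynamic.
    x = 0.9 * x + rng.normal(scale=0.1)
    # Observation function p(y_k | x_k): noisy measurement of the state.
    y = x + rng.normal(scale=0.5)
    states.append(x)
    observations.append(y)
```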

The state‐space representation is based on the assumption that the model is a first‐order Markov process, which means that the value of the state vector at cycle $k$ depends only on its value at cycle $k-1$, but not on its values in previous cycles. In other words, the state vector at cycle $k$ contains all the information about the system from the initial cycle up to cycle $k$. In a sense, the concept of state inherently represents the memory of the system [13]. The first‐order Markov assumption can be expressed mathematically as follows:

(2.1) $p(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \mathbf{x}_{k-2}, \ldots, \mathbf{x}_0) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1})$

It should be noted that if a model is not a first‐order Markov process, it is possible to build a corresponding first‐order Markov model based on an augmented state vector, which includes the state vector at the current cycle as well as the state vectors of previous cycles. The order of the Markov process determines how many previous cycles' state vectors must be included in the augmented state vector. For instance, if the system is an $n$th‐order Markov process with state vector $\mathbf{x}_k$, the corresponding first‐order Markov model is built based on the augmented state vector:

(2.2) $\mathbf{x}_k^a = \begin{bmatrix} \mathbf{x}_k \\ \mathbf{x}_{k-1} \\ \vdots \\ \mathbf{x}_{k-n+1} \end{bmatrix}$

Moreover, if model parameters are time‐varying, they can be treated as random variables by including them in the augmented state vector as well.
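The augmentation in (2.2) is mechanical, as the following sketch suggests; the second‐order autoregressive dynamics and coefficients are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical 2nd-order Markov (AR(2)) dynamics:
# x_k = a1 * x_{k-1} + a2 * x_{k-2} + w_k
a1, a2 = 1.5, -0.7

# Augmented state x^a_k = [x_k, x_{k-1}]^T turns it into a
# first-order Markov model: x^a_k = A x^a_{k-1} + [w_k, 0]^T.
A = np.array([[a1, a2],
              [1.0, 0.0]])

x_aug = np.array([0.1, 0.0])  # [x_0, x_{-1}]
for _ in range(10):
    w = np.random.normal(scale=0.01)
    x_aug = A @ x_aug + np.array([w, 0.0])
```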

2.3 The Concept of Observability

Observability and controllability are two basic properties of dynamic systems. These two concepts were first introduced by Kalman in 1960 for analyzing control systems based on linear state‐space models [1]. While observability is concerned with how the state vector influences the output vector, controllability is concerned with how the input vector influences the state vector. If a state has no effect on the output, it is unobservable; otherwise, it is observable. To be more precise, starting from an unobservable initial state $\mathbf{x}(t_0)$, the system's output will be $\mathbf{y}(t) = \mathbf{0}$ for $t \ge t_0$ in the absence of an input, $\mathbf{u}(t) = \mathbf{0}$ [14]. Another interpretation would be that unobservable systems allow for the existence of indistinguishable states: if an input is applied to the system at any one of the indistinguishable states, then the output will be the same. On the contrary, observability implies that an observer would be able to distinguish between different initial states based on inputs and measurements. In other words, an observer would be able to uniquely determine observable initial states from inputs and measurements [13, 15]. In the general case, the state vector may be divided into two parts: observable and unobservable states.

Definition 2.1 (State observability) A dynamic system is state observable if for any time $t_1 > t_0$, the initial state $\mathbf{x}(t_0)$ can be uniquely determined from the time history of the input $\mathbf{u}(t)$ and the output $\mathbf{y}(t)$ for $t_0 \le t \le t_1$; otherwise, the system is unobservable.

Unlike the linear case, there is no universal definition of observability for nonlinear systems. Hence, different definitions have been proposed in the literature, which take two questions into consideration:

How to check the observability of a nonlinear system?

How to design an observer for such a system?

While for linear systems, observability is a global property, for nonlinear systems, observability is usually studied locally [9].

Definition 2.2 (State detectability) If all unstable modes of a system are observable, then the system is state detectable.

A system with undetectable modes is said to have hidden unstable modes [16, 17]. Sections 2.4–2.7 provide observability tests for different classes of systems, whether they be linear or nonlinear, continuous‐time or discrete‐time.

2.4 Observability of Linear Time‐Invariant Systems

If the system matrices in the state‐space model of a linear system are constant, then the model represents a linear time‐invariant (LTI) system.

2.4.1 Continuous‐Time LTI Systems

The state‐space model of a continuous‐time LTI system is represented by the following algebraic and differential equations:

(2.3) $\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t)$

(2.4) $\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) + \mathbf{D}\,\mathbf{u}(t)$

where $\mathbf{x} \in \mathbb{R}^{n_x}$, $\mathbf{u} \in \mathbb{R}^{n_u}$, and $\mathbf{y} \in \mathbb{R}^{n_y}$ are the state, the input, and the output vectors, respectively, and $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ are the system matrices. Here, we need to find out when an initial state vector can be uniquely reconstructed from a nonzero initial system output vector and its successive derivatives. We start by writing the system output vector and its successive derivatives in terms of the state vector as well as the input vector and its successive derivatives as follows:

(2.5)
$\mathbf{y}(t) = \mathbf{C}\mathbf{x}(t) + \mathbf{D}\mathbf{u}(t)$
$\mathbf{y}^{(1)}(t) = \mathbf{C}\mathbf{A}\mathbf{x}(t) + \mathbf{C}\mathbf{B}\mathbf{u}(t) + \mathbf{D}\mathbf{u}^{(1)}(t)$
$\vdots$
$\mathbf{y}^{(n_x-1)}(t) = \mathbf{C}\mathbf{A}^{n_x-1}\mathbf{x}(t) + \mathbf{C}\mathbf{A}^{n_x-2}\mathbf{B}\mathbf{u}(t) + \cdots + \mathbf{D}\mathbf{u}^{(n_x-1)}(t)$

where the superscript in the parentheses denotes the order of the derivative. The aforementioned equations can be rewritten in the following compact form:

(2.6) $\mathcal{Y}(t) = \mathcal{O}\,\mathbf{x}(t) + \mathcal{T}\,\mathcal{U}(t)$

where

(2.7) $\mathcal{Y}(t) = \begin{bmatrix} \mathbf{y}(t) \\ \mathbf{y}^{(1)}(t) \\ \vdots \\ \mathbf{y}^{(n_x-1)}(t) \end{bmatrix}, \quad \mathcal{U}(t) = \begin{bmatrix} \mathbf{u}(t) \\ \mathbf{u}^{(1)}(t) \\ \vdots \\ \mathbf{u}^{(n_x-1)}(t) \end{bmatrix}$

and

(2.8) $\mathcal{O} = \begin{bmatrix} \mathbf{C} \\ \mathbf{C}\mathbf{A} \\ \vdots \\ \mathbf{C}\mathbf{A}^{n_x-1} \end{bmatrix}, \quad \mathcal{T} = \begin{bmatrix} \mathbf{D} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{C}\mathbf{B} & \mathbf{D} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{C}\mathbf{A}^{n_x-2}\mathbf{B} & \mathbf{C}\mathbf{A}^{n_x-3}\mathbf{B} & \cdots & \mathbf{D} \end{bmatrix}$

Initially we have

(2.9) $\mathcal{Y}(0) = \mathcal{O}\,\mathbf{x}(0) + \mathcal{T}\,\mathcal{U}(0)$

The continuous‐time system is observable if and only if the observability matrix $\mathcal{O}$ is full rank; in that case, the initial state can be found as:

(2.10) $\mathbf{x}(0) = \mathcal{O}^{\dagger}\left(\mathcal{Y}(0) - \mathcal{T}\,\mathcal{U}(0)\right)$

where $\mathcal{O}^{\dagger}$ denotes the pseudoinverse of $\mathcal{O}$.

The observable subspace of the linear system, denoted by $\mathcal{O}_s$, is composed of the basis vectors of the range of $\mathcal{O}^{T}$, and the unobservable subspace of the linear system, denoted by $\bar{\mathcal{O}}_s$, is composed of the basis vectors of the null space of $\mathcal{O}$. These two subspaces can be combined to form the following nonsingular transformation matrix:

(2.11) $\mathbf{T} = \begin{bmatrix} \mathbf{T}_o \\ \mathbf{T}_{\bar{o}} \end{bmatrix}$

where the rows of $\mathbf{T}_o$ span the observable subspace and the rows of $\mathbf{T}_{\bar{o}}$ span the unobservable subspace. If we apply the aforementioned transformation to the state vector such that:

(2.12) $\tilde{\mathbf{x}}(t) = \mathbf{T}\,\mathbf{x}(t)$

the transformed state vector will be partitioned into observable modes, $\tilde{\mathbf{x}}_o$, and unobservable modes, $\tilde{\mathbf{x}}_{\bar{o}}$:

(2.13) $\tilde{\mathbf{x}}(t) = \begin{bmatrix} \tilde{\mathbf{x}}_o(t) \\ \tilde{\mathbf{x}}_{\bar{o}}(t) \end{bmatrix}$

Then, the state‐space model of (2.3) and (2.4) can be rewritten based on the transformed state vector, $\tilde{\mathbf{x}}(t)$, as follows:

(2.14) $\dot{\tilde{\mathbf{x}}}(t) = \mathbf{T}\mathbf{A}\mathbf{T}^{-1}\,\tilde{\mathbf{x}}(t) + \mathbf{T}\mathbf{B}\,\mathbf{u}(t)$

(2.15) $\mathbf{y}(t) = \mathbf{C}\mathbf{T}^{-1}\,\tilde{\mathbf{x}}(t) + \mathbf{D}\,\mathbf{u}(t)$

or equivalently as:

(2.16) $\begin{bmatrix} \dot{\tilde{\mathbf{x}}}_o(t) \\ \dot{\tilde{\mathbf{x}}}_{\bar{o}}(t) \end{bmatrix} = \begin{bmatrix} \mathbf{A}_o & \mathbf{0} \\ \mathbf{A}_{21} & \mathbf{A}_{\bar{o}} \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{x}}_o(t) \\ \tilde{\mathbf{x}}_{\bar{o}}(t) \end{bmatrix} + \begin{bmatrix} \mathbf{B}_o \\ \mathbf{B}_{\bar{o}} \end{bmatrix} \mathbf{u}(t)$

(2.17) $\mathbf{y}(t) = \begin{bmatrix} \mathbf{C}_o & \mathbf{0} \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{x}}_o(t) \\ \tilde{\mathbf{x}}_{\bar{o}}(t) \end{bmatrix} + \mathbf{D}\,\mathbf{u}(t)$

Any pair of equations, (2.14) and (2.15) or (2.16) and (2.17), is called the state‐space model of the system in the observable canonical form. For the system to be detectable (to have stable unobservable modes), the eigenvalues of $\mathbf{A}_{\bar{o}}$ must have negative real parts ($\mathbf{A}_{\bar{o}}$ must be Hurwitz).
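In practice, the rank condition and the two subspaces can be checked numerically. The following NumPy sketch uses a hypothetical matrix pair and extracts subspace bases with an SVD; it is an illustration of the idea, not a procedure from this book.

```python
import numpy as np

def obsv(A, C):
    """Stack [C; CA; ...; CA^(n-1)] into the observability matrix."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])

# Hypothetical system: the second state never reaches the output.
A = np.array([[1.0, 0.0],
              [1.0, 0.5]])
C = np.array([[1.0, 0.0]])

O = obsv(A, C)
rank = np.linalg.matrix_rank(O)
print(f"rank(O) = {rank} of {A.shape[0]}")  # rank 1 -> unobservable

# Right singular vectors: the first `rank` rows of Vt span the
# observable subspace (row space of O); the rest span the null space.
U, s, Vt = np.linalg.svd(O)
observable_basis = Vt[:rank]
unobservable_basis = Vt[rank:]
```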

2.4.2 Discrete‐Time LTI Systems

The state‐space model of a discrete‐time LTI system is represented by the following algebraic and difference equations:

(2.18) $\mathbf{x}_{k+1} = \mathbf{A}\,\mathbf{x}_k + \mathbf{B}\,\mathbf{u}_k$

(2.19) $\mathbf{y}_k = \mathbf{C}\,\mathbf{x}_k + \mathbf{D}\,\mathbf{u}_k$

where $\mathbf{x}_k \in \mathbb{R}^{n_x}$, $\mathbf{u}_k \in \mathbb{R}^{n_u}$, and $\mathbf{y}_k \in \mathbb{R}^{n_y}$ are the state, the input, and the output vectors, respectively, and $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ are the system matrices. Starting from the initial cycle, the system output vector at successive cycles up to cycle $n_x - 1$ can be written in terms of the initial state vector and the input vectors as follows:

(2.20)
$\mathbf{y}_0 = \mathbf{C}\mathbf{x}_0 + \mathbf{D}\mathbf{u}_0$
$\mathbf{y}_1 = \mathbf{C}\mathbf{A}\mathbf{x}_0 + \mathbf{C}\mathbf{B}\mathbf{u}_0 + \mathbf{D}\mathbf{u}_1$
$\vdots$
$\mathbf{y}_{n_x-1} = \mathbf{C}\mathbf{A}^{n_x-1}\mathbf{x}_0 + \mathbf{C}\mathbf{A}^{n_x-2}\mathbf{B}\mathbf{u}_0 + \cdots + \mathbf{D}\mathbf{u}_{n_x-1}$

The aforementioned equations can be rewritten in the following compact form:

(2.21) $\mathcal{Y} = \mathcal{O}\,\mathbf{x}_0 + \mathcal{T}\,\mathcal{U}$

where

(2.22) $\mathcal{Y} = \begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_{n_x-1} \end{bmatrix}, \quad \mathcal{U} = \begin{bmatrix} \mathbf{u}_0 \\ \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_{n_x-1} \end{bmatrix}$

and

(2.23) $\mathcal{O} = \begin{bmatrix} \mathbf{C} \\ \mathbf{C}\mathbf{A} \\ \vdots \\ \mathbf{C}\mathbf{A}^{n_x-1} \end{bmatrix}, \quad \mathcal{T} = \begin{bmatrix} \mathbf{D} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{C}\mathbf{B} & \mathbf{D} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{C}\mathbf{A}^{n_x-2}\mathbf{B} & \mathbf{C}\mathbf{A}^{n_x-3}\mathbf{B} & \cdots & \mathbf{D} \end{bmatrix}$

It is obvious from the linear set of equations (2.21) that in order to be able to uniquely determine the initial state $\mathbf{x}_0$, the matrix $\mathcal{O}$ must be full rank, provided that inputs and outputs are known. In other words, if the matrix $\mathcal{O}$ is full rank, the linear system is observable or reconstructable; hence the reason for calling $\mathcal{O}$ the observability matrix. The reverse is true as well: if the system is observable, then the observability matrix will be full rank. In this case, the initial state vector can be calculated as:

(2.24) $\mathbf{x}_0 = \mathcal{O}^{\dagger}\left(\mathcal{Y} - \mathcal{T}\,\mathcal{U}\right)$

Since $\mathcal{O}$ depends only on the matrices $\mathbf{A}$ and $\mathbf{C}$, for an observable system, it is equivalently said that the pair $(\mathbf{A}, \mathbf{C})$ is observable. Any initial state that has a component in the null space of $\mathcal{O}$ cannot be uniquely determined from measurements; therefore, the null space of $\mathcal{O}$ is called the unobservable subspace of the system. As mentioned before, the system is detectable if the unobservable subspace does not include unstable modes of $\mathbf{A}$, which are associated with the eigenvalues that are outside the unit circle.
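For an observable pair, solving (2.24) reduces to a least‐squares problem. The sketch below reconstructs the initial state of a hypothetical system from its zero‐input outputs, so the $\mathcal{T}\,\mathcal{U}$ term drops out.

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
n = A.shape[0]

x0_true = np.array([1.0, -2.0])

# Zero-input outputs y_k = C A^k x0 for k = 0..n-1, stacked into Y.
Y = np.concatenate([C @ np.linalg.matrix_power(A, k) @ x0_true
                    for k in range(n)])

# Observability matrix and least-squares solve of (2.24) with U = 0.
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
x0_est = np.linalg.lstsq(O, Y, rcond=None)[0]
print(np.allclose(x0_est, x0_true))  # True: the pair (A, C) is observable
```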

While the observable subspace of the linear system, denoted by $\mathcal{O}_s$, is composed of the basis vectors of the range of $\mathcal{O}^{T}$, the unobservable subspace of the linear system, denoted by $\bar{\mathcal{O}}_s$, is composed of the basis vectors of the null space of $\mathcal{O}$. These two subspaces can be combined to form the following nonsingular transformation matrix:

(2.25) $\mathbf{T} = \begin{bmatrix} \mathbf{T}_o \\ \mathbf{T}_{\bar{o}} \end{bmatrix}$

If we apply this transformation to the state vector such that:

(2.26) $\tilde{\mathbf{x}}_k = \mathbf{T}\,\mathbf{x}_k$

the transformed state vector will be partitioned into observable modes, $\tilde{\mathbf{x}}_o$, and unobservable modes, $\tilde{\mathbf{x}}_{\bar{o}}$:

(2.27) $\tilde{\mathbf{x}}_k = \begin{bmatrix} \tilde{\mathbf{x}}_{o,k} \\ \tilde{\mathbf{x}}_{\bar{o},k} \end{bmatrix}$

Then, the state‐space model of (2.18) and (2.19) can be rewritten based on the transformed state vector, $\tilde{\mathbf{x}}_k$, as follows:

(2.28) $\tilde{\mathbf{x}}_{k+1} = \mathbf{T}\mathbf{A}\mathbf{T}^{-1}\,\tilde{\mathbf{x}}_k + \mathbf{T}\mathbf{B}\,\mathbf{u}_k$

(2.29) $\mathbf{y}_k = \mathbf{C}\mathbf{T}^{-1}\,\tilde{\mathbf{x}}_k + \mathbf{D}\,\mathbf{u}_k$

or equivalently as:

(2.30) $\begin{bmatrix} \tilde{\mathbf{x}}_{o,k+1} \\ \tilde{\mathbf{x}}_{\bar{o},k+1} \end{bmatrix} = \begin{bmatrix} \mathbf{A}_o & \mathbf{0} \\ \mathbf{A}_{21} & \mathbf{A}_{\bar{o}} \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{x}}_{o,k} \\ \tilde{\mathbf{x}}_{\bar{o},k} \end{bmatrix} + \begin{bmatrix} \mathbf{B}_o \\ \mathbf{B}_{\bar{o}} \end{bmatrix} \mathbf{u}_k$

(2.31) $\mathbf{y}_k = \begin{bmatrix} \mathbf{C}_o & \mathbf{0} \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{x}}_{o,k} \\ \tilde{\mathbf{x}}_{\bar{o},k} \end{bmatrix} + \mathbf{D}\,\mathbf{u}_k$

Any pair of equations, (2.28) and (2.29) or (2.30) and (2.31), is called the state‐space model of the system in the observable canonical form.

2.4.3 Discretization of LTI Systems

When a continuous‐time system is connected to a computer via analog‐to‐digital and digital‐to‐analog converters at input and output, respectively, we need to find a discrete‐time equivalent of the continuous‐time system that describes the relationship between the system's input and its output at certain time instants (sampling times $t_k = kT$ for $k = 0, 1, 2, \ldots$, where $T$ is the sampling period). This process is called sampling the continuous‐time system. Using zero‐order‐hold sampling, where the corresponding analog signals are kept constant over the sampling period, we will have the following discrete‐time equivalent for the continuous‐time system of (2.3) and (2.4) [18]:

(2.32) $\mathbf{x}_{k+1} = \mathbf{A}_d\,\mathbf{x}_k + \mathbf{B}_d\,\mathbf{u}_k$

(2.33) $\mathbf{y}_k = \mathbf{C}\,\mathbf{x}_k + \mathbf{D}\,\mathbf{u}_k$

where

(2.34) $\mathbf{A}_d = e^{\mathbf{A}T}$

(2.35) $\mathbf{B}_d = \int_0^T e^{\mathbf{A}\tau}\,d\tau\;\mathbf{B}$
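Equations (2.34) and (2.35) can be evaluated together with a single matrix exponential of an augmented matrix, a standard numerical trick. The sketch below assumes SciPy is available; the double‐integrator system and sampling period are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, T):
    """Zero-order-hold discretization: returns (A_d, B_d) of (2.34)-(2.35).

    Uses expm([[A, B], [0, 0]] * T) = [[A_d, B_d], [0, I]].
    """
    n, m = A.shape[0], B.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    Md = expm(M * T)
    return Md[:n, :n], Md[:n, n:]

# Hypothetical double integrator sampled at T = 0.1 s.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Ad, Bd = zoh_discretize(A, B, T=0.1)
# Ad = [[1, 0.1], [0, 1]], Bd = [[0.005], [0.1]]
```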

2.5 Observability of Linear Time‐Varying Systems

If the system matrices in the state‐space model of a linear system change with time, then the model represents a linear time‐varying (LTV) system. Obviously, the observability condition is more complicated for LTV systems than for LTI systems.

2.5.1 Continuous‐Time LTV Systems

The state‐space model of a continuous‐time LTV system is represented by the following algebraic and differential equations:

(2.36) $\dot{\mathbf{x}}(t) = \mathbf{A}(t)\,\mathbf{x}(t) + \mathbf{B}(t)\,\mathbf{u}(t)$

(2.37) $\mathbf{y}(t) = \mathbf{C}(t)\,\mathbf{x}(t) + \mathbf{D}(t)\,\mathbf{u}(t)$

In order to determine the relative observability of different state variables, we investigate their contributions to the energy of the system output. Knowing the input, we can eliminate its contribution to the energy of the output. Therefore, without loss of generality, we can assume that the input is zero. Without an input, the evolution of the state vector is governed by the following unforced differential equation:

(2.38) $\dot{\mathbf{x}}(t) = \mathbf{A}(t)\,\mathbf{x}(t)$

whose solution is:

(2.39) $\mathbf{x}(t) = \boldsymbol{\Phi}(t, t_0)\,\mathbf{x}(t_0)$

where $\boldsymbol{\Phi}(t, t_0)$ is called the continuous‐time state‐transition matrix, which is itself the solution of the following differential equation:

(2.40) $\frac{d}{dt}\boldsymbol{\Phi}(t, t_0) = \mathbf{A}(t)\,\boldsymbol{\Phi}(t, t_0), \quad \boldsymbol{\Phi}(t_0, t_0) = \mathbf{I}$
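For a general time‐varying $\mathbf{A}(t)$, (2.40) rarely has a closed‐form solution, but $\boldsymbol{\Phi}(t, t_0)$ can be obtained numerically by integrating the matrix ODE. The following sketch assumes SciPy; the time‐varying system matrix is a hypothetical example.

```python
import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    # Hypothetical time-varying system matrix.
    return np.array([[0.0, 1.0],
                     [-1.0, -0.1 * np.cos(t)]])

def phi_dot(t, phi_flat, n):
    # Matrix ODE d/dt Phi = A(t) Phi, flattened for solve_ivp.
    Phi = phi_flat.reshape(n, n)
    return (A(t) @ Phi).ravel()

n, t0, t1 = 2, 0.0, 1.0
sol = solve_ivp(phi_dot, (t0, t1), np.eye(n).ravel(), args=(n,),
                rtol=1e-8, atol=1e-10)
Phi = sol.y[:, -1].reshape(n, n)  # Phi(t1, t0), with Phi(t0, t0) = I
```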