Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains describes a comprehensive framework for the identification and analysis of nonlinear dynamic systems in the time, frequency, and spatio-temporal domains. This book is written with an emphasis on making the algorithms accessible so that they can be applied and used in practice.
NARMAX algorithms provide a fundamentally different approach to nonlinear system identification and signal processing for nonlinear systems. NARMAX methods produce models that are transparent, which can easily be analysed, and which can be used to solve real problems.
This book is intended for graduates, postgraduates and researchers in the sciences and engineering, and also for users from other fields who have collected data and who wish to identify models to help to understand the dynamics of their systems.
Contents
Preface
1 Introduction
1.1 Introduction to System Identification
1.2 Linear System Identification
1.3 Nonlinear System Identification
1.4 NARMAX Methods
1.5 The NARMAX Philosophy
1.6 What is System Identification For?
1.7 Frequency Response of Nonlinear Systems
1.8 Continuous-Time, Severely Nonlinear, and Time-Varying Models and Systems
1.9 Spatio-temporal Systems
1.10 Using Nonlinear System Identification in Practice and Case Study Examples
References
2 Models for Linear and Nonlinear Systems
2.1 Introduction
2.2 Linear Models
2.3 Piecewise Linear Models
2.4 Volterra Series Models
2.5 Block-Structured Models
2.6 NARMAX Models
2.7 Generalised Additive Models
2.8 Neural Networks
2.9 Wavelet Models
2.10 State-Space Models
2.11 Extensions to the MIMO Case
2.12 Noise Modelling
2.13 Spatio-temporal Models
References
3 Model Structure Detection and Parameter Estimation
3.1 Introduction
3.2 The Orthogonal Least Squares Estimator and the Error Reduction Ratio
3.3 The Forward Regression OLS Algorithm
3.4 Term and Variable Selection
3.5 OLS and Sum of Error Reduction Ratios
3.6 Noise Model Identification
3.7 An Example of Variable and Term Selection for a Real Data Set
3.8 ERR is Not Affected by Noise
3.9 Common Structured Models to Accommodate Different Parameters
3.10 Model Parameters as a Function of Another Variable
3.11 OLS and Model Reduction
3.12 Recursive Versions of OLS
References
4 Feature Selection and Ranking
4.1 Introduction
4.2 Feature Selection and Feature Extraction
4.3 Principal Components Analysis
4.4 A Forward Orthogonal Search Algorithm
4.5 A Basis Ranking Algorithm Based on PCA
References
5 Model Validation
5.1 Introduction
5.2 Detection of Nonlinearity
5.3 Estimation and Test Data Sets
5.4 Model Predictions
5.5 Statistical Validation
5.6 Term Clustering
5.7 Qualitative Validation of Nonlinear Dynamic Models
References
6 The Identification and Analysis of Nonlinear Systems in the Frequency Domain
6.1 Introduction
6.2 Generalised Frequency Response Functions
6.3 Output Frequencies of Nonlinear Systems
6.4 Nonlinear Output Frequency Response Functions
6.5 Output Frequency Response Function of Nonlinear Systems
References
7 Design of Nonlinear Systems in the Frequency Domain – Energy Transfer Filters and Nonlinear Damping
7.1 Introduction
7.2 Energy Transfer Filters
7.3 Energy Focus Filters
7.4 OFRF-Based Approach for the Design of Nonlinear Systems in the Frequency Domain
References
8 Neural Networks for Nonlinear System Identification
8.1 Introduction
8.2 The Multi-layered Perceptron
8.3 Radial Basis Function Networks
8.4 Wavelet Networks
8.5 Multi-resolution Wavelet Models and Networks
9 Severely Nonlinear Systems
9.1 Introduction
9.2 Wavelet NARMAX Models
9.3 Systems that Exhibit Sub-harmonics and Chaos
9.4 The Response Spectrum Map
9.5 A Modelling Framework for Sub-harmonic and Severely Nonlinear Systems
9.6 Frequency Response Functions for Sub-harmonic Systems
9.7 Analysis of Sub-harmonic Systems and the Cascade to Chaos
References
10 Identification of Continuous-Time Nonlinear Models
10.1 Introduction
10.2 The Kernel Invariance Method
10.3 Using the GFRFs to Reconstruct Nonlinear Integro-differential Equation Models Without Differentiation
References
11 Time-Varying and Nonlinear System Identification
11.1 Introduction
11.2 Adaptive Parameter Estimation Algorithms
11.3 Tracking Rapid Parameter Variations Using Wavelets
11.4 Time-Dependent Spectral Characterisation
11.5 Nonlinear Time-Varying Model Estimation
11.6 Mapping and Tracking in the Frequency Domain
11.7 A Sliding Window Approach
References
12 Identification of Cellular Automata and N-State Models of Spatio-temporal Systems
12.1 Introduction
12.2 Cellular Automata
12.3 Identification of Cellular Automata
12.4 N-State Systems
References
13 Identification of Coupled Map Lattice and Partial Differential Equations of Spatio-temporal Systems
13.1 Introduction
13.2 Spatio-temporal Patterns and Continuous-State Models
13.3 Identification of Coupled Map Lattice Models
13.4 Identification of Partial Differential Equation Models
13.5 Nonlinear Frequency Response Functions for Spatio-temporal Systems
References
14 Case Studies
14.1 Introduction
14.2 Practical System Identification
14.3 Characterisation of Robot Behaviour
14.4 System Identification for Space Weather and the Magnetosphere
14.5 Detecting and Tracking Iceberg Calving in Greenland
14.6 Detecting and Tracking Time-Varying Causality for EEG Data
14.7 The Identification and Analysis of Fly Photoreceptors
14.8 Real-Time Diffuse Optical Tomography Using RBF Reduced-Order Models of the Propagation of Light for Monitoring Brain Haemodynamics
14.9 Identification of Hysteresis Effects in Metal Rubber Damping Devices
14.10 Identification of the Belousov–Zhabotinsky Reaction
14.11 Dynamic Modelling of Synthetic Bioparts
14.12 Forecasting High Tides in the Venice Lagoon
References
Index
This edition first published 2013
© 2013 John Wiley & Sons, Ltd
Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
Library of Congress Cataloguing-in-Publication Data
Billings, S. A.
Nonlinear system identification : NARMAX methods in the time, frequency, and spatio-temporal domains / Stephen A Billings.
pages cm
Includes bibliographical references and index.
ISBN 978-1-119-94359-4 (cloth)
1. Nonlinear systems. 2. Nonlinear theories–Mathematical models. 3. Systems engineering. I. Title.
QA402.5.B55 2013
003′.75–dc23
2013005189
A catalogue record for this book is available from the British Library.
ISBN: 978-1-119-94359-4
All the world is a nonlinear system
He linearised to the right
He linearised to the left
Till nothing was right
And nothing was left
System identification is a method of determining or measuring the dynamic model of a system from measurements of the system inputs and outputs. System identification was developed as part of systems and control theory and has now become a toolbox of algorithms and methods that can be applied to a very wide range of real systems and processes. The applications of system identification include any system where the inputs and outputs can be measured. Applications therefore include industrial processes, control systems, economic data and financial systems, biology and the life sciences, medicine, social systems, and many more.
System identification has become an important topic across many subject domains over the last few decades. Initially, the focus was on linear system identification but this has been changing with more of an emphasis on nonlinear systems over recent years. There are several excellent textbooks on linear system identification, time series, spectral analysis methods and algorithms, and hence there is no need to repeat these results here. Rather, the focus of this book is on the identification of nonlinear dynamic systems using what have become known as NARMAX methods. NARMAX, which stands for a nonlinear autoregressive moving average model with exogenous inputs, was initially introduced as the name of a model but then developed into a framework for the identification of nonlinear systems. There are other methods of nonlinear system identification, and many of these are also discussed within the book. But NARMAX methods are based on the goal of determining or identifying the rule or law that describes the behaviour of the underlying system, and this means the focus is on determining the form of the model, what terms should be included in the model, or the structure of the model. The focus is therefore not on gross approximation but on identifying models that are as simple as possible, models that can be written down and related to the underlying system, and which can be used to tease apart and understand complex nonlinear dynamic effects in the wide range of systems that system identification can be applied to.
At the core of NARMAX methods is the ability to build models by finding the most important term and adding this to the model, then finding and adding the next most important term, and so on so that the model is built up in a simple and intuitive way. This mimics the way traditional analytical modelling is done, by finding the most important model terms and then building the model up step by step until a desired accuracy is achieved. The difference with NARMAX methods is that this process is accomplished using measured data in the presence of possible nonlinear and highly coloured noise. The concepts behind this process are simple, intuitive, and easy to use.
There is extensive research literature in the form of published papers on many aspects of nonlinear system identification, including NARMAX methods. The aim in this book is not to reproduce all the many variants of the algorithms that exist, but rather to focus on presenting some of the best algorithms in a clear way. All the detailed nuances and variants of the algorithms will be cited within the book, so that anyone with more theoretical interests can follow up these ideas. But the aim of this book is to focus on the core methods, to try to describe them using the simplest possible terminology, and to clearly describe how to use them in real applications. This will inevitably involve mathematical descriptions and algorithmic details, but the aim is to keep the mathematics as simple as possible. The core aim therefore is to write a book that readers from a range of disciplines can use to understand how to fit models of dynamic nonlinear systems.
The book is an attempt to fill a void in the existing literature. Currently, there are several books on neural networks, and all the variants of these, and on the identification of simple block-structured nonlinear systems. These are important topics, but they address essentially different problems from the main aim of this book. Neural networks are excellent for fitting models for prediction purposes, but they do not produce transparent models, models that can be written down, and which can be analysed in time and frequency. Block-structured systems are a special class of nonlinear systems, and the associated methods are all based on the assumption that the system under study is a member of this simple class.
The main aim of this book is to describe a comprehensive set of algorithms for the identification and analysis of nonlinear systems in the time, frequency, and spatio-temporal domains. While almost every other textbook on nonlinear system identification is focused on time domain methods, this book addresses that oversight and includes frequency and spatio-temporal methods, which can provide significant insights into complex system behaviours. These are natural extensions of NARMAX identification methods and offer new directions in nonlinear system identification with many applications.
The readership will include graduates, postgraduates, and researchers in the sciences and engineering, but also users from other research fields who have collected data and who wish to identify models to help understand the dynamics of their systems. While there are examples throughout the book, the last chapter contains many case studies. These are used to illustrate how the methods described in the book can be applied to a wide range of problems, from modelling the visual system of fruit flies, to detecting causality in EEG signals, modelling the variations in ice flow, and modelling space weather. These examples are included to demonstrate that the methods in this book do work, that models can quite easily be identified in an intuitive and straightforward way, and used to understand and gain new insights into what appear to be complex effects.
The book starts in Chapter 1 where the focus of the book, the context in which the methods were developed, and the reason for the approaches taken are described in detail. Chapter 2 introduces the different classes of dynamic models. Chapter 3 describes model structure detection and parameter estimation based on the orthogonal least squares (OLS) algorithm and the error reduction ratio. Chapter 4 shows how the methods of Chapter 3 can be adapted for feature and basis function selection. Chapter 5 discusses model validation. Chapter 6 introduces important concepts for the frequency domain analysis of nonlinear systems, and Chapter 7 builds on these results to describe a new class of filters that can be designed to move energy to desired frequency locations, and the design of nonlinear damping devices. Chapter 8 describes how neural networks, including radial basis function and wavelet networks, can be used in system identification. Chapter 9 discusses the identification and analysis of severely nonlinear systems. Chapter 10 is focused on the identification of continuous-time nonlinear models. Chapter 11 shows how very rapid time variation in nonlinear models can be identified and tracked in both time and frequency. Chapter 12 describes spatio-temporal systems with finite states, including cellular automata models and n-state models, and the identification of these. Chapter 13 describes the spatio-temporal class of systems that have a continuous state and introduces system identification, analysis, and frequency response methods for this important class of systems. Chapter 14 includes a very wide range of case studies relating to many important problems.
A graduate course of 20–30 hours could be built using sections from the book. Such a course might include the core models from Chapter 2, the basic and forward regression orthogonal least squares algorithm and the error reduction ratio test from Chapter 3, brief details of feature extraction from Chapter 4, the simple correlation model validity tests for nonlinear systems from Chapter 5, the introduction of generalised frequency response functions and the estimation and interpretation of these using the simple probing methods from Chapter 6, radial basis function neural network training and input node selection using orthogonal least squares concepts from Chapter 8, wavelet models and the response spectrum map from Chapter 9, an introduction to spatio-temporal systems based on cellular automata and coupled map lattice models from Chapters 12 and 13, and finally some case study examples from Chapter 14.
I would like to acknowledge all those who have supported me over many years, those that I have worked with and learnt from, and those that have helped to write each chapter in this book. This book could not have been written without considerable help from colleagues. I would like to acknowledge this help by thanking Hualiang Wei who contributed Chapters 2, 3, 4, 5, 8, and 11; Zi Qiang Lang for Chapters 6 and 7; Liangmin Li for Chapters 9 and 10; Yifan Zhao for Chapter 12; Lingzhong Guo for Chapter 13; and Otar Akanyeti, Misha Balikhin, Richard Boynton, Yifan Zhao, Hualiang Wei, Uwe Friedrich, Danial Coca, Ernesto Vidal Rosas, Bin Zhang, Krish Krishnanathan, and Visakan Kadirkamanathan for help with the case studies.
Over many years I have supervised over 50 PhD students and worked with a similar number of research assistants. I have also been supported, challenged, and inspired by many academic colleagues and friends, both within my own discipline and in other research fields. There are too many to name but they all made important contributions which I would like to acknowledge. Although I can find no records now, my recollection is that Cristian Georgescu supplied the poem about nonlinearity in a personal communication when he applied to study for a PhD with me but unfortunately could not take up this position.
Much of the work in this book has been achieved with support from the research councils and other funding bodies. I gratefully acknowledge this support from the Engineering and Physical Sciences Research Council (EPSRC), the European Research Council (ERC), the Biotechnology and Biological Sciences Research Council (BBSRC), the Natural Environment Research Council (NERC), and the Leverhulme Trust.
I would like to especially thank all my family, Professor Harry Nicholson, Duncan Kitchen, Alan and Joyce Bellinger, the medics and nurses, and all those who gave unremitting support during a life-threatening illness. Finally, I would like to thank all my family for their support during my early education and throughout my career; I am especially grateful for this constant support.
This book is dedicated to my late father George Billings, who taught me without really teaching.
In this chapter a brief introduction to linear and nonlinear system identification will be provided. The descriptions are not meant to be detailed or comprehensive. Rather, the aim is to briefly describe the methods from a descriptive point of view so the reader can appreciate the broad development of the methods and the context in which they were introduced. Maths is largely avoided in this first chapter because detailed definitions and descriptions of the models, systems, and identification procedures will be given in the following chapters.
The main theme of the book – methods based around the NARMAX (nonlinear autoregressive moving average model with exogenous inputs) model and related methods – will also be introduced. In particular, the NARMAX philosophy for nonlinear system identification will be briefly described, again with full details given in later chapters, and how this leads into the important problems of frequency response functions for nonlinear systems and models of spatio-temporal systems will be briefly developed.
The concept of a mathematical model is fundamental in many branches of science and engineering. Virtually every system we can think of can be described by a mathematical model. Some diverse examples are illustrated in Figure 1.1. All the systems illustrated in Figure 1.1 can be described by a set of mathematical equations, and this is referred to as the mathematical model of the system. The examples included here show a coal-fired power station, an oil rig, an economic system represented by dealing screens in the stock exchange, a machine vision system (autonomous guided vehicle), a vibrating car, a bridge structure, and a biomedical system. Although each system is made up of quite different components, if each is considered as a system with inputs and outputs that are related by dynamic behaviours then they can all be described by a mathematical model. Surprisingly, all these systems can be represented by just a few basic mathematical operations – such as derivatives and integrals – combined in some appropriate manner with coefficients. The idea of the model is that it describes each system such that the model encodes information about the dynamics of the system. So, for example, a model of the power station would consist of a set of mathematical equations that describe the operation of pulverising the coal, burning it to produce steam, the turbo-alternator, and all the other components that make up this system. Mathematical models are at the heart of analysis, simulation, and design.
Figure 1.1 Examples of modelling, simulation, and control. Courtesy of dreamstime.com.
Assuming that accurate models of the systems can be built then computers can be programmed to simulate the models, to solve the mathematical equations that represent the system. In this way the computer is programmed to behave like the system. This has numerous advantages: different system designs can be assessed without the expense and delay of physically building the systems, experiments on the computer which would be dangerous on the real system (e.g., nuclear) can be simulated, and information about how the system would respond to different inputs can be acquired. Questions such as ‘how does the spacecraft behave if the re-entry angle is changed or one of the rockets fails?’, or ‘how would the economy respond to a cut in interest rates, would this increase/decrease inflation/unemployment?’, and so on, can all be posed and answered. Models therefore are central to the study of dynamical systems.
A mathematical model of a system can be used to emulate the system, predict the system response for given inputs, and investigate different design scenarios. However, these objectives can only be achieved if the model of the system is known. The validity of all the simulation, analysis, and design of the system is dependent on the model being an accurate representation of the system. The construction of accurate dynamic models is therefore fundamental to this type of analysis. So how are mathematical models of systems determined?
One way, called analytical modelling, involves breaking the system into component parts and applying the laws of physics and chemistry to each part to slowly build up a description. For example, a resistor can be described by Ohm's law, mechanical systems by force and energy balance equations, heat conduction systems by the laws of thermodynamics, and so on. This process can clearly be very complex: it is time-consuming and may take several man-years, it is problem-dependent, it requires a great deal of expertise in many diverse areas of science, and it would need to be repeated if any part of the system changed through redesign.
But returning to the examples of the dynamic systems in Figure 1.1 suggests an alternative approach, one which overcomes most of these problems and which is generally applicable to all systems. Given the mathematical model and the input to a system, the system response can be computed; this is the simulation problem. All the systems in Figure 1.1 produce input and output signals, and if these can be measured it should be possible to work out what the system model must have been. This is the converse of the simulation problem – given measurements of the system inputs and outputs, determine what the mathematical model of the system should be. This is called ‘system identification’; it provides the link between systems and signals and is the unifying theme throughout this book. System identification therefore is just a means of measuring the mathematical model of a process.
System identification is a method of measuring the mathematical description of a system by processing the observed inputs and outputs of the system. It is the complement of the simulation problem. The output signal contains, buried within it, the dynamics of the mathematical model that produced this signal from the measured input; system identification provides a principled way of extracting this information. Even in ideal conditions this is not easy, because the form that the model of the system takes will be unknown: is it linear or nonlinear, how many terms are in the model, what type of terms should be in the model, does the system have a time delay, what type of nonlinearity describes this system? Yet, if system identification is to be useful, these problems must be resolved. The advantages of system identification are many: it is applicable to all systems, it is often quick, and it can be made to track changes in the system. These advantages all suggest that system identification will be a worthwhile study.
Linear systems are defined as systems that satisfy the superposition principle. Linear system identification can be broadly categorised into two approaches: nonparametric and parametric methods. Interest in linear system identification gathered significant momentum from the 1970s onwards, and many new and important results and algorithms were developed (Lee, 1964; Deutsch, 1965; Box and Jenkins, 1970; Himmelblau, 1970; Astrom and Eykhoff, 1971; Graupe, 1972; Eykhoff, 1974; Nahi, 1976; Goodwin and Payne, 1977; Ljung and Söderström, 1983; Young, 1984; Norton, 1986; Ljung, 1987; Söderström and Stoica, 1989; Keesman, 2011). Nonparametric methods develop models based typically on the system impulse response or frequency response functions (Papoulis, 1965; Jenkins and Watts, 1968; Eykhoff, 1974; Pintelon and Schoukens, 2001; Bendat and Piersol, 2010). These are usually based on correlation methods and Fourier transforms, respectively, although there are many alternative methods. Special input signals were developed at this time, including multi-level sequences, of which the pseudo-random binary sequence was particularly important (Godfrey, 1993). Pseudo-random sequences could be easily designed and generated and were ideal for experiments on industrial plants to identify linear models. The sequences could be tailored to the process under investigation, so that the power of the input excitation was matched to the bandwidth of the process. This had the advantage that the noise-free output signal was maximised and hence the signal-to-noise ratio on the measured output was enhanced. Pseudo-random binary sequences were also the best approximation to white noise, and this led to important advantages when using cross-correlation to identify the models. The Wiener–Hopf equation (Jenkins and Watts, 1968; Priestley, 1981; Bendat and Piersol, 2010) relates the cross-correlation between the input and output of a system to the convolution of the system impulse response with the autocorrelation function of the input. If the input is designed so that its autocorrelation is an impulse at the origin, this equation simplifies and the cross-correlation becomes directly proportional to the system impulse response. This was a significant result, and the use and development of pseudo-random sequences continued for many years. The other advantage of using a designed input, not just a pseudo-random sequence, was that the input could be measured almost perfectly.
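A minimal sketch of this result, with an illustrative FIR system and a Gaussian white input standing in for a pseudo-random binary sequence: because the input autocorrelation is an impulse at the origin, the input–output cross-correlation recovers the impulse response directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative FIR system: y(k) = sum_i h[i] * u(k-i), plus output noise.
h_true = np.array([0.0, 1.0, 0.6, 0.3, 0.1])

# A white input plays the role of a PRBS: its autocorrelation is an
# impulse at lag zero, so the Wiener-Hopf equation collapses and the
# cross-correlation is directly proportional to the impulse response.
N = 20000
u = rng.standard_normal(N)
y = np.convolve(u, h_true)[:N] + 0.05 * rng.standard_normal(N)

# h[tau] ~ E[u(k) y(k+tau)] / E[u(k)^2] for tau = 0..4
h_est = np.array([np.mean(u[: N - tau] * y[tau:]) for tau in range(5)]) / np.var(u)
print(np.round(h_est, 3))   # close to h_true
```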
The introduction of the fast Fourier transform (FFT) in 1965 (Jenkins and Watts, 1968) meant that previously slow methods of computing the Fourier transform of a data sequence became much faster and more efficient, with increases in speed of orders of magnitude. Linear system identification methods based on the cross and power spectral densities were further developed, following the introduction of the FFT, to provide estimates of the system frequency response. The advantages of these approaches, which replaced the convolution in time with the much simpler algebraic relationships in the Laplace and frequency domains, were offset by the need to window and smooth the spectral estimates to obtain good estimates (Jenkins and Watts, 1968; Bendat and Piersol, 2010). Coherency functions were used to detect poor estimates, and a catalogue of methods was developed based on the frequency response function estimates. This fed into developments in mechanical engineering based on modal analysis (Worden and Tomlinson, 2001), which became established as an important method of analysing and studying vibrations in all kinds of structures.
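A short sketch of this spectral approach, assuming an arbitrary second-order test system: the frequency response is estimated as the ratio of the cross spectral density to the input power spectral density, with Welch averaging providing the windowing and smoothing, and the coherence flagging unreliable frequencies.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs = 100.0
N = 50000
u = rng.standard_normal(N)

# Simulate a linear test system (a second-order Butterworth low-pass
# filter, chosen only for illustration) with additive output noise.
b, a = signal.butter(2, 0.2)
y = signal.lfilter(b, a, u) + 0.1 * rng.standard_normal(N)

# Frequency response estimate: cross spectral density divided by the
# input power spectral density, with Welch averaging as the smoothing.
f, Puu = signal.welch(u, fs=fs, nperseg=1024)
_, Puy = signal.csd(u, y, fs=fs, nperseg=1024)
H = Puy / Puu

# Coherence detects frequencies where the estimate is poor
# (noise-dominated), as discussed in the text.
_, Cuy = signal.coherence(u, y, fs=fs, nperseg=1024)
print(np.abs(H[:5]), Cuy[:5])
```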
Parametric methods became popular from the 1970s onwards with an explosion of developments fuelled by the interest at that time in control systems and the development of methods of online process control, and adaptive control including self-tuning algorithms (Wellstead and Zarrop, 1991). These latter methods were all based on a model of the process that could be updated online. Least squares-based methods were developed and the effect of noise on the measurements was studied in depth, resulting in the introduction of algorithms including instrumental variables (Young, 1970), generalised least squares (Clarke, 1967), suboptimal least squares, extended least squares and maximum likelihood (Astrom and Eykhoff, 1971; Eykhoff, 1974). It was realised that data from almost every real system will involve inaccurate measurements and corruption of the signals by noise. It was shown that if the noise is correlated or coloured, biased estimates will be obtained and that even small amounts of correlated noise can result in significantly incorrect models (Astrom and Eykhoff, 1971; Eykhoff, 1974; Goodwin and Payne, 1977; Norton, 1986; Söderström and Stoica, 1989). All the algorithms above were therefore designed to either accommodate the noise or model it explicitly (Clarke, 1967; Young, 1970). Even the offline algorithms were therefore iterative, so that both a model of the process and a model of the noise were identified by operating on the data set several times over until the algorithm converged. Later, in the 1980s, prediction error methods were developed; many of the earlier parameter estimation algorithms were unified under the prediction error structure, and elegant proofs of convergence and analysis of the methods were developed (Ljung and Söderström, 1983; Norton, 1986; Ljung, 1987; Söderström and Stoica, 1989). The advantage of the prediction error methods was that they had almost the same asymptotic properties as the maximum likelihood algorithm but, while the probability density function of the residuals had to be known to apply maximum likelihood (which for linear systems could be taken as Gaussian), the prediction error methods optimised a cost function without any knowledge of the density functions (Ljung and Söderström, 1983; Ljung, 1987). This latter point became very important for the development of parameter estimation methods for nonlinear systems, where the signals will almost never be Gaussian and therefore the density functions will rarely be known.
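The bias induced by coloured noise is easy to demonstrate numerically. The following sketch (with hypothetical parameter values) simulates a first-order system with a moving-average noise term and fits the process model by ordinary least squares while ignoring the noise model; the estimate of the lagged-output coefficient does not converge to the true value.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100000
a, b, c = 0.8, 1.0, 0.9        # true system with coloured (MA) noise
u = rng.standard_normal(N)
e = rng.standard_normal(N)

y = np.zeros(N)
for k in range(1, N):
    y[k] = a * y[k - 1] + b * u[k - 1] + e[k] + c * e[k - 1]

# Ordinary least squares on the regressors [y(k-1), u(k-1)],
# ignoring the moving-average noise term e(k-1): the regressor
# y(k-1) is correlated with the equation error, so a is biased.
Phi = np.column_stack([y[:-1], u[:-1]])
theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print(theta)   # the estimate of a is noticeably biased away from 0.8
```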
Online or recursive algorithms were also actively developed from the 1970s onwards (Ljung and Söderström, 1983; Young, 1984; Norton, 1986). In contrast to the batch methods described above, where all the data is processed at once, in recursive methods the data is processed over a data window that is moved through the data set. This allows online tracking of slow time variation and is often the basis of adaptive, self-tuning, and many fault-detection algorithms.
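A sketch of the prototype of this family, classical recursive least squares with a forgetting factor (parameter values are illustrative, and the many refinements in the literature are omitted): each new sample updates the estimate directly, and a forgetting factor below one discounts old data so slow parameter drift can be tracked.

```python
import numpy as np

def rls(phi_stream, y_stream, n_params, lam=0.98):
    """Recursive least squares with forgetting factor lam: lam < 1
    discounts old data so slowly drifting parameters can be tracked."""
    theta = np.zeros(n_params)
    P = 1e4 * np.eye(n_params)                   # large initial covariance
    for phi, y in zip(phi_stream, y_stream):
        k = P @ phi / (lam + phi @ P @ phi)      # gain vector
        theta = theta + k * (y - phi @ theta)    # prediction-error update
        P = (P - np.outer(k, phi @ P)) / lam     # covariance update
    return theta

# Example: estimate theta for y(k) = 0.6 y(k-1) + 1.2 u(k-1) + noise.
rng = np.random.default_rng(6)
u = rng.standard_normal(3000)
y = np.zeros(3000)
for k in range(1, 3000):
    y[k] = 0.6 * y[k - 1] + 1.2 * u[k - 1] + 0.05 * rng.standard_normal()
Phi = np.column_stack([y[:-1], u[:-1]])
print(rls(Phi, y[1:], 2))   # approaches [0.6, 1.2]
```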
The development of linear identification algorithms is still a very active and healthy research field, with many participants from all around the world. This has been encouraged by the ever-increasing need to develop models of systems and the simple fact that system identification is relatively straightforward; it works well most of the time, and can be applied to any system where data can be recorded.
Nonlinear systems are usually defined as any system which is not linear, that is, any system that does not satisfy the superposition principle. This definition by exclusion is very vague but is often necessary because there are so many types of nonlinear systems that it is almost impossible to write down a description that covers all the classes that can exist under the title of ‘nonlinear dynamic system’. Authors therefore tend to focus on particular classes of nonlinear systems, which can be tightly defined, but which are limited. Historically, system identification for nonlinear systems has developed by focusing on specific classes of system and specific models. The early work was dominated by methods based on the Volterra series, which in the discrete-time case can be expressed as
$$y(k) = h_0 + \sum_{m_1=1}^{M} h_1(m_1)\,u(k-m_1) + \sum_{m_1=1}^{M}\sum_{m_2=1}^{M} h_2(m_1,m_2)\,u(k-m_1)\,u(k-m_2) + \cdots \qquad (1.1)$$

where $h_\ell(m_1,\ldots,m_\ell)$ is the $\ell$th-order Volterra kernel and $M$ is the memory length.
Because of the problems of identifying Volterra models, from the late 1970s onwards other model forms were investigated as a basis for system identification for nonlinear systems. Various forms of block-structured nonlinear models were introduced or reintroduced at this time (Billings and Fakhouri, 1978, 1982; Billings, 1980; Haber and Keviczky, 1999). The Hammerstein model consists of a static single-valued nonlinear element followed by a linear dynamic element. The Wiener model is the reverse of this combination, so that the linear element is before the static nonlinear characteristic. The General Model consists of a static linear element sandwiched between two dynamic systems. Other models, such as the Sm, Uryson, etc. models, represent alternative combinations of elements. All these models can be represented by a Volterra series, but in this case the Volterra kernels take on a special form in each case. Identification consists mainly of correlation-based methods, although some parameter estimation methods were also developed. The correlation methods exploited certain properties of these systems which meant that if specific inputs were used, often white Gaussian noise again, the individual elements could be identified one at a time. This resulted in manageable requirements of data and the individual blocks could sometimes be related to components in the system under study. Methods were developed, based on correlation and separable functions, which could determine which of the block-structured models was appropriate to represent a system (Billings and Fakhouri, 1978, 1982). Many results were introduced and these systems continue to be studied in depth. The problem of course is that these methods are only applicable to a very special form of model in each case and cannot therefore be considered as generic. They make too many assumptions about the form of the model to be fitted, and if little is known about the underlying system then applying a method that assumes a very special model form may not work well. All the above are essentially nonparametric methods of identification for nonlinear systems.
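To make the block-structured idea concrete, a minimal simulation sketch of a Hammerstein model is given below; the cubic nonlinearity and the first-order linear element are illustrative choices, not a model taken from the text.

```python
import numpy as np

def hammerstein(u, a=0.7, b=0.4):
    """Hammerstein model: a static single-valued nonlinearity followed
    by a first-order linear dynamic element x(k) = a x(k-1) + b v(k-1).
    The nonlinearity and parameter values are illustrative only."""
    v = u + 0.5 * u**3              # static nonlinear element
    x = np.zeros_like(u)
    for k in range(1, len(u)):
        x[k] = a * x[k - 1] + b * v[k - 1]
    return x

# A Wiener model simply reverses the order: the linear element comes
# first and the static nonlinearity is applied to its output.
u = np.sin(0.2 * np.arange(500))
y = hammerstein(u)
```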
The NARMAX model was introduced in 1981 as a new representation for a wide class of nonlinear systems (Billings and Leontaritis, 1981; Leontaritis and Billings, 1985; Chen and Billings, 1989). The NARMAX model is defined as
$$y(k) = F\big[\,y(k-1),\ldots,y(k-n_y),\; u(k-1),\ldots,u(k-n_u),\; e(k-1),\ldots,e(k-n_e)\,\big] + e(k) \qquad (1.2)$$

where $y(k)$, $u(k)$, and $e(k)$ are the system output, input, and noise sequences respectively; $n_y$, $n_u$, and $n_e$ are the maximum lags for the output, input, and noise; and $F[\cdot]$ is some nonlinear function.
A key property of the NARMAX model is the inclusion of lagged output terms, and the linear case shows why this matters. The FIR filter

$$y(k) = \sum_{i=1}^{n_u} b_i\, u(k-i) \qquad (1.3)$$

expands the system response in terms of past inputs only. The IIR filter

$$y(k) = \sum_{i=1}^{n_y} a_i\, y(k-i) + \sum_{i=1}^{n_u} b_i\, u(k-i) \qquad (1.4)$$

also includes past outputs and can often represent the same dynamics with far fewer terms. In the same way, the lagged output and noise terms in the NARMAX model yield far more parsimonious representations than expansions, such as the Volterra series, that are based on past inputs alone.
While NARMAX started as the name of a model, it has now developed into a philosophy of nonlinear system identification (Billings and Tsang, 1989; Billings and Chen, 1992). The NARMAX approach consists of several steps:
Structure detection: which terms are in the model?
Parameter estimation: what are the model coefficients?
Model validation: is the model unbiased and correct?
Prediction: what is the output at some future time?
Analysis: what are the dynamical properties of the system?
Structure detection forms the most fundamental part of NARMAX. In linear parameter estimation it is relatively easy to determine the model order. Often models of order one, two, three, and so on are estimated, and this is quick and efficient. The models are then validated and compared to find the simplest model that can adequately represent the system. This process works well because, assuming a pulse transfer function representation, every increase in model order only increases the number of unknown parameters by two – one extra coefficient each for the numerator and the denominator. Over-fitted models are easily detected by pole–zero cancellations and other methods.
But this naïve approach does not easily carry over to the nonlinear case. For example, a NARMAX model which consists of one lagged input term, one lagged output term, and three lagged noise terms, expanded as a cubic polynomial, would consist of 56 possible candidate terms. This number of candidate terms arises because the expansion by definition includes all possible combinations within the cubic expansion. Naïvely estimating a model which includes all these terms and then pruning will cause numerical and computational problems and should always be avoided. However, often only a few terms are important in the model. Structure detection, which aims to select terms one at a time, is therefore critically important. This also makes sense from an intuitive perspective: build the model by putting in the most important or significant term first, then the next most significant term, and so on, and stop when the model is adequate. This approach is numerically efficient and sound and, most important of all, leads to simple parsimonious models that can be related to the underlying system.
These objectives can easily be achieved by using the orthogonal least squares (OLS) algorithm and its derivatives to select the NARMAX model terms one at a time (Korenberg et al., 1988; Billings et al., 1989; Billings and Chen, 1998). This approach can be adopted for many different model forms and expansions, and is described in Chapter 3.
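A minimal sketch of forward-regression OLS with the error reduction ratio, under simplifying assumptions: no noise model, classical Gram–Schmidt orthogonalisation, and a generic candidate matrix whose columns would in NARMAX practice be lagged monomials of the inputs and outputs.

```python
import numpy as np

def forward_ols_err(P, y, tol=0.999, max_terms=10):
    """Forward-regression OLS sketch: at each step every remaining
    candidate term is orthogonalised against the terms already selected,
    and the one with the largest error reduction ratio (ERR) is added.
    Selection stops when the accumulated ERR reaches tol."""
    N, M = P.shape
    sigma = y @ y
    selected, W, errs = [], [], []
    for _ in range(max_terms):
        best_err, best_i, best_w = 0.0, None, None
        for i in range(M):
            if i in selected:
                continue
            w = P[:, i].copy()
            for wj in W:                                  # orthogonalise against
                w -= (wj @ P[:, i]) / (wj @ wj) * wj      # already-selected terms
            d = w @ w
            if d < 1e-10:
                continue                                  # numerically dependent term
            err = (w @ y) ** 2 / (d * sigma)              # error reduction ratio
            if err > best_err:
                best_err, best_i, best_w = err, i, w
        if best_i is None:
            break
        selected.append(best_i); W.append(best_w); errs.append(best_err)
        if sum(errs) >= tol:
            break
    return selected, errs

# Example: the output depends on only two of five candidate terms.
rng = np.random.default_rng(3)
P = rng.standard_normal((2000, 5))
y = 2.0 * P[:, 1] - 0.5 * P[:, 3] + 0.05 * rng.standard_normal(2000)
print(forward_ols_err(P, y))   # selects columns 1 and 3 first
```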
These ideas can also be adapted for pattern recognition and feature selection, with the advantage that the features are revealed as basis functions that are easily related back to the original problem (Wei and Billings, 2007). Unlike in principal component analysis, the basis vectors are not potentially functions of all the initial features, a mixing that destroys easy interpretation of the results.
The philosophy of NARMAX therefore relates to finding the model structure or fitting the simplest model so that the underlying rule is elucidated. Building up the model, term by term, has many benefits not least because if the underlying system is linear, NARMAX methods should just fit a linear model and stop when this model is a good representation of the system. It would be completely wrong to fit a nonlinear model to represent a linear system. For example, the stability of linear systems is well known and is applicable for any input. This does not apply to nonlinear systems. Over-fitting nonlinear systems, by using either excessive time lags or excessive nonlinear function approximations, not only induces numerical problems but can also introduce additional unwanted dynamic behaviours and disguises rather than reveals the relationships that describe the system.
The fundamental concept of structure detection, that is core to NARMAX methods, naturally leads into a discussion of what system identification is for. Very broadly, this can be divided into two aims.
The first involves approximation, where the key aim is to develop a model that approximates the data set such that good predictions can be made. There are many applications where this approach is appropriate, for example in time series prediction of the weather, stock prices, speech, target tracking, pattern classification, etc. In such applications the form of the model is not that important. The objective is to find an approximation scheme which produces the minimum prediction errors. Fuzzy logic, neural networks, and derivatives of these including Bayesian methods naturally solve these types of problems easily and well (Miller et al., 1990; Chen and Billings, 1992; Bishop, 1995; Haykin, 1999; Liu, 2001; Nelles, 2001). The approximation properties of these approaches are usually quoted based on the Weierstrass theorem, which of course equally applies to many other model forms. Naturally, users of these methods focus on the mean-squared-error properties of the fitted model, perhaps over estimation and test sets.
A second objective of system identification, which includes the first objective as a subset, involves much more than just finding a model to achieve the best mean-squared errors. This second aim is why the NARMAX philosophy was developed and is linked to the idea of finding the simplest model structure. The aim here is to develop models that reproduce the dynamic characteristics of the underlying system, to find the simplest possible model, and if possible to relate this to components and behaviours of the system under study. Science and engineering are about understanding systems, breaking complex behaviours down into simpler behaviours that can be understood, manipulated, and exploited. The core aim of this second approach to identification is therefore, wherever possible, to identify, reveal, and analyse the rule that represents the system. So, if the system can be represented by a simple first-order dynamic system with a cubic nonlinear term in the input this should be revealed by the system identification. Take, for example, two different oil rigs, which are similar but of a different size and operate in different ocean depths and sea states. If the underlying hydrodynamic characteristics which describe the action of the waves on the platform legs and the surge of the platform follow the same scientific law, then the identified models should reveal this (Worden et al., 1994; Swain et al., 1998). That is, we would expect the core model characteristics to be the same even though the parameter values could be different. Therefore, a very important aim is to find the rule so that this can be analysed and understood. Gross approximation to the data is not sufficient in these cases, finding the best model structure is. Ideally, we want to be able to write the identified model down and to relate the terms and characteristics of the model to the system. These aims relate to the understanding of systems, breaking complex behaviours down into simpler behaviours that can be simulated, analysed, and understood. These objectives are relevant to model simulation and control systems design, but increasingly to applications in medicine, neuroscience, and the life sciences. Here the aim is to identify models, often nonlinear, that can be used to understand the basic mechanisms of how these systems operate and behave so that we can manipulate and utilise them.
These arguments also carry over to the requirement to fit models of the system and of the noise. Noise models are important to ensure that the estimated model of the system is unbiased and not just a model of one data set, but noise models are also highly informative. Noise models reveal what is unpredictable from the input, and they indicate the level and confidence that can be placed in any prediction or simulation of the system output.
NARMAX started off as the name of a model class but has now become a generic term for identification methods that aim to model systems in the simplest possible way. Model validation is a critical part of NARMAX modelling and goes far beyond just comparing mean-squared errors. One of the basic approaches involves testing whether there is anything predictable left in the residuals (Billings and Voon, 1986; Billings and Zhu, 1995). The aim is to find the simplest possible model that satisfies this condition. The idea is that if the models of the system and of the noise are adequate, then all the information in the data set should be captured in the model, and the remainder – the final residuals – should be unpredictable from all past inputs and outputs. This is statistical validation and can be applied to any model form and any fitting algorithm. Qualitative validation is also used to develop NARMAX estimation procedures that reproduce the dynamic invariants of the systems. Models that are developed based on term selection to obtain the simplest possible model have been shown to reproduce attractors and dynamic invariants that are topologically closer to the properties of the underlying system dynamics than over-fitted models (Aguirre and Billings, 1995a, b). This links back to the desire to be able to relate the models to the underlying system and to use the models to understand basic behaviours and processes not just to approximate a data set.
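A sketch of the simplest of these statistical checks, assuming a fitted model's residual sequence is available: the normalised residual autocorrelation should be an impulse at lag zero, and the input–residual cross-correlation should lie inside the 95% confidence band. The full nonlinear tests of Billings and Voon (1986) add further higher-order correlation functions.

```python
import numpy as np

def norm_xcorr(a, b, max_lag=20):
    """Normalised cross-correlation phi_ab(tau) for tau = 0..max_lag."""
    a = a - a.mean(); b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return np.array([a[: len(a) - t] @ b[t:] for t in range(max_lag + 1)]) / denom

N = 5000
rng = np.random.default_rng(4)
u = rng.standard_normal(N)
residuals = rng.standard_normal(N)     # stand-in for fitted-model residuals

# If the model is adequate, nothing predictable is left: phi_ee(tau)
# is an impulse at tau = 0 and phi_ue(tau) is zero at all lags, within
# the 95% confidence band +/- 1.96 / sqrt(N).
bound = 1.96 / np.sqrt(N)
phi_ee = norm_xcorr(residuals, residuals)
phi_ue = norm_xcorr(u, residuals)
print(np.all(np.abs(phi_ee[1:]) < bound), np.all(np.abs(phi_ue) < bound))
```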
NARMAX modelling is a process that can involve feedback in the model-fitting process. As an example, if the initial library of terms that is used to search for the correct model terms is not large enough, then the algorithms will be unable to find the appropriate model. But applying model validation methods should reveal that terms are missing from the model, and in some instances can suggest what type of terms are missing. The estimation process can then be restarted by including a wider range or different types of model terms. Only when the structure detection and all the validation procedures are satisfied is the model accepted as a good representation of the system. Just using mean-squared errors is often uninformative and can lead to fitting to the noise and, in the worst case, to models that are little more than lookup tables.
In the analysis of linear systems a combined time and frequency domain analysis is ubiquitous. Frequency domain methods are core in control system design, vibrations, acoustics, communications, and in almost every branch of science. However, an inspection of the nonlinear system identification literature over the last 20 years or so shows that mainly time domain methods have been developed. Neural networks, fuzzy logic, and Bayesian algorithms are all based solely in the time domain, and no information about the frequency response is supplied. Experience with linear methods suggests that this is a gross oversight, and NARMAX methods have therefore been developed in both the time and frequency domains.
Early methods of computing the generalised frequency response functions (GFRFs) – these are generalisations of the linear frequency response function – were based on the Fourier transform of the Volterra series and hence suffered from all the disadvantages including the need for very long data sets, unrealistic assumptions about the systems, and specialised inputs. However, all these problems can be avoided by mapping identified NARMAX models directly into the GFRFs (Billings and Tsang, 1989; Peyton-Jones and Billings, 1989). This means that the GFRFs can be written down and, importantly, that the effects in frequency can be related back to specific time domain model terms and vice versa. This links back to the importance of finding the simplest model structure and relating that model and its properties to the underlying system characteristics. The linear case can be used to illustrate this point. For linear systems we might identify a state-space model, a weighting or impulse sequence, a pulse transfer function, or several other model forms. When the system is linear all these models are related and any one can readily be transformed into another. If each of these different model forms were identified for a particular system, if the models are unbiased and correct, they should all have exactly the same frequency response. In addition, just looking at time domain behaviours does not always reveal invariant characteristics which are so important in the scientific understanding of basic behaviours in any system. So, even if a correct linear model has been identified, obviously simulating this model with different inputs (maybe a random input and a swept sine) does not easily reveal properties of that system by visual inspection. But if the system is of second order, the frequency response in every case should show one resonance; this can be related to specific terms in the system model and hence back to the system under study, and shows a core invariant system behaviour.
The same argument holds for nonlinear dynamic systems, but now the story is more complex. First, many different types of models could be fitted to a data set from a nonlinear system – Volterra, NARMAX, nonlinear state-space, neural networks, etc. But it is often virtually impossible to map from one model to another and, as in the linear case, just looking at properties in the time domain only reveals half the picture. This is why we map NARMAX models to the GFRFs, because this reveals core invariant behaviours that can usually be interpreted in a very revealing manner. Because this is a mapping, each GFRF can be generated one at a time, and even if there are a large number it is easy to evaluate which are important and when to stop. Core frequency response behaviours, which are essentially extensions of the concept of resonance, can then be identified and related back to the behaviour and properties of the underlying system. This process is relatively easy even for complex systems and has been extended to severely nonlinear systems with sub-harmonics. While the potentially large number of GFRFs may at first appear to be a problem, this can be turned around and used as a great benefit – for example, in the design of a totally new class of filters called energy transfer filters. Frequency domain analysis is therefore core to the NARMAX philosophy and is discussed in Chapters 6 and 7.
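To illustrate the mapping, take the hypothetical NARX model y(k) = a·y(k−1) + b·u(k−1) + c·u(k−1)². Harmonic probing gives closed-form first- and second-order GFRFs, and every feature of H2 traces back to the single nonlinear model term c·u(k−1)².

```python
import numpy as np

# For the illustrative model y(k) = a*y(k-1) + b*u(k-1) + c*u(k-1)^2,
# harmonic probing yields (with w in radians per sample):
#   H1(w)     = b*exp(-jw) / (1 - a*exp(-jw))
#   H2(w1,w2) = c*exp(-j(w1+w2)) / (1 - a*exp(-j(w1+w2)))
a, b, c = 0.5, 1.0, 0.2

def H1(w):
    z = np.exp(-1j * w)
    return b * z / (1 - a * z)

def H2(w1, w2):
    z = np.exp(-1j * (w1 + w2))
    return c * z / (1 - a * z)

w = np.linspace(0, np.pi, 200)
W1, W2 = np.meshgrid(w, w)
gain2 = np.abs(H2(W1, W2))   # |H2| surface: energy transferred to w1 + w2
print(gain2.max())
```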
The vast majority of system identification methods, certainly for nonlinear systems, are based on discrete-time models. This is natural because data collection inevitably involves sampling, so the discrete domain is the natural choice. But there are situations where a continuous-time model would be preferable. Continuous-time models are often simpler in structure than their discrete counterparts. For example, a second-order derivative term in continuous time would involve at least three terms in discrete time, and often more depending on the approximation scheme. Continuous-time models are also independent of the sample rate.
The established literature on most systems and processes is almost always based on continuous-time integro-differential equations. So, if the identification involves a study of a system that has been analysed before using different modelling approaches, such as analytical modelling using the basic laws of science, then an identified continuous-time model can more easily be compared to previous models. In the modelling of the magnetosphere and space weather (see the case studies in Chapter 14 for a specific example), there is a considerable body of analytical modelling work developed by physicists over many years. If nonlinear continuous-time models can be identified then these can be compared to the previous work, and indeed the analytical models can be used to prime the model structure selection (Balikhin et al., 2001). Model validation can also be used to validate existing physically derived models, and NARMAX methods can be used to find missing model terms and to analyse these models in the frequency domain. This is why we both study the estimation of the structure – that is, what model terms to include – and estimate the parameters in complex nonlinear differential equation models. NARMAX methods can be extended to solve these problems, often without the need to differentiate data, which always increases noise considerably.
Severely nonlinear systems that exhibit sub-harmonics are also studied. These results are developed following the philosophy of finding the simplest possible model and, because sub-harmonic generation is a frequency domain behaviour, developing algorithms that allow the user to see the properties in the frequency domain is important (Li and Billings, 2005). These algorithms allow NARMAX to be applied to model very exotic and complex dynamic behaviours.
Time-varying systems have been extensively studied based on classical LMS, recursive least squares, and Kalman filter-based algorithms. Most of the existing methods, however, only work for slow time variation. By using a new wavelet expansion-based approach, NARMAX algorithms have been developed to track rapid time changes and movements and to map these to the frequency domain where invariant characteristics can be tracked – for EEG analysis, for example. These problems are discussed in detail in Chapters 9, 10, and 11.
Spatio-temporal systems are systems that evolve over both space and time (Hoyle, 2006). Purely temporal systems involve measurements of a variable over time. There are also examples where measurements at one spatial location, for example an electrophysiological probe in the brain, or a flow monitor in a river, also produce a temporal signal. But both these examples are strictly spatio-temporal systems. That is, the dynamics at each spatial location may depend, in a nonlinear dynamic way, both on what happened back in time and what happened at other spatial locations back in time. There are many applications of such systems, for example the dynamics of cells in a dish, the growth of crystals, neuro-images, etc. These are a very important and neglected class of systems, and hence NARMAX methods have been developed to identify several different model classes which can be used to represent spatio-temporal behaviours including cellular automata, coupled map lattices, and nonlinear partial differential equations.
The concept of model structure is even more important for spatio-temporal systems because a model of a system may involve just a few lagged time terms at a few, possibly nonadjacent spatial locations. Grossly approximating the system would therefore be inappropriate, and again the key challenge is to find the model structure which now involves finding the neighbourhood that defines the spatial interactions and the temporal lags. Invariant behaviours are also important in spatio-temporal systems, simply because a model excited with different inputs will produce different patterns that evolve over time. Depending on the choice of inputs, the patterns produced from an identical model could be significantly different when inspected visually. Comparing different models and different patterns to discover the rules of the underlying behaviours is therefore very difficult. That is why the GFRFs for NARMAX models have recently been introduced for spatio-temporal NARMAX models. These problems are discussed in detail in Chapters 12 and 13.
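As a flavour of the continuous-state spatio-temporal model class, here is a minimal coupled map lattice simulation (a standard diffusively coupled logistic lattice; the parameters are illustrative). Identification runs in the opposite direction: given an observed field like this, the task is to recover the neighbourhood and the coupled-map rule from the data.

```python
import numpy as np

def logistic(x, mu=4.0):
    return mu * x * (1.0 - x)

def cml_step(x, eps=0.3):
    """One update of a diffusively coupled map lattice with periodic
    boundaries: each site mixes its own logistic-map update with the
    updates of its two spatial neighbours."""
    f = logistic(x)
    return (1 - eps) * f + 0.5 * eps * (np.roll(f, 1) + np.roll(f, -1))

rng = np.random.default_rng(5)
x = rng.random(100)                  # 100 lattice sites in [0, 1]
history = np.empty((200, 100))       # (time, space) field
for t in range(200):
    x = cml_step(x)
    history[t] = x
print(history.shape)
```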
While there is a considerable literature on algorithms for nonlinear system identification of all sorts of shapes and forms, there are relatively few users who are expert at applying these methods to real-life systems. Most authors just use simulated examples to illustrate and test their algorithms. Linear parameter estimation and NARMAX models can be studied and thoroughly tested by simulating known models and comparing the initial simulated model coefficients to those identified. This provides a powerful means of evaluating the methods. Neural networks, which are designed purely to approximate systems, produce models that usually contain so many weights or parameters and basic approximating units that the model representation cannot be written down and, perhaps conveniently, therefore cannot be tested to check that the training procedures do indeed recover the exact model that was used to generate the simulated test data in the first place.
This is why the overall aim of this book is to try to introduce and show the reader how to apply NARMAX methods to real problems. The emphasis therefore is on describing the methods in a way that is as transparent as possible, deliberately leaving out all the variants of the methods and their complex derivations and properties, all of which are available in the literature.
Hence, in Chapter 14, practical aspects of nonlinear system identification and many case studies are described. The case studies are deliberately taken from a wide range of systems that we have analysed over recent years and range from modelling space weather systems, through to the identification of the visual system of a fruit fly, to the modelling of iceberg flux in Greenland, and many other systems. All the case studies are for real problems where the main objective is to use system identification as a tool to understand the complex system being studied in a way that is revealing, transparent, and as simple as possible.
Aguirre, L.A. and Billings, S.A. (1995a) Dynamical effects of over-parameterisation in nonlinear models. Physica D, 80, 26–40.
Aguirre, L.A. and Billings, S.A. (1995b) Retrieving dynamical invariants from chaotic data using NARMAX models. International Journal of Bifurcation and Chaos, 5, 449–474.
Astrom, K.J. and Eykhoff, P. (1971) System identification—a survey. Automatica, 7, 123–162.
Balikhin, M., Boaghe, O.M., Billings, S.A., and Alleyne, H. (2001) Terrestrial magnetosphere as a nonlinear dynamical resonator. Geophysical Research Letters, 28, 1123–1126.
Bendat, J.S. and Piersol, A.G. (2010) Random Data Analysis and Measurement Procedures, 4th edn. New York: John Wiley & Sons.
Billings, S.A. (1980) Identification of nonlinear systems: a survey. IEE Proceedings, Pt. D, 127(6), 272–285.
Billings, S.A. and Chen, S. (1992) Neural networks and system identification. In K. Warwick, G.W. Irwin and K.J. Hunt (eds), Neural Networks for Systems and Control. London: Peter Peregrinus Ltd, on behalf of IEE, pp. 181–205.
Billings, S.A. and Chen, S. (1998) The determination of multivariable nonlinear models for dynamic systems using neural networks. In C.T. Leondes (ed.), Neural Network System Techniques and Applications. San Diego, CA: Academic Press, pp. 231–278.
Billings, S.A. and Fakhouri, S.Y. (1978) Identification of a class of nonlinear systems using correlation analysis. IEE Proceedings, Pt. D, 125, 691–697.
Billings, S.A. and Fakhouri, S.Y. (1982) Identification of systems containing linear dynamic and static nonlinear elements. Automatica, 18(1), 15–26.
Billings, S.A. and Leontaritis, I.J. (1981) Identification of nonlinear systems using parametric estimation techniques. Proceedings of the IEE Conference on Control and its Application, Warwick, UK, pp. 183–187.
Billings, S.A. and Tsang, K.M. (1989) Spectral analysis for nonlinear systems—Part I: Parametric nonlinear spectral analysis. Mechanical Systems and Signal Processing, 3(4), 319–339.
Billings, S.A. and Voon, W.S.F. (1986) Correlation based model validity tests for non-linear models. International Journal of Control, 44(1), 235–244.
