Kernel Adaptive Filtering (E-Book)

Weifeng Liu

Description

Online learning from a signal processing perspective

There is increased interest in kernel learning algorithms in neural networks and a growing need for nonlinear adaptive algorithms in advanced signal processing, communications, and controls. Kernel Adaptive Filtering is the first book to present a comprehensive, unifying introduction to online learning algorithms in reproducing kernel Hilbert spaces. Based on research conducted in the Computational Neuro-Engineering Laboratory at the University of Florida and in the Cognitive Systems Laboratory at McMaster University, Ontario, Canada, this unique resource elevates adaptive filtering theory to a new level, presenting a new design methodology for nonlinear adaptive filters.

  • Covers the kernel least mean squares algorithm, kernel affine projection algorithms, the kernel recursive least squares algorithm, the theory of Gaussian process regression, and the extended kernel recursive least squares algorithm

  • Presents a powerful model-selection method called maximum marginal likelihood

  • Addresses the principal bottleneck of kernel adaptive filters—their growing structure

  • Features twelve computer-oriented experiments to reinforce the concepts, with MATLAB codes downloadable from the authors' Web site

  • Concludes each chapter with a summary of the state of the art and potential future directions for original research

Kernel Adaptive Filtering is ideal for engineers, computer scientists, and graduate students interested in nonlinear adaptive systems for online applications (applications where the data stream arrives one sample at a time and incremental optimal solutions are desirable). It is also a useful guide for those who look for nonlinear adaptive filtering methodologies to solve practical problems.


Page count: 296

Publication year: 2011




CONTENTS

PREFACE

ACKNOWLEDGMENTS

NOTATION

ABBREVIATIONS AND SYMBOLS

1: BACKGROUND AND PREVIEW

1.1 SUPERVISED, SEQUENTIAL, AND ACTIVE LEARNING

1.2 LINEAR ADAPTIVE FILTERS

1.3 NONLINEAR ADAPTIVE FILTERS

1.4 REPRODUCING KERNEL HILBERT SPACES

1.5 KERNEL ADAPTIVE FILTERS

1.6 SUMMARIZING REMARKS

2: KERNEL LEAST-MEAN-SQUARE ALGORITHM

2.1 LEAST-MEAN-SQUARE ALGORITHM

2.2 KERNEL LEAST-MEAN-SQUARE ALGORITHM

2.3 KERNEL AND PARAMETER SELECTION

2.4 STEP-SIZE PARAMETER

2.5 NOVELTY CRITERION

2.6 SELF-REGULARIZATION PROPERTY OF KLMS

2.7 LEAKY KERNEL LEAST-MEAN-SQUARE ALGORITHM

2.8 NORMALIZED KERNEL LEAST-MEAN-SQUARE ALGORITHM

2.9 KERNEL ADALINE

2.10 RESOURCE ALLOCATING NETWORKS

2.11 COMPUTER EXPERIMENTS

2.12 CONCLUSION

3: KERNEL AFFINE PROJECTION ALGORITHMS

3.1 AFFINE PROJECTION ALGORITHMS

3.2 KERNEL AFFINE PROJECTION ALGORITHMS

3.3 ERROR REUSING

3.4 SLIDING WINDOW GRAM MATRIX INVERSION

3.5 TAXONOMY FOR RELATED ALGORITHMS

3.6 COMPUTER EXPERIMENTS

3.7 CONCLUSION

4: KERNEL RECURSIVE LEAST-SQUARES ALGORITHM

4.1 RECURSIVE LEAST-SQUARES ALGORITHM

4.2 EXPONENTIALLY WEIGHTED RECURSIVE LEAST-SQUARES ALGORITHM

4.3 KERNEL RECURSIVE LEAST-SQUARES ALGORITHM

4.4 APPROXIMATE LINEAR DEPENDENCY

4.5 EXPONENTIALLY WEIGHTED KERNEL RECURSIVE LEAST-SQUARES ALGORITHM

4.6 GAUSSIAN PROCESSES FOR LINEAR REGRESSION

4.7 GAUSSIAN PROCESSES FOR NONLINEAR REGRESSION

4.8 BAYESIAN MODEL SELECTION

4.9 COMPUTER EXPERIMENTS

4.10 CONCLUSION

5: EXTENDED KERNEL RECURSIVE LEAST-SQUARES ALGORITHM

5.1 EXTENDED RECURSIVE LEAST-SQUARES ALGORITHM

5.2 EXPONENTIALLY WEIGHTED EXTENDED RECURSIVE LEAST-SQUARES ALGORITHM

5.3 EXTENDED KERNEL RECURSIVE LEAST-SQUARES ALGORITHM

5.4 EX-KRLS FOR TRACKING MODELS

5.5 EX-KRLS WITH FINITE RANK ASSUMPTION

5.6 COMPUTER EXPERIMENTS

5.7 CONCLUSION

6: DESIGNING SPARSE KERNEL ADAPTIVE FILTERS

6.1 DEFINITION OF SURPRISE

6.2 A REVIEW OF GAUSSIAN PROCESS REGRESSION

6.3 COMPUTING SURPRISE

6.4 KERNEL RECURSIVE LEAST SQUARES WITH SURPRISE CRITERION (KRLS-SC)

6.5 KERNEL LEAST MEAN SQUARE WITH SURPRISE CRITERION (KLMS-SC)

6.6 KERNEL AFFINE PROJECTION ALGORITHMS WITH SURPRISE CRITERION (KAPA-SC)

6.7 COMPUTER EXPERIMENTS

6.8 CONCLUSION

EPILOGUE

APPENDIX A: MATHEMATICAL BACKGROUND

A.1 SINGULAR VALUE DECOMPOSITION

A.2 POSITIVE-DEFINITE MATRIX

A.3 EIGENVALUE DECOMPOSITION

A.4 SCHUR COMPLEMENT

A.5 BLOCK MATRIX INVERSE

A.6 MATRIX INVERSION LEMMA

A.7 JOINT, MARGINAL, AND CONDITIONAL PROBABILITY

A.8 NORMAL DISTRIBUTION

A.9 GRADIENT DESCENT

A.10 NEWTON'S METHOD

APPENDIX B: APPROXIMATE LINEAR DEPENDENCY AND SYSTEM STABILITY

REFERENCES

ADAPTIVE AND LEARNING SYSTEMS FOR SIGNAL PROCESSING, COMMUNICATION, AND CONTROL

INDEX

Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Liu, Weifeng

Kernel adaptive filtering: a comprehensive introduction / Weifeng Liu, José C. Principe,

Simon Haykin.

p. cm.

Includes bibliographical references and index.

ISBN 978-0-470-44753-6

1. Adaptive filters. 2. Kernel functions. I. Principe, José C. II. Haykin, Simon S., 1931– III. Title.

TK7872.F5P745 2010

621.382'23-dc22

2009042654

To our families

PREFACE

For the first time, this book presents a comprehensive and unifying introduction to kernel adaptive filtering. Adaptive signal processing theory has been built on three pillars: the linear model, the mean square cost, and the adaptive least-square learning algorithm. When nonlinear models are required, the simplicity of linear adaptive filters evaporates and a designer has to deal with function approximation, neural networks, local minima, regularization, and so on. Is this the only way to go beyond the linear solution? Perhaps there is an alternative, which is the focus of this book. The basic concept is to perform adaptive filtering in a linear space that is related nonlinearly to the original input space. If this is possible, then all three pillars and our intuition about linear models can still be of use, and we end up implementing nonlinear filters in the input space.

This book will draw on the theory of reproducing kernel Hilbert spaces (RKHS) to implement the nonlinear transformation of the input to a high-dimensional feature space induced by a positive-definite function called a reproducing kernel. If the filtering and adaptation operations to be performed in RKHS can be expressed by inner products of projected samples, then they can be directly calculated by kernel evaluations in the input space. We use this approach to introduce a family of adaptive filtering algorithms in RKHS:

  • The kernel least-mean-square algorithm

  • The kernel affine projection algorithms

  • The kernel recursive least-squares algorithm

  • The extended kernel recursive least-squares algorithm

These kernel-learning algorithms closely bridge two important areas, adaptive filtering and neural networks, and they beautifully embody two important methodologies, error-correction learning and memory-based learning. The bottlenecks of the RKHS approach to nonlinear filter design are the need for regularization, the need to select the kernel function, and the need to curtail the growth of the filter structure. This book will present in a mathematically rigorous manner the issues and the solutions to all these problems, and it will illustrate with examples the performance gains of kernel adaptive filtering.
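Before the chapter-by-chapter overview, the following small sketch (our own illustration, not part of the book's downloadable code) makes the kernel trick concrete: for a Gaussian reproducing kernel, the inner product of two transformed samples in the induced feature space is obtained by a single kernel evaluation in the input space. The kernel parameter and the sample vectors below are arbitrary illustrative values.

    % Gaussian (RBF) reproducing kernel: k(u, u') = exp(-a*||u - u'||^2).
    % The inner product <phi(u1), phi(u2)> in the induced feature space
    % equals k(u1, u2), so filtering and adaptation rules written in terms
    % of inner products never need the (possibly infinite-dimensional)
    % feature vectors explicitly.
    a  = 1;                                % illustrative kernel parameter
    k  = @(x, y) exp(-a * norm(x - y)^2);  % reproducing kernel
    u1 = [0.2; -0.5; 1.0];                 % two arbitrary input vectors
    u2 = [0.1;  0.4; 0.9];
    innerProduct = k(u1, u2);              % = <phi(u1), phi(u2)>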

Chapter 1 starts with an introduction to general concepts in machine learning, linear adaptive filters, and conventional nonlinear methods. Then, the theory of reproducing kernel Hilbert spaces is presented as the mathematical foundation of kernel adaptive filters. We stress that kernel adaptive filters are universal function approximators, have no local minima during adaptation, and require reasonable computational resources.

Chapter 2 studies the kernel least-mean-square algorithm, which is the simplest among the family of kernel adaptive filters. We develop the algorithm in a step-by-step manner and delve into the practical aspects of kernel selection, step-size choice, sparsification, and regularization. Two computer experiments, one on Mackey-Glass chaotic time-series prediction and the other on nonlinear channel equalization, are presented.
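For readers who want a feel for the recursion before reading the chapter, the following is a minimal sketch of the KLMS update, in which the filter is a growing kernel expansion over past inputs and each new input becomes a center with coefficient η e(i). The data, kernel parameter, and step size below are illustrative placeholders, and no sparsification (e.g., the novelty criterion) is applied; the MATLAB® programs on the book's website are the reference implementations.

    % Minimal KLMS sketch: f_i = f_{i-1} + eta * e(i) * k(u(i), .)
    eta = 0.2;                              % step-size parameter (illustrative)
    a   = 1;                                % Gaussian kernel parameter (illustrative)
    k   = @(x, y) exp(-a * norm(x - y)^2);  % reproducing kernel

    N = 200;
    U = randn(3, N);                        % placeholder inputs, one column per sample
    d = sin(sum(U, 1)) + 0.01*randn(1, N);  % placeholder desired outputs

    alpha = zeros(1, N);                    % expansion coefficients eta*e(j)
    e     = zeros(1, N);                    % prediction errors
    for i = 1:N
        y = 0;                              % evaluate f_{i-1}(u(i)) as a kernel expansion
        for j = 1:i-1
            y = y + alpha(j) * k(U(:, j), U(:, i));
        end
        e(i)     = d(i) - y;                % error-correction term
        alpha(i) = eta * e(i);              % u(i) is stored as a new center
    end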

Chapter 3 covers the kernel affine projection algorithms, which form a family of four closely related algorithms. The mathematical equations of filtering and adaptation are thoroughly derived from first principles, and useful implementation techniques are discussed fully. Many well-known methods can be derived as special cases of the kernel affine projection algorithms. Three detailed applications are included to show their wide applicability and design flexibility.

Chapter 4 presents the kernel recursive least-squares algorithm and the theory of Gaussian process regression. A sparsification approach called approximate linear dependency is discussed, and with the aid of the Bayesian interpretation, we also present a powerful model-selection method called maximum marginal likelihood. Two computer experiments are conducted to study the performance of different sparsification schemes and the effectiveness of maximum marginal likelihood in determining the kernel parameters.
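For orientation, the quantity that maximum marginal likelihood maximizes is, in the standard Gaussian-process formulation, the log evidence of the observed outputs given the inputs and the hyperparameters; the notation in Chapter 4 may differ slightly from this sketch:

    \log p(\mathbf{d} \mid \mathbf{U}, \theta)
      = -\tfrac{1}{2}\,\mathbf{d}^{T}\left(\mathbf{G} + \sigma_n^{2}\mathbf{I}\right)^{-1}\mathbf{d}
        -\tfrac{1}{2}\log\left|\mathbf{G} + \sigma_n^{2}\mathbf{I}\right|
        -\tfrac{N}{2}\log 2\pi

Here G is the Gram matrix of the N training inputs, σn2 is the assumed observation-noise variance, and θ collects the kernel parameters and σn2; the hyperparameters are chosen to maximize this expression.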

Chapter 5 discusses the extended kernel recursive least-squares algorithm, which builds on the kernel recursive least-squares algorithm. We systematically study the problem of estimating the state of a linear dynamic system in RKHS from a sequence of noisy observations. Several important theorems are presented with proofs to outline the significance and the basic approaches. This chapter contains two examples, Rayleigh channel tracking and Lorenz time-series modeling.
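In generic form, the model under study is a linear state-space model posed in the feature space; the exact parameterization and noise assumptions used in Chapter 5 may differ from this sketch:

    \mathbf{x}(i+1) = \mathbf{A}\,\mathbf{x}(i) + \mathbf{n}(i), \qquad
    d(i) = \boldsymbol{\varphi}(\mathbf{u}(i))^{T}\,\mathbf{x}(i) + v(i)

Here x(i) is the hidden state (weight) vector in the feature space, φ(u(i)) is the transformed input, A is a state-transition operator, and n(i) and v(i) are the process and observation noises listed in the symbol table.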

Chapter 6 is devoted to addressing the principal bottleneck of kernel adaptive filters, i.e., their growing structure. We introduce a subjective information measure called surprise and present a unifying sparsification scheme that effectively curtails the growth of kernel adaptive filters. Three interesting computer simulations are presented to illustrate the theories.
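Roughly speaking, the surprise of a new exemplar measures how unlikely it is under the model learned from the data seen so far; in the Gaussian-process setting of Chapter 6 it takes the form of a negative log-likelihood (the precise conditioning and the thresholds T1 and T2 are developed in the chapter):

    S(i) = -\ln p\bigl(\mathbf{u}(i), d(i) \mid \mathcal{T}(i-1)\bigr)

Here T(i-1) denotes the training data observed up to iteration i-1. Informally, samples whose surprise exceeds the abnormality threshold T1 are rejected as outliers, samples below the redundancy threshold T2 are discarded as uninformative, and the remaining samples are learned.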

This book should appeal to engineers, computer scientists, and graduate students who are interested in adaptive filtering, neural networks, and kernel methods. A total of 12 computer-oriented experiments, designed to reinforce the concepts discussed in the chapters, are distributed throughout the book; they are listed in Table 1. Their MATLAB® implementations can be downloaded directly from the website http://www.cnel.ufl.edu/~weifeng/publication.htm. To keep the code readable, we favored simplicity over performance in the design and implementation. These programs are provided without any additional guarantees.

Table 1. A listing of all computer experiments in the book. MATLAB® programs that generate the results can be downloaded by all readers from the book's website http://www.cnel.ufl.edu/~weifeng/publication.htm.

Computer experiment / Topic

2.1  KLMS Applied to Mackey-Glass Time-Series Prediction
2.2  KLMS Applied to Nonlinear Channel Equalization
3.1  KAPA Applied to Mackey-Glass Time-Series Prediction
3.2  KAPA Applied to Noise Cancellation
3.3  KAPA Applied to Nonlinear Channel Equalization
4.1  KRLS Applied to Mackey-Glass Time-Series Prediction
4.2  Model Selection by Maximum Marginal Likelihood
5.1  EX-KRLS Applied to Rayleigh Channel Tracking
5.2  EX-KRLS Applied to Lorenz Time-Series Prediction
6.1  Surprise Criterion Applied to Nonlinear Regression
6.2  Surprise Criterion Applied to Mackey-Glass Time-Series Prediction
6.3  Surprise Criterion Applied to CO2 Concentration Forecasting

We have strived to fully reflect the latest advances in this emerging area. Each chapter concludes with a summary of the state of the art and potential future directions for research. This book should be a useful guide both to those who look for nonlinear adaptive filtering methodologies to solve practical problems and to those who seek inspiring research ideas in related areas.

ACKNOWLEDGMENTS

We would like to start by thanking Dr. Puskal P. Pokharel, Seagate Technology; Dr. Murali Rao, University of Florida; Dr. Jay Gopalakrishnan, University of Florida; and Il Park, University of Florida, for their help in the development of the kernel adaptive filtering theory. We are most grateful to Dr. Aysegul Gunduz, Albany Medical College, Wadsworth Center; Dr. John Harris, University of Florida; and Dr. Steven Van Vaerenbergh, University of Cantabria, Spain, for providing many useful comments and constructive feedback on an early version of the manuscript of the book.

Many others have been kind enough to read critically selected chapters/sections of the book; in alphabetical order, they are as follows:

Erion Hasanbelliu, University of Florida, Gainesville, Florida

Dr. Kyu-hwa Jeong, Intel Corporation, Santa Clara, California

Dr. Ruijiang Li, University of California, San Diego, California

Dr. Antonio Paiva, University of Utah, Salt Lake City, Utah

Alexander Singh, University of Florida, Gainesville, Florida

Dr. Yiwen Wang, Hong Kong University of Science and Technology, Hong Kong

Dr. Jianwu Xu, University of Chicago, Chicago, Illinois

We also wish to thank (in alphabetical order): Dr. Peter Bartlett, University of California, Berkeley; Dr. Andrzej Cichocki, Riken, Brain Science Institute, Japan; Dr. Corinna Cortes, Google Lab; Dr. Graham C. Goodwin, University of Newcastle, UK; Dr. Michael Jordan, University of California, Berkeley; Dr. Thomas Kailath, Stanford University; Dr. Joel S. Kvitky, Rand Corporation; Dr. Yann LeCun, New York University; Dr. Derong Liu, University of Illinois at Chicago; Dr. David J. C. MacKay, University of Cambridge, UK; Dr. Tomaso Poggio, Massachusetts Institute of Technology; Dr. Ali H. Sayed, University of California, Los Angeles; Dr. Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics, Germany; Dr. Sergios Theodoridis, University of Athens, Greece; Dr. Yoram Singer, Google Lab; Dr. Alexander J. Smola, Yahoo! research; Dr. Johan Suykens, Katholieke Universiteit Leuven, Belgium; Dr. Paul Werbos, The National Science Foundation; Dr. Bernard Widrow, Stanford University; and Dr. Chris Williams, University of Edinburgh, UK.

We thank the staff at Wiley, publisher George Telecki, editorial assistant Lucy Hitz, and production editor Kris Parrish, as well as the project manager, Stephanie Sakson from Toppan Best-Set Premedia, for their full support and encouragement in preparing the manuscript of the book and for all their behind-the-scenes effort in the selection of the book cover and the production of the book.

Last, but by no means least, we are grateful to Lola Brooks, McMaster University, for typing many sections of the manuscript.

NOTATION

The book discusses many algorithms involving various mathematical equations. A convenient and uniform notation is a necessity to convey clearly the basic ideas of the kernel adaptive filtering theory. We think it is helpful to summarize and explain at the beginning of the text our notational guidelines for ease of reference.

There are mainly three types of variables that we need to distinguish: scalars, vectors, and matrices.

The following is a list of the notational conventions used in the book:

1. We use small italic letters to denote scalar variables. For example, the output of a filter is a scalar variable, which is denoted by y.

2. We use capital italic letters to denote scalar constants. For example, the order of a filter is a scalar constant, which is denoted by L.

3. We use small bold letters to denote vectors.

4. We use capital bold letters to denote matrices.

5. We use parentheses to denote the time dependency of any variable (scalar, vector, or matrix). For example, d(i) means the value of a scalar d at time (or iteration) i, u(i) means the value of a vector u at time (or iteration) i, and, similarly, G(i) means the value of a matrix G at time (or iteration) i. There is no rule without an exception: fi is used to denote the estimate of an input-output mapping f at time (or iteration) i, because parentheses are reserved for the input argument, as in fi(u).

6. We use the superscript T to denote transposition. For example, if u = [u1, u2, ..., uL]T is a column vector, then uT = [u1, u2, ..., uL] is the corresponding row vector.

7. All variables in our presentation are real. We do not discuss complex numbers in this book.

8. All vectors in our presentation are column vectors without exception.

9. We use subscript indices to denote 1) a component of a vector (or a matrix) or 2) a general vector whose index is not related to time (or iteration). For example, ci could mean either the ith vector in some set or the ith component of the vector c, depending on the context.

We have made every effort to make the notation consistent and coherent for the benefit of the reader. Table 2 summarizes the conventions and lists some typical examples.

Table 2. Notation.

Scalars: small italic letters (e.g., d)
Vectors: small bold letters (e.g., w, ω, ci)
Matrices: capital bold letters (e.g., U, Φ)
Time or iteration: indices in parentheses (e.g., u(i), d(i))
Component of vectors/matrices: subscript indices (e.g., aj, Gi,j)
Linear spaces: capital blackboard-bold (mathbb) letters
Scalar constants: capital italic letters (e.g., L, N)

ABBREVIATIONS AND SYMBOLS

We collect here a list of the main abbreviations and symbols used throughout the text for ease of reference.

(·)T

vector or matrix transposition

A-1

inverse of matrix A

E[·]

expected value of a random variable

m(·)

the mean of a random variable

σ2(·)

the variance of a random variable

<·,·>

inner product

||·||

norm of a vector; square root of the inner product with itself

|·|

absolute value of a real number or determinant of a matrix

∝

proportional to

~

distributed according to

∇

gradient

0

zero vector or matrix

β

forgetting factor

C(i)

dictionary or center set at iteration i

d(i)

desired output at time or iteration i (a real scalar)

diag{a, b}

a diagonal matrix with diagonal entries a and b

δ1

distance threshold in novelty criterion

δ2

prediction error threshold in novelty criterion

δ3

threshold in approximate linear dependency test

δij

Kronecker delta

Δw(i)

weight adjustment at time or iteration i (a column vector in a Euclidean space)

D

data set

e(i)

output estimation error at time or iteration i

feature space induced by the kernel mapping

G

Gram matrix of (transformed) input data

reproducing kernel Hilbert space

I

identity matrix

J(i)

error cost at time or iteration i

k(i)

Kalman gain (or gain vector) at time or iteration i

K(A)

condition number of a matrix A

k(u, u')

kernel (or covariance) function evaluated at u and u'

L

dimensionality of the input space

λ

regularization parameter

M

dimensionality of the feature space

M

misadjustment of the least-mean-square algorithm

n(i)

additive noise in the state space at time or iteration i

N

number of training data

η

step-size parameter

O(·)

of the order of a number

P

state-error correlation matrix

φ(·)

a mapping induced by a reproducing kernel

φ(i)

transformed filter input at time or iteration i (a column vector in a feature space)

Φ

transformed input data matrix

R

covariance matrix of (transformed) input data

ℝ

the set of real numbers

ℝL

L-dimensional real Euclidean space

ςmax

the maximum eigenvalue

tr(A)

trace of matrix A

T1

abnormality threshold in surprise criterion

T2

redundancy threshold in surprise criterion

u(i)

filter input at time or iteration i (a column vector in a Euclidean space)

input domain

U

input data matrix

ν(i)

additive noise in the output at time or iteration i

ω(i)

weight estimate at time or iteration i (a column vector in a feature space)

z-1

unit delay operator

AIC

Akaike information criterion

ALD

approximate linear dependency

APA

affine projection algorithm

BIC

Bayesian information criterion

CC

coherence criterion

CV

cross-validation

ENC

enhanced novelty criterion

EX-RLS

extended recursive least-squares algorithm

EX-KRLS

extended kernel recursive least-squares algorithm

GPR

Gaussian process regression

LMS

least-mean-square algorithm

LOOCV

leave-one-out cross-validation

LS

least squares

MAP

maximum a posteriori

MDL

minimum description length

MSE

mean square error

MML

maximum marginal likelihood

NC

novelty criterion

NLMS

normalized least-mean-square algorithm

KA

kernel ADALINE

KAPA

kernel affine projection algorithm

KLMS

kernel least-mean-square algorithm

KRLS

kernel recursive least-squares algorithm

PCA

principal components analysis

PDF

probability density function

RAN

resource allocating network

RBF

radial-basis function

RKHS

reproducing kernel Hilbert space

RLS

recursive least-squares algorithm

RN

regularization network

RNN

recurrent neural network

SC

surprise criterion

SNR

signal-to-noise ratio

SVD

singular value decomposition

SVM

support vector machine

SW-KRLS

sliding window kernel recursive least-squares algorithm