This book considers various extensions of the topics treated in the first volume of this series, in relation to the class of models and the type of optimality criterion. The regressors are assumed to belong to a generic finite dimensional Haar linear space, which substitutes for the classical polynomial case. The estimation pertains to a general linear form of the coefficients of the model, extending the interpolation and extrapolation framework; the errors in the model may be correlated, and the model may be heteroscedastic. Nonlinear models, as well as multivariate ones, are briefly discussed. The book focuses to a large extent on criteria for optimality, and an entire chapter presents algorithms leading to optimal designs in multivariate models. Elfving's theory and the equivalence theorem are presented extensively. The volume presents an account of the theory of the approximation of real valued functions, which makes it self-contained.
Cover
Title
Copyright
Preface
Introduction
1 Approximation of Continuous Functions in Normed Spaces
1.1. Introduction
1.2. Some remarks on the meaning of the word “simple”. Choosing the approximation
1.3. The choice of the norm in order to specify the error
1.4. Optimality with respect to a norm
1.5. Characterizing the optimal solution
2 Chebyshev Systems
2.1. Introduction
2.2. From the classical polynomials to the generalized ones
2.3. Properties of a Chebyshev system
3 Uniform Approximations in a Normed Space
3.1. Introduction
3.2. Characterization of the best uniform approximation in a normed space
4 Calculation of the Best Uniform Approximation in a Chebyshev System
4.1. Some preliminary results
4.2. Functional continuity of the approximation scheme
4.3. Property of the uniform approximation on a finite collection of points in [a, b]
4.4. Algorithm of de la Vallée Poussin
4.5. Algorithm of Remez
5 Optimal Extrapolation Design for the Chebyshev Regression
5.1. Introduction
5.2. The model and Gauss-Markov estimator
5.3. An expression of the extrapolated value through an orthogonalization procedure
5.4. The Gauss-Markov estimator of the extrapolated value
5.5. The optimal extrapolation design for the Chebyshev regression
6 Optimal Design for Linear Forms of the Parameters in a Chebyshev Regression
6.1. Outlook and notations
6.2. Matrix of moments
6.3. Estimable forms
6.4. Matrix of moments and Gauss-Markov estimators of a linear form
6.5. Geometric interpretation of estimability: Elfving set
6.6. Elfving theorem
6.7. An intuitive approach to the Elfving theorem
6.8. Extension of Hoel–Levine result: optimal design for a linear c-form
7 Special Topics and Extensions
7.1. Introduction
7.2. The Gauss–Markov theorem in various contexts
7.3. Criteria for optimal designs
7.4. G–optimal interpolation and extrapolation designs for the Chebyshev regression
7.5. Some questions pertaining to the model
7.6. Hypotheses pertaining to the regressor
7.7. A few questions pertaining to the support of the optimal design for extrapolation
7.8. The proofs of some technical results
8 Multivariate Models and Algorithms
8.1. Introduction
8.2. Multivariate models
8.3. Optimality criteria and some optimal designs
8.4. Algorithms
Bibliography
Index
End User License Agreement
Series Editor: Nikolaos Limnios
Giorgio Celant
Michel Broniatowski
First published 2017 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK
www.iste.co.uk
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
www.wiley.com
© ISTE Ltd 2017
The rights of Giorgio Celant and Michel Broniatowski to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2016962869
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-054-6
This second volume, dedicated to optimal designs for interpolation and extrapolation, extends the concepts developed in Volume 1 [CEL 16] to the general framework of regression models defined on arbitrary finite dimensional linear spaces of regressors. This generalization is handled in relation to a variety of aspects. It pertains to the class of regressors, and extends the approach of the first volume to numerous optimality criteria. Special attention is also paid to the relations between these optimality criteria. The object to be estimated is also of a more general nature than an interpolated or extrapolated value of the response variable under a given operational condition; these quantities are considered as special cases of a linear form of the coefficients of the regression model. Many hypotheses which were assumed in Volume 1 are weakened. Nonlinear models are considered, as well as heteroscedastic or autocorrelated ones. A brief excursion is proposed into cases where the dimension of the regressor space is finite but unknown, and tests for such problems are discussed. Finally, the algorithmic approach to optimal designs is considered, along with a discussion of the corresponding criteria, in the multivariate setting.
This volume is largely self-contained, and Volume 1 serves as a benchmark.
The geometric approach to Elfving’s theory follows the one adopted by Pukelsheim; we adopt the arguments arising from the theory of uniform approximation of continuous functions by a Haar system in order to carry out the analysis of the optimal designs for linear forms of the parameters in a Chebyshev regression; this is in the vein of many fundamental contributions in this field, as promoted by Kiefer and Wolfowitz, Karlin and Studden, and others.
We have found it useful to present a number of algorithms; these are stated with greater emphasis on their principles than on their implementation.
Giorgio CELANT
Michel BRONIATOWSKI
February 2017
The first four chapters contain material pertaining to the uniform approximation of continuous functions defined on a compact domain, making use of elements of a finite dimensional Haar linear space. The reader therefore has at their disposal the necessary topics for a self-contained volume. Also, much of the material presented here is to be used in the third volume, dealing with the infinite dimensional setting. The notation and style follow the approach adopted in the first volume.
The first chapter is devoted to the approximation of continuous functions in a normed space. We discuss the advantages and drawbacks, in terms of existence and uniqueness of the approximation, resulting from whether or not the norm is induced by an inner product. We also consider natural questions pertaining to the approximating functions, such as rates of convergence, in connection with closeness considerations between the function to be approximated and the chosen class of approximating functions. Finally, we consider the choice of the norm in relation to requirements on the resulting measurement of the error. At the end of this first chapter, we deal with the special choice of Lp norms, together with some robustness considerations.
The second chapter is of fundamental importance for the topics covered in this volume. In the first volume, optimal interpolation and extrapolation designs were obtained based on the Borel–Chebyshev theorem for the uniform polynomial approximation of continuous functions defined on a compact set of the real line. The generalization captured here rests on the substitution of the approximating elements with elements of a wider class than the class of polynomials with known degree. The extension of properties of such polynomials to larger classes of functions, and the possible limits of such extensions, are the core of this chapter, together with the appropriate generalization of the Borel–Chebyshev theorem. Generalized polynomials, the so-called Chebyshev or Haar systems, are defined, and the corresponding properties of the linear spaces spanned by those systems (Haar spaces) are presented. In those spaces, generic elements share properties of usual polynomials: the Gauss–d’Alembert theorem holds, as does an extended version of the Borel–Chebyshev theorem.
The chapter provides definitions for these generalized polynomials, together with a study of their algebraic properties (analysis of their roots, number of roots, etc.), and with a study of their properties with respect to the approximation of functions.
Chapter 3 is devoted to the various theorems that yield the tools for the approximation of functions in a Haar space.
Chapter 4 produces the explicit form of the best generalized polynomial that approximates a given continuous function uniformly. The algorithms are similar to those presented in the first volume, at least in their approach; they generalize the algorithms of de la Vallée Poussin and Remez.
The interested reader will find much additional material in the following books: Rice [RIC 64], Dzyadyk [DZY 08], Achieser [ACH 92], Cheney [CHE 66], Karlin and Studden [KAR 66b], Natanson [NAT 64].
Chapters 5, 6, 7 and 8 are specifically devoted to the study of optimal designs.
Chapter 5 extends the results pertaining to extrapolation and interpolation designs to the context of the Chebyshev regression. The support of the designs is obtained through the Borel–Chebyshev theorem adapted to the generalized polynomials. The measure with such support, which determines the design, is characterized by a theorem due to Kiefer and Wolfowitz; the geometric approach which is adopted here is due to Hoel.
Chapter 6 is an extension of Chapter 5. The extrapolation is seen as a special linear form; in order to determine the optimal design for the estimation of a given linear form, a first analysis of the Gauss–Markov estimator is performed, hence focusing on the two ingredients which lead to specific questions in the realm of designs defined by generalized polynomials: bias and variance arguments. Unbiasedness is expressed in various ways, and leads to the notion of estimable linear forms. This approach induces a study of the moment matrix. The basic results that allow for the definition of the optimal design are the Elfving and Karlin–Studden theorems.
Elfving’s theorem characterizes the optimal design for the estimation of a linear form c′θ in a model of the form y (x) = θ′X (x) + ε (x), in terms of the properties of the vector c which defines this form.
This result asserts that there exists some positive constant ρ such that ρ−1c is a convex combination of points of the form ±X (xi), i = 1, …, l, where the xi’s are the supporting points of the c-optimal design; l denotes the cardinality of the support, and the weights of this convex combination are the frequencies of the design. Finally, Elfving’s theorem asserts that:
1) ρ² is the variance of the least squares estimator of c′θ when evaluated on the xi’s;
2) the X (xi)’s are frontier points of the so-called Elfving set; the value of l results from the Carathéodory theorem for convex sets.
Chapter 6 provides an account of the above construction and results. Only in a few cases can the Elfving theorem produce the optimal design in an explicit way. However, when combined with the theorem by Kiefer and Wolfowitz obtained in Chapter 5, together with the extended Borel–Chebyshev theorem and the Karlin and Studden theorem, an operational approach to the optimal design can be developed. The Hoel–Levine design, obtained in Volume 1, is deduced as a by-product of the Karlin and Studden theorem. The geometric approach to Elfving’s theory, which has been chosen, follows Pukelsheim’s arguments; the discussion of Karlin and Studden’s theorem makes use of the theory of uniform approximation of functions.
The beginning of Chapter 7 considers various optimization criteria for the design. Relations between them are stated in a fundamental result which is known as the Kiefer–Wolfowitz equivalence theorem. Schoenberg’s theorem (which is only partly proved in this volume) allows for a generalization of the optimal design by Guest (see Volume 1) to the general Chebyshev regression.
The chapter also considers the consequences of relaxing a number of hypotheses assumed in Volume 1, such as homoscedasticity or linearity. In the heteroscedastic case, the Elfving theorem is generalized, leading to the theorem of Letz, Dette and Pepelyshev.
Nonlinearity leads to the study of the Fisher information matrix. We also propose a brief excursion toward cases where the polynomials are replaced by analytic functions. Various results in relation to the support of the optimal design for a linear form close this chapter, following various remarks by Kiefer, Wolfowitz and Studden.
The last chapter presents algorithms leading to optimal designs with respect to various criteria. The main topics are related to the problem posed by multi-valued regressors, with a joint range that differs from the Cartesian product of their univariate ranges, a case commonly met in applications. Generally, the simultaneous variation of the factors induces constraints in the regressor range which invalidate various symmetries, which in turn makes the optimal design a challenge. This chapter starts with a short discussion on such issues, and turns to the pioneering works of Fedorov and Wynn, and their extensions. Exchange algorithms are presented as well as a recent algorithm by Pronzato and Zhigljavsky, which obtains an optimal design under a concave criterion.
Given a real valued function f defined on ℝ which is assumed to belong to a class of functions F, approximating f amounts to substituting f by some function φ which belongs to some class V, which we will assume to be included in F. The natural setting is when F is a linear space, which we assume from now on. The function φ will be chosen so as to be simpler than f. We will call φ the approximation of f, or the approximating function. When f differs from φ, the error
R (x) := f (x) − φ (x)
will be considered; the relative error might also be considered, but we will mainly define φ through an analysis pertaining to the function R.
In order to make this more meaningful, we first have to define the term “simple” when applied to approximation procedures, and its impact on the behavior of the function R. This requires a precise choice for the class V and for the criterion leading to the choice of φ in V for a given function f.
The choice of the approximating function φ clearly depends on the properties expected from R, or from some function of this error.
A natural setting leads one to define a distance d on F. Therefore, choosing φ follows from the solution of a problem of the kind
φ := arg minv∈V d (f, v).
This problem may be difficult to solve when f does not belong to V.
A usual choice is to define the distance dF through a norm on F, say ║.║F. The resulting distance is defined by dF (f, g) := ║f − g║F.
The advantages of such a choice are clear when considering the properties of the norm. First, the norm is continuous, when seen as a functional on the linear space F. This setting makes all linear operations on F continuous with respect to the norm, and the topology on F is determined by the neighborhoods of the null function, through translation. This setting provides a set of useful tools; for example, the unit ball in F, defined by
SF (0, 1) := {g ∈ F : ║g║F ≤ 1},
is compact if and only if F is a finite dimensional linear space (see Appendix 1, [CEL 16], theorem A1.6), a basic property to be used extensively when proving the existence of optimal points for a linear functional. More generally, all topological properties of a linear functional derive from its behavior on SF (0, 1). We suppose therefore that the function f to be approximated belongs to a normed space (F, ║.║F). We may also require a further assumption, which makes the analytic frame easier; at some point, we will require that the norm is defined by an inner product < ., . >F, defining ║f║F := √(< f, f >F). In this case, F is a pre-Hilbert space.
The approximating function φ is assumed to belong to a finite dimensional subspace of F, denoted by V.
The definition of simplicity is somewhat vague and admits many different meanings. From the analytic point of view, simplicity is related to analytic regularity. From the standpoint of numerical calculation, the function is seen through a number of elementary operations to be performed; complexity may then be captured through the number of such operations. The function is complex when many operations are required to evaluate it. This latter point of view relates to the means at our disposal in terms of memory size, precision of the basic operations in the computer, speed of the processors, etc. Complexity results from the present state of the means at our disposal.
Choosing a function φ in a linear space V with finite dimension n may lead to defining n itself as the simplicity of the problem. It therefore seems that the simplicity of a given problem is defined by the number of parameters which are required in order to solve it.
We say that the approximation procedure is linear when the approximating function φ belongs to a finite dimensional linear space.
The relation between the simplicity of the approximating function φ and the error R is loosely stated as follows.
Let φ1, …, φn be n functions defined on ℝ, and consider the absolute value of the local error when substituting f by
φ (x) := a1φ1 (x) + … + anφn (x),
namely,
|R (x)| = |f (x) − φ (x)|.
This quantity is transformed into a global index, dF (f, φ), which is a function of the dimension of the linear space V where the approximation is performed. We denote this quantity by Ψ (n). When
Ψ (n) ≤ g (n)
for some non-negative and non-increasing function g which satisfies limn→∞ g (n) = 0, we say that the approximation procedure is of accuracy g.
Let f be an integrable 2π-periodic function and let φN denote the N-th partial sum of its Fourier series,
φN (x) := a0/2 + Σk=1..N (ak cos (kx) + bk sin (kx)),
where the ak and bk are the Fourier coefficients of f.
We have [1.1]
and there exists a constant c > 0 such that [1.2]
The convergence of the series to f holds as a consequence of the following result:
THEOREM 1.1.– (Fejér) When f is some integrable function with summable Fourier coefficients, its Fourier series converges pointwise to f on ℝ.
PROOF.– See Kolmogorov–Fomine [KOL 74]
■
We prove [1.2] and deduce [1.1] for a suitable function g and with Ψ(N) := supx |f (x) − φN (x)|.
Evaluating the Fourier coefficients by integration by parts yields
Iterating, we obtain
Hence, defining
it holds
For a smooth function, convergence of Ψ (N) to 0 is fast; for example, let
Then,
and therefore
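As a small numerical illustration (a sketch, not part of the original text; the test function, the evaluation grid and the rectangle-rule quadrature are arbitrary choices), the decay of Ψ (N) := supx |f (x) − φN (x)| for a smooth periodic f can be observed by computing partial Fourier sums in Python:

    import numpy as np

    def fourier_partial_sum(f, N, x, n_quad=4096):
        """Partial Fourier sum phi_N of a 2*pi-periodic f, evaluated at points x."""
        t = np.linspace(-np.pi, np.pi, n_quad, endpoint=False)
        ft = f(t)
        s = np.full_like(x, ft.mean())               # a0/2 term
        for k in range(1, N + 1):
            ak = 2.0 * (ft * np.cos(k * t)).mean()   # cosine coefficient a_k
            bk = 2.0 * (ft * np.sin(k * t)).mean()   # sine coefficient b_k
            s += ak * np.cos(k * x) + bk * np.sin(k * x)
        return s

    f = lambda u: np.exp(np.cos(u))   # smooth 2*pi-periodic test function
    x = np.linspace(-np.pi, np.pi, 2001)
    for N in (2, 4, 8, 16):
        print(N, np.max(np.abs(f(x) - fourier_partial_sum(f, N, x))))

For such a smooth f, the printed sup-errors Ψ (N) decrease very quickly with N, in line with the discussion above.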
Clearly, analytic simplicity and algorithmic complexity are concepts that interact with each other.
Another principle that may guide the choice of an approximating procedure is the following: the graph of φ should be as close as possible to that of f. Hence, when f has “slow variations”, we require the same for φ, and similarly for “rapid variations” of f.
For a given function f with slow variations, its approximation should be a polynomial with a low degree. On the contrary, a function f with large variations would require an approximating polynomial with a high degree. If f has nearly vertical slopes, with abrupt changes in curvature or vertices with angular behavior, then its approximation can be chosen as a rational function, i.e. the ratio of two polynomials. In such cases, the advantage of using an approximating function φ in the class of rational functions with numerator degree m and denominator degree n lies in the fact that a tuning of m and n allows us to reproduce very irregular local behaviors of f. This choice therefore allows us to reduce the complexity inherent in the choice of a polynomial with high degree for the same function f, which furthermore may lead to computational difficulties and numerical instability.
For functions f with rapid increase or decrease, the approximating function should include exponential terms of the form ex, e−x. Accordingly, approximations for periodic functions should include trigonometric terms. More generally, the choice of the function φ would make φ follow the graph of f as simply as possible, sometimes through the choice of a grid, according to the local behavior of f.
Observe that, except for the class of rational functions, all other approximating classes which we have considered are linear spaces with finite dimension. For example, the linear space of polynomials with degree less than or equal to m − 1,
span {1, x, …, x^(m−1)},
has dimension m.
The linear space
span {1, cos x, sin x, …, cos (nx), sin (nx)},
which is used in order to approximate periodic functions, has dimension 2n + 1.
Assuming that f is defined on some interval [a, b] in ℝ, define a partition of [a, b], namely, I1 := [a, z1], …, Ii+1 := [zi, zi+1], …, Ik+1 := [zk, b]. On each of these intervals Ij, f can be approximated by a simple function, for example, a straight line, a parabola or a cubic function. The resulting approximating function on [a, b] may not be defined in the same way on all the intervals Ij; the restrictions φ|Ij of φ to the intervals Ij, j = 1, …, k + 1, are polynomials with degree less than or equal to m − 1.
For a given subdivision {z1, …, zk} of [a, b], the class Sm (z1, …, zk) of all functions with continuous derivatives up to the order m − 2 on [a, b], whose restrictions to the intervals Ij are polynomials with degree less than or equal to m − 1, is the class of polynomial splines. The set of points {z1, …, zk} is the set of nodes of the spline.
When m = 2, the spline functions are continuous and are linear on each Ij. For m = 4, we use the term “cubic spline”. These are functions with continuous first and second derivatives which coincide with polynomials with degree less than or equal to 3 between the nodes.
Denote (x − zi)+ := max (x − zi, 0).
PROPOSITION 1.1.– The set of functions Sm (z1, …, zk) defined on [a, b] is a finite dimensional linear space. A basis of Sm (z1, …, zk) is given by the family of functions
{1, x, …, x^(m−1), (x − z1)+^(m−1), …, (x − zk)+^(m−1)},
where all functions are restricted to [a, b].
PROOF.– For clarity, we write 1, x, …, x^(m−1) for the restrictions of these functions to [a, b]. We prove that the family is a linearly independent family. Let x ∈ [a, b], and assume that
α0 + α1x + … + αm−1x^(m−1) + β1 (x − z1)+^(m−1) + … + βk (x − zk)+^(m−1) = 0.
Assume a ≤ x < z1 ≤ … ≤ zk. It holds that x − z1 < 0, …, x − zk < 0, and therefore (x − zi)+ = 0 for i = 1, …, k. Hence, for x ∈ [a, z1), it holds α0 + α1x + … + αm−1x^(m−1) = 0. Since {x^j, j = 0, …, m − 1} is a family of independent functions, α0 = … = αm−1 = 0. Now, let x be such that z1 ≤ x < z2. Then, x − z2 < 0, …, x − zk < 0, and the above relation becomes β1 (x − z1)^(m−1) = 0.
Hence, β1 = 0. For z2 ≤ x < z3, it holds β2 (x − z2)^(m−1) = 0. Hence, β2 = 0. In the same way, looking at the various cases x ∈ [zi, zi+1), i = 3, 4, …, k − 1, we get β3 = … = βk = 0. Therefore, the system is linearly independent. It remains to prove that it generates all the splines. Let s ∈ Sm (z1, …, zk) be some spline. Rename the nodes z1, …, zk as follows: a = z0, z1, …, zk, zk+1 = b. Consider the polynomials Pi, i = 1, …, k + 1, such that s (x) = Pi (x) on [zi−1, zi] for i = 1, …, k + 1. The conditions of differentiability yield
Therefore, there exist coefficients γi such that Pi+1 (x) − Pi (x) = γi (x − zi)^(m−1), i = 1, …, k.
Define
with s (x) = P1 (x) for x ∈ [a, z1]. In the interval [z1, z2], we have
Therefore, in [a, z1] ∪ [z1, z2] = [a, z2], we may write
Proceeding this way, we prove that
Therefore,
This proof can be found in [DRO 11].
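Proposition 1.1 translates directly into computation: once the truncated power basis is formed, spline fitting reduces to linear algebra. The sketch below (an illustration, not from the text; the nodes, the target function and the use of least squares are arbitrary choices) fits a cubic spline (m = 4) in this basis:

    import numpy as np

    def truncated_power_design(x, nodes, m):
        """Design matrix for the basis {1, x, ..., x^(m-1), (x - z_i)_+^(m-1)}."""
        cols = [x**j for j in range(m)]
        cols += [np.maximum(x - z, 0.0) ** (m - 1) for z in nodes]
        return np.column_stack(cols)

    nodes = [0.25, 0.5, 0.75]                 # spline nodes z_1, ..., z_k
    x = np.linspace(0.0, 1.0, 200)
    f = np.sin(2 * np.pi * x)                 # function to be approximated
    B = truncated_power_design(x, nodes, m=4)
    coef, *_ = np.linalg.lstsq(B, f, rcond=None)   # least squares coefficients
    print(np.max(np.abs(f - B @ coef)))            # uniform error of the spline fit

The truncated power basis is convenient for exposition; in practice, better-conditioned bases (such as B-splines) are usually preferred.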
An important family of norms of functions defined on some interval [a, b] is defined by
Lp (ω, f − φ) := (∫ab ω (x) |f (x) − φ (x)|^p dx)^(1/p).
The function ω is called the weight function; it satisfies ω (x) > 0 for all x in [a, b].
In the case when ω (x) = 1 for all x in [a, b], the above norm is denoted by Lp (f − φ), or Lp, instead of Lp (1, f − φ).
As is known, L∞ (f − φ) = limp→∞ Lp (f − φ).
Norms of current use for applications are L2 and L∞. The norm L2 is called the least squares norm, or the Hilbertian norm. The norm L∞ is the uniform norm, or Chebyshev norm, or sup norm; some authors call it the Lagrangian norm or minimax norm.
The symbols ║.║2 and ║.║∞ will be used to denote L2 and L∞, respectively.
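As a numerical sketch (not from the text; the grid, the weight and the error function are arbitrary choices), the weighted norm Lp (ω, f − φ) and the two norms just introduced can be approximated on a grid:

    import numpy as np

    x = np.linspace(-1.0, 1.0, 5001)
    dx = x[1] - x[0]
    # a Chebyshev-type weight, clipped to avoid the endpoint singularity
    omega = 1.0 / np.sqrt(1.0 - np.clip(x, -0.999, 0.999) ** 2)
    r = np.cos(3 * x) - (1.0 - x**2 / 2.0)    # an arbitrary error function f - phi

    def lp_norm(r, p, w=None):
        """Weighted Lp norm by the rectangle rule; w = None means omega = 1."""
        w = np.ones_like(r) if w is None else w
        return (np.sum(w * np.abs(r) ** p) * dx) ** (1.0 / p)

    print(lp_norm(r, 2))           # ||.||_2, the least squares norm
    print(np.max(np.abs(r)))       # ||.||_inf, the uniform (sup) norm
    print(lp_norm(r, 2, omega))    # weighted L2 norm L2(omega, f - phi)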
Although no general rule exists for the choice of a specific norm, some common sense arguments prevail. First, the norm should be such that the problem of approximating some function f has at least one solution. If such a solution can be made explicit, this would be a clear advantage. Such is the case for the L2 norm, which furthermore induces uniqueness of the approximation.
The choice of the norm obviously depends upon the quality of the approximation which we intend to reach, and also on the characteristics of the function f. For example, assume that we only have some information on the L2 norm of the function f; clearly, we will not be able to produce any information on the local error R (x) at some point x. For that purpose, we should consider the norm L∞. The resulting information is that at any point x, the error |R (x)| is less than or equal to L∞ (f − φ).
The following remarks and examples may illustrate the relation which should link the choice of the norm and the properties of the function f.
EXAMPLE 1.1.– Assume that a polynomial with given degree n is chosen for the approximation of the exponential function e^x on ℝ. For fixed x, write e^x = 2^(x log2 e) and decompose the exponent x log2 e as the sum of its integer part E and its mantissa m ∈ (0; 1), in base 2.
Therefore, it holds e^x = 2^E 2^m.
The integer power of 2, namely 2^E, does not require any approximation, since it is known without error. The only term to be approximated is 2^m. Since m ∈ (0; 1), it holds 2^m ∈ (1; 2). It follows that it is enough to have at hand a good approximation of 2^m when it takes its values in (1; 2), in order to obtain a good approximation of the function e^x on the whole of ℝ. In other words, accurately evaluating e^x on ℝ requires only an accurate approximation of e^x for x in (ln 1; ln 2) = (0; 0.693). Clearly, the norm which guarantees an upper bound ε for the error at any point x in (0; 0.693) is the uniform norm on [0; 0.693]. It follows that the approximating problem can be stated as
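The argument of example 1.1 is a range-reduction scheme, and it can be made concrete in a few lines (a sketch, not from the text; the least squares polynomial of degree 5 stands in for the best uniform approximation, and all names are illustrative):

    import math
    import numpy as np

    LOG2E = 1.0 / math.log(2.0)

    # Polynomial approximation of 2^m on [0, 1); degree 5 is an arbitrary choice
    m_grid = np.linspace(0.0, 1.0, 1000)
    poly = np.polynomial.Polynomial.fit(m_grid, 2.0**m_grid, deg=5)

    def exp_via_range_reduction(x):
        """e^x = 2^(x*log2(e)) = 2^E * 2^m, with E an integer and m in [0, 1)."""
        y = x * LOG2E
        E = math.floor(y)               # integer part: handled exactly by 2^E
        m = y - E                       # mantissa part: the only approximated term
        return math.ldexp(float(poly(m)), E)   # poly(m) * 2^E

    for x in (-3.0, 0.5, 10.0):
        print(x, exp_via_range_reduction(x), math.exp(x))

Only the mantissa term is approximated; the factor 2^E is exact, so the error on all of ℝ is governed by the uniform error of the polynomial on the reduced interval.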
EXAMPLE 1.2.– Assume now that we intend to approximate the discontinuous function f (x) := sign (x), for x ∈ [−5; 5], using a continuous function φ. With the uniform norm, there exist infinitely many solutions to this problem.
Indeed, it holds that L∞ (f − φ) ≥ 1 for any such φ, as we now show. We consider different cases:
– First case: using the null function φ (x) = 0 as approximation, we have
– Second case: if we use a strictly positive function φ (x) > 0 for all x, then, since sign (x) = −1 for any x < 0, it holds
– Third case: similar to case 2, if we use a strictly negative function.
– Fourth case: for a function φ with changing sign, we must consider various subcases.
a) Assume that φ (0) = 0. Then,
Hence,
b) Assume that φ (c) = 0 for some c < 0. Then, there exists a neighborhood on the right of c, say Ic+, on which x < 0 and φ (x) > 0 hold. Hence,
c) Assume that φ (c) = 0 for some c > 0. Then, there exists a neighborhood on the left of c, say Ic−, on which x > 0 and φ (x) < 0 hold. Hence,
In all those cases, we have
Note that the (continuous) null function 0 : [−5; 5] → ℝ, x ↦ 0 (x) = 0, is at the same L∞ distance from f as φε, defined by
It is, however, clear that φε is a better approximation of f (x) = sign (x) than 0 is. This latter fact can be made explicit through the L2 distance between φε and the null function 0. Indeed, in this case, it holds
Therefore, φε improves on 0 for the L2 distance when approximating the sign function. However, for the sup norm, the null function is as bad an approximation of the sign function as φε is. The choice of the norm should therefore be made with respect to the properties of the function to be approximated. The sup norm appears to be a bad choice for discontinuous functions. It follows that the Hilbertian norm is a better choice than the uniform norm in this case.
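A numerical check of this comparison (a sketch, not from the text; ε = 0.1 and the grid are arbitrary, and φε is taken as the continuous ramp equal to x/ε on (−ε; ε) and to sign (x) elsewhere):

    import numpy as np

    eps = 0.1
    x = np.linspace(-5.0, 5.0, 200001)
    dx = x[1] - x[0]
    f = np.sign(x)
    phi_eps = np.clip(x / eps, -1.0, 1.0)   # continuous ramp approximation
    null = np.zeros_like(x)

    sup = lambda g: np.max(np.abs(g))
    l2 = lambda g: np.sqrt(np.sum(g**2) * dx)

    # Both approximations are (up to the grid) at sup-distance 1 from sign(x)...
    print(sup(f - null), sup(f - phi_eps))
    # ...but phi_eps is far better in the L2 distance
    print(l2(f - null), l2(f - phi_eps))

The sup-distances both come out (numerically) equal to 1, while the L2 distances are about 3.16 for the null function against about 0.26 for φε, in line with the discussion above.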
We now consider what may be expected in terms of uniqueness and existence for an optimal approximation for a given norm ║.║.
By Weierstrass’ theorem, continuity of a function and compactness of its domain are sufficient conditions for the existence of minimal and maximal points of this function on its domain.
The norm is a continuous mapping defined on F, and therefore also on V. As a result, the existence of a best approximating function φ in V may be obtained by viewing the approximation as an optimization problem on a compact subset of V.
Since V ⊂ F is a linear space with finite dimension n, the closed ball SF (f, ║f║F) with center f and radius ║f║F, intersected with V, is a compact set. Indeed, SF (f, ║f║F) ∩ V is clearly a closed and bounded set in V.
Therefore the problem
has indeed at least one solution in
We now consider, as potential approximating functions of f, functions φ in V; we prove that the solutions of
coincide with those of
We then conclude that the problem
has a solution which will at least solve
THEOREM 1.2.– It holds
PROOF.– Assume f ∉ V (otherwise f is the solution of the problem). It holds since
Furthermore, 0 ∈ V, since V is a linear space. It follows that
For any φ ∈ V \ SF (f, ║f║F), it holds
Indeed, if
then, by the very definition of SF (f, ║f║F), we should have
It follows that 0 approximates f in a better way than any such φ does, by [1.3].
This implies that the optimal approximation of f should belong to V ∩ SF (f, ║f║F).
■
We have seen that the solutions to the problem
always exist when V is a finite dimensional linear space.
The problem of uniqueness of such solutions is more complex.
We first state a definition.
DEFINITION 1.1.– Let (F, ║.║F) be a normed linear space. Let
Fr (SF (f, r)) := {g ∈ F : ║f − g║F = r}
be the frontier of the ball SF (f, r), r > 0. The norm ║.║F is called a strict norm if and only if, for any couple of functions (g1, g2) ∈ (Fr (SF (f, r)))² (i.e. such that ║f − g1║F = ║f − g2║F = r), any function g := αg1 + (1 − α) g2, α ∈ (0; 1), on the open segment defined by g1 and g2 satisfies ║f − g║F < r. In other words, the open segment belongs to the open ball.
DEFINITION 1.2.– A normed linear space is strict whenever its norm is strict.
The following theorem holds.
THEOREM 1.3.– Let (F, ║.║F) be a strict normed linear space and let V be a linear subspace of F with finite dimension n. Given a function f in F, the best approximation φ ∈ V with respect to the norm ║.║F is unique.
PROOF.– We prove this result by contradiction, assuming that there exist two distinct solutions φ1 and φ2. Define φ3 := (φ1 + φ2) /2. We may assume that f ∈ F \ V.
Let d∗ := infφ∈V ║f − φ║F.
Then, ║f − φ1║F = ║f − φ2║F = d∗ implies (φ1, φ2) ∈ (Fr (SF (f, d∗)))². Since any norm is a convex function, it holds
Now, ║.║F is strict. Hence, since φ3 is on the segment from φ1 to φ2, and since both φ1 and φ2 belong to Fr (SF (f, d∗)), the inequality above is strict. This implies
Hence, neither φ1 nor φ2 can be a solution of the problem, which concludes the proof.
■
The above theorem has various consequences.
The norms L∞ and L1 are not strict norms. Therefore, uniqueness for problems of the kind
may not hold.
However, the above theorem only provides a sufficient condition for uniqueness.
– We consider the case when
and
The linear space V is defined by
The vector u has infinitely many best approximations in V, given by the family of all vectors vx := (x, 0), where −1 ≤ x ≤ 1, for which ║vx − u║∞ = 1 for all x.
– The space C ([a, b]), equipped with ║.║∞ := L∞, is a normed linear space. Let
For all f in C ([a, b]), the optimal solution exists and is unique, as will be seen in Chapter 2, and it satisfies
– Non-existence of the optimal solution.
We explore the approximation of bounded real valued sequences, a context slightly different from the case of continuous functions. Consider the set l∞ of all bounded sequences of real numbers fx, with x ∈ ℕ, namely
l∞ := {(fx)x : supx∈ℕ |fx| < ∞}.
Elements in l∞ are denoted by (fx)x.
Define
║(fx)x║∞ := supx∈ℕ |fx|,
which defines a normed linear space (l∞, ║.║∞).
Let M denote the subset of l∞
This set is convex, being a linear subspace in l∞; indeed when
then for any
Consider the approximation of the element of l∞ which is defined by
the sequence with constant value 1.
Clearly, (1)x ∈ l∞.
The best approximation of (1)x in M should satisfy
Now, we have
Therefore,
Obviously,
which entails
In order to obtain
it is enough to consider, for any positive n in ℕ
which belongs to M (see [1.4]).
We then have
Choosing n arbitrarily yields
The L2 norm is strict; since it is defined through an inner product, one may ask whether all norms defined through an inner product are strict.
The following theorem answers this question.
THEOREM 1.4.– Let < ., . >F be an inner product on some linear space F. Then, the norm ║.║F defined by ║f║F := √(< f, f >F) is a strict norm.
PROOF.– Let f1 and f2 be two distinct elements in F such that ║f1║F = ║f2║F = 1, assuming that r = 1 in definition 1.1, without loss of generality. By the Cauchy–Schwarz inequality, < f1, f2 >F ≤ ║f1║F ║f2║F. Equality holds if and only if f1 is a multiple of f2. First, consider the case when equality holds. Then,
Since f1 = αf2 and ║f1║F = ║f2║F, it follows that ║f1║F = |α| ║f2║F. Hence, |α| = 1. Now, f1 ≠ f2; therefore, we cannot have α = 1. Hence, f1 = −f2. In this case, for any α ∈ (0; 1), ║αf1 + (1 − α) f2║F = |2α − 1| < 1, and strict convexity holds. Consider now the case when f1 is not a multiple of f2, with ║f1║F = ║f2║F = 1 (in the case when the norms are not one, consider the normalized elements). Since < f1, f2 >F < ║f1║F ║f2║F = 1 (strict inequality follows from the fact that f1 ≠ f2),
which concludes the proof.
■
THEOREM 1.5.– Let (H, < ., . >H) be a Hilbert space and M a non-void closed and convex subset in H. Assume further that M ⊆ V, where V is a finite dimensional subspace in H. Then, for any f ∈ H, there exists a best approximation φM ∈ M with respect to the norm (defined by the inner product). Further to this, φM satisfies
< f − φM, φ − φM >H ≤ 0 [1.5]
for any φ in M. Reciprocally, if [1.5] holds for all φ in M, then φM is the best approximation of f with respect to the norm.
PROOF.– Existence and uniqueness of φM follow from arguments developed above. We have to prove that the two following statements,
and
are equivalent. Simple calculus proves that
Hence, assuming [1.5], we get
which proves that φM is the best approximation of f in M. Suppose now that φM ∈ M is the best approximation of f in M and argue by contradiction. Suppose that
< f − φM, φ − φM >H > 0 for some φ ∈ M.
Consider the function
g (α) := ║f − (φM + α (φ − φM))║H², α ∈ [0; 1],
as α → 0. In a neighborhood of 0, g′ (α) takes negative values. Therefore, g is decreasing in a neighborhood of 0. Hence, there exists some α′ ∈ (0; 1) such that the function φ′ := φM + α′ (φ − φM) satisfies ║f − φ′║H < ║f − φM║H.
Since M is convex, φ′ belongs to M. Hence, φ′ is a better approximation to f than φM, a contradiction, which concludes the proof.
When M is a finite dimensional linear subspace of H, and not any convex subset in H as previously, then the preceding result takes a simpler form, as shown below.
THEOREM 1.6.– Let (H, < ., . >H) be a Hilbert space and M be a linear subspace of H with finite dimension. Then, for any f in H, there exists a unique best approximation φM of f in M with respect to the norm. It is characterized by the following property:
< f − φM, φ >H = 0 for any φ ∈ M.
PROOF.– Direct part. Assume that φM is optimal. We prove that < f − φM, φ >H = 0 holds for all φ ∈ M. Since M is a linear space, it contains some g defined by
g := φM + αφ,
where α ∈ ℝ and φ ∈ M. It holds
║f − g║H² = ║f − φM║H² − 2α < f − φM, φ >H + α²║φ║H².
By hypothesis, φM is the optimal solution. Therefore, since g belongs to M, ║f − φM║H ≤ ║f − g║H. Therefore,
i.e.
2α < f − φM, φ >H ≤ α²║φ║H²
for any α ∈ ℝ, and any φ ∈ M. We consider the two following cases:
1) For α, choose the k-th term αk of any positive convergent sequence with limit 0. Therefore,
2αk < f − φM, φ >H ≤ αk²║φ║H²,
i.e.
2 < f − φM, φ >H ≤ αk║φ║H².
Going to the limit in k yields
2 < f − φM, φ >H ≤ 0,
hence,
< f − φM, φ >H ≤ 0.
2) The second case is similar, with the choice of a negative sequence going to 0. It holds
2 < f − φM, φ >H ≥ αk║φ║H²,
i.e., going to the limit,
< f − φM, φ >H ≥ 0.
Therefore, we obtain
< f − φM, φ >H = 0 for any φ ∈ M,
which yields the characterization.
Reciprocal. We now turn to the reciprocal statement, namely that when
< f − φM, φ >H = 0
for any φ ∈ M, then φM is the best approximation of f in M. Since M is a linear subspace of H, it is convex. From the preceding theorem, whenever < f − φM, φ >H = 0 for all φ in M, then < f − φM, φ − φM >H = 0 ≤ 0 for all φ in M, and φM is the best approximation of f in M with respect to the norm. This concludes the proof.
■
REMARK.– We note that
< f − φM, φ >H = 0
for any φ ∈ M if and only if (f − φM) ∈ M⊥. Furthermore, since φM ∈ V := span {φ1, …, φn}, there exists a unique set of coefficients a∗ := (a1∗, …, an∗)′ such that φM = a1∗φ1 + … + an∗φn. The relation < f − φM, φ >H = 0, for all φ ∈ M, identifies the vector a∗. Indeed, a∗ = G−1f, where
G := (< φi, φj >H)1≤i,j≤n
and
f := (< f, φ1 >H, …, < f, φn >H)′.
Clearly, G (the Gram matrix) is diagonal whenever the basis is orthogonal.
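The remark translates directly into computation. The following sketch (not from the text; the interval, the discrete inner product and the monomial basis are arbitrary choices) computes the best L2 approximation through the Gram system a∗ = G−1f:

    import numpy as np

    # discrete inner product <g, h> approximating the integral on [0, 1]
    x = np.linspace(0.0, 1.0, 10001)
    dx = x[1] - x[0]
    inner = lambda g, h: np.sum(g * h) * dx

    basis = [np.ones_like(x), x, x**2]   # phi_1, phi_2, phi_3
    f = np.exp(x)                        # function to be approximated

    G = np.array([[inner(p, q) for q in basis] for p in basis])  # Gram matrix
    b = np.array([inner(f, p) for p in basis])                   # <f, phi_i>
    a = np.linalg.solve(G, b)                                    # a* = G^{-1} b

    phi_M = sum(ai * p for ai, p in zip(a, basis))
    # orthogonality check: <f - phi_M, phi_i> should be numerically zero
    print([inner(f - phi_M, p) for p in basis])

With an orthogonal basis, G would be diagonal and the coefficients would decouple, which is the point of the closing remark on the Gram matrix.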
We consider the case when F is not a Hilbert space but merely a normed linear space and when V is a finite dimensional subspace of F.
The following theorem then holds.
THEOREM 1.7.– Let (F, ║.║F) be a normed linear space and V a subspace of F. Let πV : F → V be a projection of F onto V. Then,
║f − πV (f)║F ≤ (1 + |║πV║|) infφ∈V ║f − φ║F.
In the above display, |║πV║| is the norm of the linear operator, defined by
|║πV║| := sup {║πV (g)║F : g ∈ F, ║g║F ≤ 1}.
PROOF.– Obviously, πV (φ) = φ for any φ ∈ V. Therefore, for any φ ∈ V,
f − πV (f) = (f − φ) − πV (f − φ).
Since πV is linear and continuous, there exists some constant |║πV║| < ∞ such that
║πV (g)║F ≤ |║πV║| ║g║F for all g ∈ F.
Therefore,
║f − πV (f)║F ≤ (1 + |║πV║|) ║f − φ║F.
Consider, in both sides of the above display, the infimum with respect to all choices of φ ∈ V; this yields the result.
■
The above theorem proves that, up to the constant (1 + |║πV║|), which depends on πV only, the error ║f − πV (f)║F is minimal.
We now assume that V := span {φ1, …, φn} is a linear subspace of (F, L∞), the normed linear space F equipped with the L∞ norm. We assume that dim V = n and denote by [a, b] the common domain of all functions in F.
Since {φ1, …, φn} is a basis of V, any φ ∈ V writes φ = a1φ1 + … + anφn for some vector of coefficients a := (a1, …, an)′. In order to emphasize the fact that φ is characterized by its coefficients in the basis {φ1, …, φn}, we denote it by φa.
