Generally, books on mathematical statistics are restricted to the case of independent, identically distributed random variables. This book, however, treats both this case and that of dependent variables, i.e. statistics for discrete- and continuous-time processes, a case of great importance for today's practitioners. Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date material on the relevant topics: probability theory, estimation, confidence intervals, non-parametric statistics and robustness, second-order processes in discrete and continuous time, diffusion processes, statistics for discrete- and continuous-time processes, statistical prediction, and complements in probability. This book is aimed at students taking courses on probability with an emphasis on measure theory, and at all practitioners who apply and use statistics and probability on a daily basis.
Page count: 308
Year of publication: 2013
Table of Contents
Preface
PART 1. MATHEMATICAL STATISTICS
Chapter 1. Introduction to Mathematical Statistics
1.1. Generalities
1.2. Examples of statistics problems
Chapter 2. Principles of Decision Theory
2.1. Generalities
2.2. The problem of choosing a decision function
2.3. Principles of Bayesian statistics
2.4. Complete classes
2.5. Criticism of decision theory – the asymptotic point of view
2.6. Exercises
Chapter 3. Conditional Expectation
3.1. Definition
3.2. Properties and extension
3.3. Conditional probabilities and conditional distributions
3.4. Exercises
Chapter 4. Statistics and Sufficiency
4.1. Samples and empirical distributions
4.2. Sufficiency
4.3. Examples of sufficient statistics – an exponential model
4.4. Use of a sufficient statistic
4.5. Exercises
Chapter 5. Point Estimation
5.1. Generalities
5.2. Sufficiency and completeness
5.3. The maximum-likelihood method
5.4. Optimal unbiased estimators
5.5. Efficiency of an estimator
5.6. The linear regression model
5.7. Exercises
Chapter 6. Hypothesis Testing and Confidence Regions
6.1. Generalities
6.2. The Neyman–Pearson (NP) lemma
6.3. Multiple hypothesis tests (general methods)
6.4. Case where the ratio of the likelihoods is monotonic
6.5. Tests relating to the normal distribution
6.6. Application to estimation: confidence regions
6.7. Exercises
Chapter 7. Asymptotic Statistics
7.1. Generalities
7.2. Consistency of the maximum likelihood estimator
7.3. The limiting distribution of the maximum likelihood estimator
7.4. The likelihood ratio test
7.5. Exercises
Chapter 8. Non-Parametric Methods and Robustness
8.1. Generalities
8.2. Non-parametric estimation
8.3. Non-parametric tests
8.4. Robustness
8.5. Exercises
PART 2. STATISTICS FOR STOCHASTIC PROCESSES
Chapter 9. Introduction to Statistics for Stochastic Processes
9.1. Modeling a family of observations
9.2. Processes
9.3. Statistics for stochastic processes
9.4. Exercises
Chapter 10. Weakly Stationary Discrete-Time Processes
10.1. Autocovariance and spectral density
10.2. Linear prediction and Wold decomposition
10.3. Linear processes and the ARMA model
10.4. Estimating the mean of a weakly stationary process
10.5. Estimating the autocovariance
10.6. Estimating the spectral density
10.7. Exercises
Chapter 11. Poisson Processes – A Probabilistic and Statistical Study
11.1. Introduction
11.2. The axioms of Poisson processes
11.3. Interarrival time
11.4. Properties of the Poisson process
11.5. Notions on generalized Poisson processes
11.6. Statistics of Poisson processes
11.7. Exercises
Chapter 12. Square-Integrable Continuous-Time Processes
12.1. Definitions
12.2. Mean-square continuity
12.3. Mean-square integration
12.4. Mean-square differentiation
12.5. The Karhunen–Loeve theorem
12.6. Wiener processes
12.7. Notions on weakly stationary continuous-time processes
12.8. Exercises
Chapter 13. Stochastic Integration and Diffusion Processes
13.1. Itô integral
13.2. Diffusion processes
13.3. Processes defined by stochastic differential equations and stochastic integrals
13.4. Notions on statistics for diffusion processes
13.5. Exercises
Chapter 14. ARMA Processes
14.1. Autoregressive processes
14.2. Moving average processes
14.3. General ARMA processes
14.4. Non-stationary models
14.5. Statistics of ARMA processes
14.6. Multidimensional processes
14.7. Exercises
Chapter 15. Prediction
15.1. Generalities
15.2. Empirical methods of prediction
15.3. Prediction in the ARIMA model
15.4. Prediction in continuous time
15.5. Exercises
PART 3. SUPPLEMENT
Chapter 16. Elements of Probability Theory
16.1. Measure spaces: probability spaces
16.2. Measurable functions: real random variables
16.3. Integrating real random variables
16.4. Random vectors
16.5. Independence
16.6. Gaussian vectors
16.7. Stochastic convergence
16.8. Limit theorems
Appendix. Statistical Tables
A1.1. Random numbers
A1.2. Distribution function of the standard normal distribution
A1.3. Density of the standard normal distribution
A1.4. Percentiles (tp) of Student’s distribution
A1.5. Ninety-fifth percentiles of Fisher–Snedecor distributions
A1.6. Ninety-ninth percentiles of Fisher–Snedecor distributions
A1.7. Percentiles of the χ2 distribution with n degrees of freedom
A1.8. Individual probabilities of the Poisson distribution
A1.9. Cumulative probabilities of the Poisson distribution
A1.10. Binomial coefficients for n ≤ 30 and 0 ≤ k ≤ 7
A1.11. Binomial coefficients for n ≤ 30 and 8 ≤ k ≤ 15
Bibliography
Index
First published 2012 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2012
The rights of Denis Bosq to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Cataloging-in-Publication Data
Bosq, Denis, 1939-
Mathematical statistics and stochastic processes / Denis Bosq.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-84821-361-6
1. Mathematical statistics. 2. Stochastic processes. I. Title.
QA276.8.B672 2012
519.5--dc23
2012006812
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN: 978-1-84821-361-6
Preface
This book is dedicated to the mathematical modeling of statistical phenomena. It attempts to fill a gap: most textbooks on mathematical statistics only treat the case of sampling, and yet, in applications, the observed variables are very often correlated – examples are numerous in physics, chemistry, biology, demography, economics and finance. It is for this reason that we have given an important place to process statistics.
This book is divided into three parts: Mathematical Statistics, Statistics for Stochastic Processes, and a supplement on probability. The first part begins with decision theory, to develop the classical theory of estimation and tests. It then adopts an asymptotic viewpoint, in particular in a non-parametric framework. In the second part, we first study the statistics of discrete-time stationary processes, and then the statistics of Poisson processes. The third part is dedicated to continuous-time processes: we define the Itô integral and give some results for the statistics of second-order processes and of diffusions. Statistical prediction is addressed in the last chapter. Finally, the principles of probability are described, based on measure theory.
The mathematical level is that of a master’s degree in applied mathematics: I have based the material on some lectures given at UPMC (Paris 6) in France, particularly [BOS 78] and [BOS 93].
A certain number of exercises, some original, illustrate the contents of this textbook. I would like to thank the teachers who contributed in putting them together, notably Gérard Biau, Delphine Blanke, Jérôme Dedecker, and Florence Merlevède.
I would also like to thank François-Xavier Lejeune for his excellent write-up of the manuscript, and for his contribution to the development of this book.
Finally, I would like to thank Maximilian Lock for all his work on this book.
Chapter 1. Introduction to Mathematical Statistics

1.1. Generalities

We may define statistics as the set of methods that allow us, from the observation of a random phenomenon, to obtain information about the probability distribution associated with this phenomenon.
It is important to note that the random character that we attribute to the considered phenomenon is often merely a way to translate our ignorance of the laws that govern it. Also, a preliminary study, taking into account only the observations made, proves interesting; this is the aim of data analysis.
Data analysis is the set of techniques of statistical description whose purpose is the treatment of data without any probabilistic hypotheses. It aims, in particular, to reveal the dominant parameters among those upon which the observation depends.
Descriptive statistics also treats observations without formulating any prior hypotheses, but we may consider these hypotheses to be underlying, since it essentially consists of studying the empirical probabilistic characteristics of observations (histograms, empirical moments, etc.).
However, we may include data analysis in descriptive statistics by defining it as the set of methods that allow the ordering of data and their presentation in a usable form.
Furthermore, simulation is an additional technique often used in statistics: it consists of carrying out fictitious experiments in such a way as to make visible the expressions of chance in the evolution of a phenomenon. A simple, important simulation problem is that of the generation of random numbers, that is, the generation of a sequence of numbers x1, …, xn, which may be considered as the realizations of random variables X1, …, Xn, the distribution of (X1, …, Xn) being given.
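The generation of random numbers from a given distribution can be sketched with the inverse-transform method: if U is uniform on ]0, 1[ and F is a distribution function, then F⁻¹(U) has distribution F. Below is a minimal Python illustration of this idea; the function name and the exponential example are ours, for illustration only:

```python
import math
import random

def inverse_transform_sample(inv_cdf, n, seed=0):
    """Generate x1, ..., xn distributed according to the distribution whose
    quantile function (inverse distribution function) is inv_cdf, by
    applying it to uniform random numbers on (0, 1)."""
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]

# Example: exponential distribution with rate 1, whose quantile function
# is F^{-1}(u) = -ln(1 - u); its mean is 1.
sample = inverse_transform_sample(lambda u: -math.log(1.0 - u), 10_000)
mean = sum(sample) / len(sample)  # should be close to the true mean, 1
```

The same uniform generator thus suffices, in principle, to simulate any distribution whose quantile function can be computed.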
The techniques indicated above fall essentially within applied statistics and we will only discuss them sporadically in this book. We will, in general, focus on the framework of mathematical statistics, that is, theoretical statistics based on measure theory and, in part, on decision theory.
1.2. Examples of statistics problems

We will first give some examples of statistics problems, emphasizing the construction of the associated probability model.
A manufacturer receives a batch of objects containing an unknown proportion of defective objects. Supposing the number of objects in the batch to be large, an audit may be carried out only by taking a sample of objects from the batch. Given the number of defective objects in the sample, the manufacturer will accept or reject the batch. We may associate several probability models with this problem:
1) Let X be the number of defective objects in a sample of r objects drawn without replacement from the batch of n objects, of which n1 are defective: X follows the hypergeometric distribution with parameters (n, n1, r).

2) If n and n1 are large relative to r, we may use the binomial approximation and suppose that X follows the distribution B(r, n1/n). This comes from the fact that the hypergeometric distribution converges to the binomial distribution B(r, p) when n → ∞, n1/n → p > 0; similarly, the Poisson approximation P(λ) may be used when r → ∞, rp → λ > 0.
Since n1 is unknown, so too are the parameters of the above distributions. We must therefore consider the statistical model (E, 𝓑, 𝒫), where 𝒫 designates the set of hypergeometric distributions with parameters (n, n1, r), where n and r are fixed and n1 is unknown.
The choice of a decision criterion is then based on the fact that we may commit two types of error: accept a bad batch or reject a good one. We therefore aim to make the probabilities of these two errors as small as possible.
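The quality of the binomial approximation in model (2) is easy to check numerically. The sketch below (the batch sizes are illustrative choices of ours) computes the hypergeometric probabilities exactly with integer arithmetic and measures the worst-case error of the binomial approximation when n and n1 are large relative to r:

```python
from math import comb, exp, factorial

def hypergeom_pmf(k, n, n1, r):
    """P(X = k): k defectives in a sample of r objects drawn without
    replacement from a batch of n objects containing n1 defectives."""
    return comb(n1, k) * comb(n - n1, r - k) / comb(n, r)

def binom_pmf(k, r, p):
    """Binomial approximation B(r, p) with p = n1/n."""
    return comb(r, k) * p**k * (1 - p)**(r - k)

def poisson_pmf(k, lam):
    """Poisson approximation P(lam), valid when r is large and rp -> lam."""
    return exp(-lam) * lam**k / factorial(k)

# Illustrative batch: n = 100_000 objects, n1 = 5_000 defective (p = 0.05),
# sample of r = 20 objects.
n, n1, r = 100_000, 5_000, 20
p = n1 / n
approx_error = max(abs(hypergeom_pmf(k, n, n1, r) - binom_pmf(k, r, p))
                   for k in range(r + 1))
```

With these values the largest pointwise difference between the two probability mass functions is negligible, which justifies working with the simpler binomial model.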
A physicist measures a real value a certain number of times. The values found are not exact due to measurement errors. The problem is then which value to accept for the measured quantity.
To construct the associated probability model, we generally make the following hypothesis: the measurement errors have extremely varied causes (lack of precision or unreliable instruments, misreading by the experimenter, etc.); we may admit as a first approximation that these “causes” are independent of each other: the error is therefore the sum of a large number of small, independent errors.
The central limit theorem allows us to assert that this error follows (approximately) a normal distribution. Moreover, we may often, for reasons of symmetry, suppose that the measurements carried out have the same expectation (mean) value as the quantity considered.
We may therefore associate with the n independent observations of this quantity the statistical model (ℝ^n, 𝓑(ℝ^n), {N(m, σ²)^⊗n : m ∈ I, σ² ∈ J}), where I and J are intervals of ℝ and ℝ₊*, respectively. We must therefore determine m in as precise a way as possible: this is a problem of estimation.
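This model is easy to illustrate by simulation: in the sketch below, each measurement is a (hypothetical) true value plus a sum of many small, independent, symmetric errors, so that the total error is approximately normal by the central limit theorem, and the sample mean is used to estimate m. The function name and parameter values are illustrative:

```python
import random
import statistics

def measure(true_value, rng, n_causes=100, scale=0.05):
    """One measurement: the true value plus a sum of many small,
    independent, symmetric errors (approximately normal by the CLT)."""
    error = sum(rng.uniform(-scale, scale) for _ in range(n_causes))
    return true_value + error

rng = random.Random(42)
m = 9.81                                         # hypothetical quantity to measure
observations = [measure(m, rng) for _ in range(1_000)]
estimate = statistics.mean(observations)         # estimator of m
```

By symmetry, each measurement has expectation m, so the sample mean is an unbiased estimator whose precision improves as the number of observations grows.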
An economist observes the evolution of the price of a certain product in the time interval [t1, t2]; he seeks to predict the price of this product at the time t3 (>t2).
This random phenomenon may be modeled in the following way: we consider a family (ξt, t ≥ t1) of real random variables where ξt represents the price of the product at time t. It is therefore a question of predicting, in light of the realizations of the random variables ξt, t1 ≤ t ≤ t2, the best possible value of ξt3.
If the distributions of the random variables ξt, or the correlations that exist between them, are not fully known, this problem of prediction falls within the domain of statistics.
The problem of interpolation is of an analogous nature: it concerns the determination of the best possible ξt0 given ξt, where t ∈ [t1, t2] ∪ [t3, t4] with t2 < t0 < t3.
Prediction and interpolation are two particular cases of general filtering problems, or in other words problems of the estimation of an unobserved random variable Y from an observed random variable X1.
To determine the precision of this estimate, we may evaluate

α = P(|Nn/n − p| > ε),

and we then say that, with a confidence of 1 − α, Nn/n is an estimator of p to within ε, or equally that p belongs to the confidence interval [Nn/n − ε, Nn/n + ε] with a confidence level 1 − α.

For the calculation of α, we may, when n is large, use the normal approximation of the binomial distribution B(n, p), for which we write

α ≈ 2(1 − Φ(ε√n / √(p(1 − p)))),

where Φ is the standard normal distribution function; this approximation is valid for p ∈ ]0, 1[.
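The resulting approximate confidence interval for a proportion may be sketched in Python as follows (replacing p by its estimate Nn/n inside the square root, as is usual; the 95% level corresponds to z ≈ 1.96, and the function name is illustrative):

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p:
    with confidence ~95% (z = 1.96), p lies in
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    eps = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - eps, p_hat + eps

# Illustrative data: 420 "successes" out of n = 1000 observations.
low, high = proportion_confidence_interval(420, 1000)
```

For these illustrative numbers the interval is roughly 0.42 ± 0.03, i.e. the frequency estimates p to within about three percentage points at the 95% level.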
A doctor wants to test a medication; for this, he chooses a first group of patients to whom the medication is administered, and a second group of patients who receive a placebo.

Let Xi be a random variable associated with the ith patient of the first group, which conveys the obtained result: cure, improvement, aggravation, stationary state, etc. We define the variable Yj associated with the jth patient of the second group in a similar way.
1 A general solution to filtering problems in the case where the distribution of the pair (X, Y) is known is given in Chapter 3.
Chapter 2. Principles of Decision Theory

2.1. Generalities

The examples given in Chapter 1 show that a statistical problem may be characterized by the following elements:
– a probability distribution that is not entirely known;
– observations;
– a decision to be made.
Formalization: Wald’s decision theory provides a common framework for statistics problems. We take:

1) A measurable space (E, 𝓑), equipped with a family 𝒫 = (Pθ, θ ∈ Θ) of probabilities, one of which is the unknown distribution of the observation.

2) A measurable space (D, 𝔇), called a decision (or action) space.

3) A set 𝒟 of measurable mappings from (E, 𝓑) to (D, 𝔇), called a set of decision functions (d.f.) (or decision rules).

Description: From an observation x that follows an unknown distribution P ∈ 𝒫, the statistician chooses an element a ∈ D using an element d of 𝒟: a = d(x).
Preference relation: To guide his/her choice, the statistician takes a preorder (i.e. a binary relation that is reflexive and transitive) on 𝒟. Such a preorder is called a preference relation. We will write it as d1 ⪰ d2, which reads “d1 is preferable to d2”. We say that d1 is strictly preferable to d2 if d1 ⪰ d2 and not d2 ⪰ d1.
The statistician is therefore concerned with the choice of a “good” decision as defined by the preference relation considered.
Risk function: One convenient way of defining a preference relation on 𝒟 is the following:

1) Θ being provided with a σ-algebra, we take a measurable map L from Θ × D to [0, +∞], called a loss function.

2) We set:

R(θ, d) = ∫ L(θ, d(x)) dPθ(x), θ ∈ Θ, d ∈ 𝒟,

where R is called the risk function associated with L; it is often written in the following form:

R(θ, d) = Eθ[L(θ, d(X))],

where X denotes a random variable with distribution Pθ and Eθ the expected value associated with Pθ. We note that R takes values in [0, +∞].

3) We say that d1 is preferable to d2 if:

R(θ, d1) ≤ R(θ, d2), for all θ ∈ Θ. [2.1]

We will say that d1 is strictly preferable to d2 if [2.1] holds, and if there exists θ ∈ Θ such that:

R(θ, d1) < R(θ, d2).
INTERPRETATION 2.1.– L(θ, a) represents the loss incurred by the decision a when the probability of X is Pθ (or when the parameter is equal to θ). R(θ, d) then represents the average loss associated with the decision function d. The best possible choice in light of the preference relation defined by [2.1] becomes the decision that minimizes the average loss, whatever the value of the parameter.
EXAMPLE 2.1.–
1) In the example of quality control (Chapter 1), the decision space contains two elements: a1 (accept the batch) and a2 (reject the batch). A decision rule is therefore a map from {1, …, r} to {a1, a2}.
The choice of a loss function may be carried out by setting, p denoting the proportion of defective objects and p0 an acceptable threshold:

L(p, a1) = c1 1{p > p0} and L(p, a2) = c2 1{p ≤ p0},

where c1 and c2 are two given positive numbers (c1 penalizes accepting a bad batch, c2 rejecting a good one).
2) In the measurement errors example, the decision space is D = ℝ, and a decision function is a numerical measurable function defined on ℝ^n.

A common loss function is defined by the formula:

L(θ, a) = (m − a)², θ = (m, σ²),

from which we have the quadratic risk function:

R(θ, d) = Eθ[(d(X) − m)²].
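The quadratic risk of a decision function can be estimated by Monte Carlo simulation. The sketch below (our illustration, not taken from the text) compares the empirical quadratic risk of the sample mean and of the sample median as estimators of the mean m of a normal distribution; for the sample mean, theory gives R(θ, d) = σ²/n:

```python
import random
import statistics

def quadratic_risk(decision, m, sigma, n, n_rep=2_000, seed=1):
    """Monte Carlo estimate of R(theta, d) = E_theta[(d(X) - m)^2]
    for a decision function applied to n i.i.d. N(m, sigma^2) observations."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_rep):
        x = [rng.gauss(m, sigma) for _ in range(n)]
        losses.append((decision(x) - m) ** 2)
    return sum(losses) / n_rep

risk_mean = quadratic_risk(statistics.mean, m=0.0, sigma=1.0, n=25)
risk_median = quadratic_risk(statistics.median, m=0.0, sigma=1.0, n=25)
# Theory: the sample mean has risk sigma^2/n = 0.04 here; under normality
# the sample median has a larger quadratic risk (about pi/2 times larger).
```

The simulation thus recovers the theoretical ordering: for normal observations, the mean is preferable to the median for the quadratic loss.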
COMMENT 2.1.– In example 2.1(1), among others, we may very well envisage other sets of decisions: for example, sequential rules, in which the number of observations is not fixed in advance, or non-deterministic (randomized) decision rules.

A detailed study of these sequential and non-deterministic methods is beyond the scope of this book. In what follows, we will only treat certain particular cases.
DEFINITION 2.1.– A decision function is described as optimal in 𝒟 if it is preferable to every other element of 𝒟.

If 𝒟 is too large or if the preference relation is too narrow, there is generally no optimal decision function, as the following lemma shows:
LEMMA 2.1.– If:

1) 𝒫 contains two non-orthogonal probabilities Pθ1 and Pθ2,

3) 𝒟 contains the constant decision functions, and the preference relation is defined using a loss function L such that:

then there is no optimal decision rule in 𝒟.
PROOF.– Let us consider the decision rules:
which obey
An optimal rule d will therefore have to verify the relations:
hence
therefore
which is a contradiction, since Pθ1 and Pθ2 are not orthogonal.
EXAMPLE 2.2.– This result applies to example 2.1(2).
1) To reduce 𝒟, we may demand that the envisaged decision functions possess certain regularity properties. Among these properties, two are frequently used: unbiasedness and invariance.
DEFINITION 2.2.– A decision function d is called unbiased with respect to the loss function L if:

Eθ[L(θ′, d(X))] ≥ Eθ[L(θ, d(X))], θ, θ′ ∈ Θ.
EXAMPLE 2.3.– In example 2.2 is an unbiased decision rule.
If a decision problem is invariant or symmetric with respect to certain operations, we may demand that the same is true for the decision functions.
We will study some examples later.
2) We may, on the other hand, seek to replace optimality with a less strict condition:
i) by limiting ourselves to the search for admissible decision functions (a d.f. is said to be admissible if no d.f. may be strictly preferred to it);
ii) by substituting for it a less narrow preference relation, for which an optimal d.f. exists.
Bayesian methods, which we study in the following section, include these two principles.
2.3. Principles of Bayesian statistics

The idea is as follows: we suppose that Θ is provided with a σ-algebra, and we take a probability τ on Θ (called an “a priori probability”, or prior distribution).
R being a risk function such that θ ↦ R(θ, d) is measurable for every d ∈ 𝒟, we set:

r(τ, d) = ∫ R(θ, d) dτ(θ), d ∈ 𝒟,

where r is called the Bayesian risk function associated with R; it leads to the following preference relation: d1 is preferable to d2 if r(τ, d1) ≤ r(τ, d2).

An optimal d.f. for this preference relation is said to be Bayesian with respect to τ.
DISCUSSION.– Bayesian methods are often criticized, as the choice of τ is fairly arbitrary. However, Bayesian decision functions possess a certain number of interesting properties; in particular, they are admissible, whereas this is not always the case for an unbiased d.f.
We suppose that Pθ has a density f(·, θ) with respect to some σ-finite measure λ on (E, 𝓑). Furthermore, f(·, ·) is assumed to be measurable in the pair (x, θ).
Under these conditions,

r(τ, d) = ∫ [ ∫ L(θ, d(x)) t(x, θ) dτ(θ) ] ( ∫ f(x, θ′) dτ(θ′) ) dλ(x),

where t(·, θ) is defined, Pθ-almost everywhere, by the formula:

t(x, θ) = f(x, θ) / ∫ f(x, θ′) dτ(θ′).

A d.f. that minimizes ∫ L(θ, d(x)) t(x, θ) dτ(θ) is therefore Bayesian. This quantity is called the a posteriori risk (x being observed).
In a certain sense, using the Bayesian method is equivalent to transforming the problem of the estimation of θ into a filtering problem: we seek to estimate the unobserved random variable θ from the observed random variable X.
EXAMPLE 2.4.–
1) Suppose that X follows the binomial distribution B(n, θ), θ ∈ ]0, 1[, and that τ is the uniform distribution on ]0, 1[. With respect to the Lebesgue measure on ]0, 1[, τ has the density 1, and Pθ has the density (with respect to the counting measure on {0, 1, …, n})

f(x, θ) = C(n, x) θ^x (1 − θ)^(n−x), x = 0, 1, …, n,

with

C(n, x) = n! / (x!(n − x)!),

and

∫0^1 θ^x (1 − θ)^(n−x) dθ = x!(n − x)! / (n + 1)!,

hence the marginal density of X:

m(x) = 1/(n + 1), x = 0, 1, …, n;

therefore

t(x, θ) = f(x, θ)/m(x) = (n + 1) C(n, x) θ^x (1 − θ)^(n−x).

We deduce Bayes’ rule (the posterior mean, for the quadratic loss):

d0(x) = ∫0^1 θ t(x, θ) dθ = (x + 1)/(n + 2).
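Assuming, as is classical for this example, a binomial observation X ~ B(n, θ) with a uniform prior on ]0, 1[ and quadratic loss, the Bayes rule is the posterior mean (x + 1)/(n + 2). The following sketch (function name ours) checks this closed form against a direct numerical integration of the posterior:

```python
def posterior_mean_uniform_prior(x, n, grid=100_000):
    """Numerical posterior mean of theta given X = x, for X ~ B(n, theta)
    and a uniform prior on (0, 1): approximates the ratio
    integral of theta * theta^x (1-theta)^(n-x) over
    integral of theta^x (1-theta)^(n-x)
    (the binomial coefficient cancels in the ratio)."""
    num = den = 0.0
    for i in range(1, grid):
        t = i / grid
        w = t**x * (1 - t)**(n - x)
        num += t * w
        den += w
    return num / den

n, x = 10, 3
bayes = posterior_mean_uniform_prior(x, n)
closed_form = (x + 1) / (n + 2)   # the classical closed form
```

The numerical value agrees with (x + 1)/(n + 2) to high accuracy, illustrating that the Bayes estimator pulls the frequency x/n slightly toward 1/2.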
COMMENT 2.2.– The fact that τ is a uniform distribution on ]0, 1[ does not mean that we have a priori no opinion on θ. In effect, such a choice implies, for example, that θ² follows the distribution with density u ↦ 1/(2√u) on ]0, 1[; this shows that we are not without opinion on θ², and therefore on θ.
THEOREM 2.1.–
1) If d0 is, for all θ, Pθ-a.s. the only Bayes rule associated with τ, then d0 is admissible for R.

2) If R(θ, d) is continuous in θ for every d ∈ 𝒟, if τ gives strictly positive probability to every non-empty open subset of Θ, and if r(τ, d0) < ∞ (where d0 is a Bayes rule associated with τ), then d0 is admissible for R.
PROOF.–
1) Let d be a d.f. preferable to d0; we then have:

r(τ, d) ≤ r(τ, d0),

and, since d0 is Bayesian, we have:

r(τ, d0) ≤ r(τ, d).

The uniqueness of d0 then implies that:

d = d0, Pθ-a.s. for all θ.

From this, we deduce:

R(θ, d) = R(θ, d0), θ ∈ Θ;

therefore, d is not strictly preferable to d0.
2) If d0 were not admissible, there would exist a d.f. d and a θ0 ∈ Θ such that:

R(θ, d) ≤ R(θ, d0) for all θ, and R(θ0, d) < R(θ0, d0).

By continuity, we deduce that there would exist an open neighborhood U of θ0 and a number ε > 0 such that:

R(θ, d) ≤ R(θ, d0) − ε, θ ∈ U.

Under these conditions:

r(τ, d) ≤ r(τ, d0) − ε τ(U) < r(τ, d0),

which is a contradiction, since d0 is Bayesian.
DEFINITION 2.3.– A class 𝒞 of decision functions is said to be complete (or essentially complete) if, for every d.f. d that is not in 𝒞, there exists d′ ∈ 𝒞 that is strictly preferable (or preferable, respectively) to d.
The solution to a decision problem must therefore be sought in a complete class, or at least an essentially complete class.
DEFINITION 2.4.– Let μ be a measure on Θ. A decision function d0 is said to be a generalized Bayesian decision function with respect to μ if:
EXAMPLE 2.5.– In example 2.4(1) in the previous section, d0(x) = (x + 1)/(n + 2) is a generalized Bayesian decision function with respect to the Lebesgue measure.
We note that such a d.f. is a limit of Bayesian decision functions; we then say that it is a Bayes limit. Under fairly general conditions relative to L, we may show that the Bayes limit decision functions are generalized Bayesian functions with respect to a certain measure μ (see, for example, [SAC 63]).
We now state a general Wald theorem, whose proof appears in [WAL 50]:
THEOREM 2.2.– Ifand if:
1);
2):
3):
then
As we have seen, decision theory provides a fairly convenient framework for the description of statistics problems. We may, however, make several criticisms of it.
First of all, this framework is often too general for the results of the theory to be directly usable in a precise, particular case. This is true for testing problems.
On the other hand, we may only obtain sufficient information on a decision to be made if we make use of a large number of observations of the studied random phenomenon. It is therefore interesting to pose a statistics problem in the following way: with the number of observations made, n, we associate a d.f. dn and study the asymptotic behavior of the sequence (dn) when n tends to infinity – particularly its convergence toward the “true” decision, and the speed of its convergence. This “asymptotic theory” is rich in results and applications.
For a more in-depth study of decision theory and Bayesian methods, we refer to [LEH 98] and [FOU 02].
EXERCISE 2.1.– Let X be a random variable that follows the Bernoulli distribution with parameter θ, where 0 ≤ θ ≤ 1. To estimate θ in light of X, we choose a priori a distribution with a given density on [0, 1].
1) Determine the Bayes estimator of θ.
EXERCISE 2.2.– Let X be a random variable that follows a uniform distribution on ]0, θ[ (θ > 0). To construct a Bayesian estimator, we choose a priori the density distribution and choose the quadratic error as a risk function.
1) Determine the conditional density of θ knowing X. From this, deduce the Bayesian estimator of θ.
2) Calculate the quadratic error of this estimator and compare it to the quadratic errors of X and the unbiased estimator 2X. Comment on the result.
EXERCISE 2.3.– Taking the previous exercise, and choosing as a risk function the error L1:
2) Determine the Bayesian estimator of θ.
Show that:

sup{R(θ, d0) : θ ∈ Θ} = inf over d ∈ 𝒟 of sup{R(θ, d) : θ ∈ Θ}. [2.2]

(A d.f. obeying [2.2] is called minimax: it minimizes the maximum risk on Θ.)
EXERCISE 2.5.– Let μ be a σ-finite measure on such that:
We consider the real random variable X whose density with respect to μ is written as:
2) We put a priori the distribution onto . The risk function being the quadratic error, determine the Bayesian estimator of g(θ) associated with τσ, i.e. .
3) Show that:
4) From this, deduce that is an admissible estimator.
EXERCISE 2.6.– Let X1, …, Xn be a sample of the Bernoulli distribution with parameter θ, θ ∈ ]0, 1[. It is proposed that a minimax estimator T of θ be constructed for the quadratic error. In other words, T minimizes

M(d) = sup{Eθ[(d − θ)²] : θ ∈ ]0, 1[}.

1) We set X̄n = (X1 + … + Xn)/n. Calculate M(X̄n).

2) Considering the estimator:

calculate M(T) and compare it to M(X̄n).
3) Show that T is a Bayesian estimator with respect to the a priori density distribution:
We give the formula:
4) Establish the following result: “A Bayesian estimator whose quadratic risk does not depend on θ is minimax”. Thus, deduce that T is minimax.
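The classical candidate for the estimator T in this exercise is T = (S + √n/2)/(n + √n), where S = X1 + … + Xn; it is the Bayes estimator for a Beta(√n/2, √n/2) prior, and its quadratic risk is constant in θ, equal to 1/(4(√n + 1)²). Taking this formula as our assumption of the intended estimator, the sketch below verifies the constant risk using the exact moments E(S) = nθ and Var(S) = nθ(1 − θ):

```python
import math

def minimax_estimate(s, n):
    """Constant-risk estimator of a Bernoulli parameter for quadratic loss
    (assumed form): T = (S + sqrt(n)/2) / (n + sqrt(n)), S = X1 + ... + Xn."""
    return (s + math.sqrt(n) / 2) / (n + math.sqrt(n))

def quadratic_risk(theta, n):
    """Exact quadratic risk of T: since T is affine in S,
    E[(T - theta)^2] = (Var(S) + (E(S) + sqrt(n)/2 - (n + sqrt(n)) * theta)^2)
                       / (n + sqrt(n))^2."""
    c = n + math.sqrt(n)
    bias_num = n * theta + math.sqrt(n) / 2 - c * theta
    return (n * theta * (1 - theta) + bias_num**2) / c**2

n = 100
risks = [quadratic_risk(t / 10, n) for t in range(1, 10)]
# The risk does not depend on theta: it equals 1 / (4 * (sqrt(n) + 1)^2).
```

A Bayes estimator with constant risk is minimax (question 4), which is exactly what makes this particular T interesting.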
We want to define a Bayesian estimator of θ. For this, we take as an a priori distribution the Gamma distribution Γ(α, β) with density:
1 To avoid the introduction of an additional space, we may assume that X is the identity of E. This will henceforth be the case, unless otherwise indicated.
3 See Chapter 3, page 21.
Chapter 3. Conditional Expectation

3.1. Definition

Let (X, Y) be a pair of random variables defined on the probability space (Ω, 𝒜, P), of which only X is observed. We wish to know what information X carries about Y: this is the filtering problem defined in Chapter 1.
This problem may be formalized in the following way: supposing Y to be real and square integrable, construct a real random variable of the form r(X) that gives the best possible approximation of Y with respect to the quadratic error, i.e. E[(Y − r(X))2] being minimal.
If we identify the random variables with their P-equivalence classes, we deduce that r(X) exists and is unique, since it is the orthogonal projection (in the Hilbert space L²(P)) of Y onto the closed vector subspace of L²(P) constituted by the real random variables of the form h(X) such that E[(h(X))²] < +∞.
From Doob’s lemma, the real random variables of the form h(X) are those that are measurable with respect to the σ-algebra generated by X. We say that r(X) is the conditional expectation of Y with respect to the σ-algebra generated by X (or with respect to X), and that r is the regression of Y on X. We write:

r(X) = E(Y | X).
The above equation leads us to the following definition.
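The regression r(x) = E(Y | X = x) can be approximated empirically by averaging Y over observations whose X falls near x; this is a crude empirical counterpart of the orthogonal projection described above. A Python sketch, with an illustrative model Y = X² + noise (so that r(x) = x²) and illustrative names:

```python
import random

def regression_estimate(pairs, x_bins=10):
    """Approximate r(x) = E(Y | X = x) by averaging Y within bins of X,
    for X taking values in [0, 1): the empirical counterpart of projecting
    Y onto functions of X."""
    sums = [0.0] * x_bins
    counts = [0] * x_bins
    for x, y in pairs:
        b = min(int(x * x_bins), x_bins - 1)
        sums[b] += y
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Model: X uniform on (0, 1), Y = X^2 + centered Gaussian noise.
rng = random.Random(7)
pairs = [(x, x * x + rng.gauss(0, 0.1))
         for x in (rng.random() for _ in range(50_000))]
r_hat = regression_estimate(pairs)
# r_hat[b] should be close to the average of x^2 over bin b.
```

Because the noise is centered, each bin average converges to the average of x² over that bin, i.e. to the regression function up to the bin width.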
DEFINITION 3.1.– Let (Ω, 𝒜, P) be a probability space and let 𝓑 be a sub-σ-algebra of 𝒜. We call the orthogonal projection of L²(Ω, 𝒜, P) onto L²(Ω, 𝓑, P) the conditional expectation with respect to 𝓑, denoted by E^𝓑 or E(· | 𝓑).
CHARACTERIZATION: Following from the definition of an orthogonal projection, E^𝓑 Y is characterized by:

1) E^𝓑 Y ∈ L²(Ω, 𝓑, P);

2) E[(Y − E^𝓑 Y) Z] = 0, Z ∈ L²(Ω, 𝓑, P).

We may replace (2) by

2′) ∫B E^𝓑 Y dP = ∫B Y dP, B ∈ 𝓑,

which is easily seen using the linearity and the monotone continuity of the integral.
1) E^𝓑 is a contracting and idempotent linear map of L²(Ω, 𝒜, P) onto L²(Ω, 𝓑, P). Moreover, it is positive and it conserves constants.

The first three properties (contraction (i.e. ‖E^𝓑 Y‖₂ ≤ ‖Y‖₂), idempotence (i.e. E^𝓑(E^𝓑 Y) = E^𝓑 Y), and linearity) are characteristics of orthogonal projections.

Its positivity (i.e. Y ≥ 0 implies E^𝓑 Y ≥ 0 a.s.) is established by noting that, for Y ≥ 0,

∫B E^𝓑 Y dP = ∫B Y dP ≥ 0, B ∈ 𝓑,

which implies that E^𝓑 Y ≥ 0 a.s.

Finally, it is clear that E^𝓑 c = c for every constant c.
COMMENT 3.1.– We may show that the above five properties characterize the operators on L² that are conditional expectations.
2) E^𝓑(UY) = U E^𝓑 Y for every U that is 𝓑-measurable and bounded.

In effect, U E^𝓑 Y ∈ L²(Ω, 𝓑, P), and

∫B U E^𝓑 Y dP = ∫B U Y dP, B ∈ 𝓑;

therefore U E^𝓑 Y is indeed the orthogonal projection of UY onto L²(Ω, 𝓑, P).
3) If Yn ↑ Y in L², then E^𝓑 Yn ↑ E^𝓑 Y. The linearity and positivity of E^𝓑 affirm that lim ↑ E^𝓑 Yn exists a.s. Yet

∫B E^𝓑 Yn dP = ∫B Yn dP, B ∈ 𝓑,

and since |Yn| ≤ |Y1| + |Y| and Yn → Y, by twice applying the dominated convergence theorem, we obtain:

∫B lim E^𝓑 Yn dP = ∫B Y dP, B ∈ 𝓑.

Since lim E^𝓑 Yn is in L²(Ω, 𝓑, P), we have lim E^𝓑 Yn = E^𝓑 Y.
4) If the σ-algebra generated by Y and 𝓑 are independent, then E^𝓑 Y = E(Y). In effect, E(Y) is constant, hence 𝓑-measurable, and:

∫B Y dP = E(Y 1B) = E(Y) P(B) = ∫B E(Y) dP, B ∈ 𝓑.
5) If 𝓑1 and 𝓑2 are two sub-σ-algebras such that 𝓑1 ⊂ 𝓑2, then E^𝓑1 E^𝓑2 = E^𝓑1.
This is a known property of orthogonal projections.
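Properties such as (4) are easy to check empirically on a finite model. The sketch below (an illustration, on a two-coin model of our choosing) estimates E(Y | X1) for Y = X1 + X2 with X1, X2 independent fair coin flips; by linearity and property (4) applied to X2, the theoretical value is X1 + E(X2) = X1 + 1/2:

```python
import random

def cond_exp(samples, value, key):
    """Empirical conditional expectation: average value(w) over the
    sample points sharing the same key(w)."""
    sums, counts = {}, {}
    for w in samples:
        k = key(w)
        sums[k] = sums.get(k, 0.0) + value(w)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

# X1, X2 independent fair coin flips; Y = X1 + X2.
rng = random.Random(0)
samples = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(200_000)]
e_y_given_x1 = cond_exp(samples, lambda w: w[0] + w[1], lambda w: w[0])
# e_y_given_x1[0] is approximately 0.5 and e_y_given_x1[1] approximately 1.5
```

The empirical conditional averages match X1 + 1/2: the independent component X2 contributes only its (unconditional) expectation.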
6) Extension: We will now define E^𝓑 Y when Y is only positive or integrable.

For Y positive, we note that there exists a sequence (Yn) of positive bounded (and therefore square integrable) real random variables such that Yn ↑ Y. We then set E^𝓑 Y = lim ↑ E^𝓑 Yn. It is straightforward to see that E^𝓑 Y is unique, and that it is characterized by:
(2 bis) may be replaced by:
Among the properties of E^𝓑, we may cite the following:

For positive Y, and positive and 𝓑-measurable U, we have:

E^𝓑(UY) = U E^𝓑 Y.

Now, for Y ∈ L¹, we note that E^𝓑(Y⁺) and E^𝓑(Y⁻) are integrable, and we set:

E^𝓑 Y = E^𝓑(Y⁺) − E^𝓑(Y⁻).
Again, we have uniqueness, and the characterizations (1)–(2) and (1)–(2′ bis), where it is necessary to replace L²(Ω, 𝒜, P) and L²(Ω, 𝓑, P) with L¹(Ω, 𝒜, P) and L¹(Ω, 𝓑, P), respectively. Furthermore, properties (1)–(5) are still valid, with slight modifications. In particular, we have the following important property:
