Multivariate Nonparametric Regression and Visualization

Jussi Sakari Klemelä

Description

A modern approach to statistical learning and its applications through visualization methods

With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression.

The book then introduces and examines various tested and proven visualization techniques for learning samples and functions. Multivariate Nonparametric Regression and Visualization identifies risk management, portfolio selection, and option pricing as the main areas in which statistical methods may be implemented in quantitative finance. The book provides coverage of key statistical areas including linear methods, kernel methods, additive models and trees, boosting, support vector machines, and nearest neighbor methods. Exploring the additional applications of nonparametric and semiparametric methods, Multivariate Nonparametric Regression and Visualization features:

  • An extensive appendix with R-package training material to encourage duplication and modification of the presented computations and research
  • Multiple examples to demonstrate the applications in the field of finance
  • Sections with formal definitions of the various applied methods for readers to utilize throughout the book

Multivariate Nonparametric Regression and Visualization is an ideal textbook for upper-undergraduate and graduate-level courses on nonparametric function estimation, advanced topics in statistics, and quantitative finance. The book is also an excellent reference for practitioners who apply statistical methods in quantitative finance.


Page count: 571

Publication year: 2014




Contents

Cover

Half Title page

Title page

Copyright page

Dedication

Preface

Introduction

I.1 Estimation of Functionals of Conditional Distributions

I.2 Quantitative Finance

I.3 Visualization

I.4 Literature

Part I: Methods of Regression and Classification

Chapter 1: Overview of Regression and Classification

1.1 Regression

1.2 Discrete Response Variable

1.3 Parametric Family Regression

1.4 Classification

1.5 Applications in Quantitative Finance

1.6 Data Examples

1.7 Data Transformations

1.8 Central Limit Theorems

1.9 Measuring the Performance of Estimators

1.10 Confidence Sets

1.11 Testing

Chapter 2: Linear Methods and Extensions

2.1 Linear Regression

2.2 Varying Coefficient Linear Regression

2.3 Generalized Linear and Related Models

2.4 Series Estimators

2.5 Conditional Variance and ARCH Models

2.6 Applications in Volatility and Quantile Estimation

2.7 Linear Classifiers

Chapter 3: Kernel Methods and Extensions

3.1 Regressogram

3.2 Kernel Estimator

3.3 Nearest-Neighbor Estimator

3.4 Classification with Local Averaging

3.5 Median Smoothing

3.6 Conditional Density Estimation

3.7 Conditional Distribution Function Estimation

3.8 Conditional Quantile Estimation

3.9 Conditional Variance Estimation

3.10 Conditional Covariance Estimation

3.11 Applications in Risk Management

3.12 Applications in Portfolio Selection

Chapter 4: Semiparametric and Structural Models

4.1 Single-Index Model

4.2 Additive Model

4.3 Other Semiparametric Models

Chapter 5: Empirical Risk Minimization

5.1 Empirical Risk

5.2 Local Empirical Risk

5.3 Support Vector Machines

5.4 Stagewise Methods

5.5 Adaptive Regressograms

Part II: Visualization

Chapter 6: Visualization of Data

6.1 Scatter Plots

6.2 Histogram and Kernel Density Estimator

6.3 Dimension Reduction

6.4 Observations as Objects

Chapter 7: Visualization of Functions

7.1 Slices

7.2 Partial Dependence Functions

7.3 Reconstruction of Sets

7.4 Level Set Trees

7.5 Unimodal Densities

Appendix A: R Tutorial

A.1 Data Visualization

A.2 Linear Regression

A.3 Kernel Regression

A.4 Local Linear Regression

A.5 Additive Models: Backfitting

A.6 Single-Index Regression

A.7 Forward Stagewise Modeling

A.8 Quantile Regression

References

Author Index

Topic Index

Multivariate Nonparametric Regression and Visualization

WILEY SERIES IN COMPUTATIONAL STATISTICS

Consulting Editors:

Paolo Giudici, University of Pavia, Italy

Geof H. Givens, Colorado State University, USA

Bani K. Mallick, Texas A&M University, USA

 

The Wiley Series in Computational Statistics comprises practical guides and cutting-edge research books on new developments in computational statistics. It features quality authors with a strong applications focus. The texts in the series provide detailed coverage of statistical concepts, methods, and case studies in areas at the interface of statistics, computing, and numerics.

With sound motivation and a wealth of practical examples, the books show in concrete terms how to select and to use appropriate ranges of statistical computing techniques in particular fields of study. Readers are assumed to have a basic understanding of introductory terminology.

The series concentrates on applications of computational methods in statistics to fields of bioinformatics, genomics, epidemiology, business, engineering, finance and applied statistics.

Billard and Diday · Symbolic Data Analysis: Conceptual Statistics and Data Mining

Bolstad · Understanding Computational Bayesian Statistics

Dunne · A Statistical Approach to Neural Networks for Pattern Recognition

Ntzoufras · Bayesian Modeling Using WinBUGS

Klemela · Multivariate Nonparametric Regression and Visualization: With R and Applications to Finance

Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Klemelä, Jussi, 1965–
   Multivariate nonparametric regression and visualization : with R and applications to finance / Jussi Klemelä.
      pages cm. — (Wiley series in computational statistics ; 699)
   Includes bibliographical references and index.
   ISBN 978-0-470-38442-8 (hardback)
   1. Finance—Mathematical models. 2. Visualization. 3. Regression analysis. I. Title.
   HG176.5.K55 2014
   519.5′36—dc23

2013042095

To my parents

PREFACE

The book is intended for students and researchers who want to learn to apply nonparametric and semiparametric methods and to use visualization tools related to these estimation methods. In particular, the book is intended for students and researchers in quantitative finance who want to apply statistical methods and for students and researchers of statistics who want to learn to apply statistical methods in quantitative finance. The book continues the themes of Klemelä (2009), which studied density estimation. The current book focuses on regression function estimation.

The book was written at the University of Oulu, Department of Mathematical Sciences. I wish to acknowledge the support provided by the University of Oulu and the Department of Mathematical Sciences.

The web page of the book is http://cc.oulu.fi/~jklemela/regstruct/.

JUSSI KLEMELÄ

Oulu, Finland
October 2013

INTRODUCTION

We study regression analysis and classification, as well as estimation of conditional variances, quantiles, densities, and distribution functions. The focus of the book is on nonparametric methods. Nonparametric methods are flexible and able to adapt to various kinds of data, but they can suffer from the curse of dimensionality and from a lack of interpretability. Semiparametric methods are often able to cope with quite high-dimensional data and are often easier to interpret, but they are less flexible and their use may lead to modeling errors. In addition to the terms "nonparametric estimator" and "semiparametric estimator", we use the term "structured estimator" to denote estimators that arise, for example, in additive models. These estimators obey a structural restriction, whereas the term "semiparametric estimator" is used for estimators that have both a parametric and a nonparametric component.

Nonparametric, semiparametric, and structured methods are well established and widely applied. There are, nevertheless, areas where further work is useful. We have included three such areas in this book:

1. Estimation of several functionals of a conditional distribution; not only estimation of the conditional expectation but also estimation of the conditional variance and conditional quantiles.
2. Quantitative finance as an area of application for nonparametric and semiparametric methods.
3. Visualization tools in statistical learning.

I.1 ESTIMATION OF FUNCTIONALS OF CONDITIONAL DISTRIBUTIONS

One of the main topics of the book is kernel methods. Kernel methods are easy to implement and computationally feasible, and their definition is intuitive. For example, a kernel regression estimator is a local average of the values of the response variable. Local averaging is a general regression method. In addition to the kernel estimator, examples of local averaging include the nearest-neighbor estimator, the regressogram, and the orthogonal series estimator.
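To make the local averaging idea concrete, the following is a minimal sketch in R (not code from the book or its appendix) of a kernel regression estimator of the Nadaraya-Watson type; the function name, the Gaussian kernel, and the simulated data are illustrative assumptions.

    # Kernel regression estimate at a point x: a weighted average of the
    # responses, with weights given by a Gaussian kernel applied to the
    # distances between x and the observations X_i.
    kernel_regression <- function(x, X, Y, h) {
      d <- sqrt(rowSums(sweep(X, 2, x)^2))   # distances between x and the rows of X
      w <- dnorm(d / h)                      # kernel weights
      sum(w * Y) / sum(w)                    # local average of the Y_i
    }

    # Simulated example with two explanatory variables:
    set.seed(1)
    X <- matrix(runif(200), ncol = 2)
    Y <- sin(2 * pi * X[, 1]) + X[, 2]^2 + rnorm(100, sd = 0.1)
    kernel_regression(c(0.5, 0.5), X, Y, h = 0.2)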

We cover linear regression and generalized linear models. These models can be seen as starting points to many semiparametric and structured regression models. For example, the single index model, the additive model, and the varying coefficient linear regression model can be seen as generalizations of the linear regression model or the generalized linear model.

Empirical risk minimization is a general approach to statistical estimation. The methods of empirical risk minimization can be used in regression function estimation, in classification, in quantile regression, and in the estimation of other functionals of the conditional distribution. Local empirical risk minimization can be seen as a generalization of kernel regression.

A regular regressogram is a special case of local averaging, but the empirical choice of the partition leads to a rich class of estimators. The choice of the partition is made using empirical risk minimization. In the one- and two-dimensional cases a regressogram is usually less efficient than the kernel estimator, but in high-dimensional cases a regressogram can be useful. For example, a method for selecting the partition of a regressogram can be seen as a method of variable selection if the chosen partition can be defined using only a subset of the variables. Estimators that are defined as solutions of an optimization problem, like the minimizers of an empirical risk, typically need to be calculated with numerical methods. Stagewise algorithms can also be taken as the definition of an estimator, even without giving an explicit minimization problem which they solve.

A regression function is defined as the conditional expectation of the distribution of a response variable. The conditional expectation is useful in making predictions as well as in finding causal relationships. We also cover the estimation of the conditional variance and conditional quantiles. These are needed to give a more complete view of the conditional distribution. The estimation of the conditional variance and conditional quantiles is also needed in risk management, which is an important area of quantitative finance. The conditional variance can be estimated by estimating the conditional expectation of the squared random variable, whereas estimation of a conditional quantile generalizes estimation of the conditional median. In the time series setting the standard approaches for estimating the conditional variance are ARCH and GARCH modeling, but we discuss nonparametric alternatives. The GARCH estimator is close to a moving average, whereas the ARCH estimator is related to linear state space modeling.
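As an illustration of the remark that the conditional variance can be estimated through conditional expectations of squared values, here is a small R sketch (an assumption of this edit, not the book's code) that uses two kernel smoothers, computed with stats::ksmooth, to estimate Var(Y | X = x) = E(Y2 | X = x) − [E(Y | X = x)]2 for a one-dimensional explanatory variable.

    set.seed(1)
    x <- runif(500)
    y <- rnorm(500, mean = sin(2 * pi * x), sd = 0.2 + x)   # heteroskedastic data
    grid <- seq(0.05, 0.95, by = 0.01)
    m1 <- ksmooth(x, y,   kernel = "normal", bandwidth = 0.2, x.points = grid)$y
    m2 <- ksmooth(x, y^2, kernel = "normal", bandwidth = 0.2, x.points = grid)$y
    var_hat <- pmax(m2 - m1^2, 0)        # truncate small negative values at zero
    plot(grid, sqrt(var_hat), type = "l",
         xlab = "x", ylab = "estimated conditional standard deviation")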

In classification we are not interested in the estimation of functionals of a distribution, but the aim is to construct classification rules. However, most of the regression function estimation methods have a counterpart in classification.

I.2 QUANTITATIVE FINANCE

Risk management, portfolio selection, and option pricing can be identified as three important areas of quantitative finance. Parametric statistical methods have dominated statistical research in quantitative finance. In risk management, probability distributions have been modeled with the Pareto distribution or with distributions derived from extreme value theory. In portfolio selection the multivariate normal model has been used together with the Markowitz theory of portfolio selection. In option pricing the Black-Scholes model of stock prices has been widely applied. The Black-Scholes model has also been extended to more general parametric models for the process of stock prices.

In risk management the p-quantile of a loss distribution has a direct interpretation as a threshold such that the probability of the loss exceeding the threshold is less than p. Thus estimation of conditional quantiles is directly relevant for risk management. Unconditional quantile estimators do not take into account all available information, and thus in risk management it is useful to estimate conditional quantiles. The estimation of the conditional variance can be applied in the estimation of a conditional quantile, because in location-scale families the variance determines the quantiles. The estimation of the conditional variance can be extended to the estimation of the conditional covariance or the conditional correlation.

We apply nonparametric regression function estimation in portfolio selection. The portfolio is selected either by maximizing a conditional expected utility or by maximizing a Markowitz criterion. When the collection of allowed portfolio weights is a finite set, classification can also be used in portfolio selection. The squared returns are much easier to predict than the returns themselves, and thus in quantitative finance the focus has been on the prediction of volatility. However, it can be shown that despite the weak predictability of the returns, portfolio selection can profit from statistical prediction.
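The following hypothetical R sketch illustrates the idea of selecting a portfolio weight from a finite set by maximizing an estimated conditional expected utility; the log utility, the single risky asset, the kernel weights, and the variable names are assumptions made for this illustration, not the book's procedure.

    # r: returns of a risky asset; x: a predictor observed one period earlier
    # (for example a volatility estimate); x0: its current value.
    utility <- function(w, r) log(1 + w * r)          # utility of the portfolio return
    cond_mean <- function(z, x, x0, h) {
      k <- dnorm((x - x0) / h)                        # kernel weights
      sum(k * z) / sum(k)
    }
    select_weight <- function(r, x, x0, h = 0.2, weights = seq(0, 1, by = 0.1)) {
      eu <- sapply(weights, function(w) cond_mean(utility(w, r), x, x0, h))
      weights[which.max(eu)]                          # weight maximizing the estimated E(u | X = x0)
    }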

Option pricing can be formulated as a problem of stochastic control. We do not study the statistics of option pricing in detail, but give a basic framework for solving some option pricing problems nonparametrically.

I.3 VISUALIZATION

Statistical visualization is often understood as visualization of the raw data. The visualization of the raw data can be a part of exploratory data analysis, a first step toward model building, and a tool to generate hypotheses about the data-generating mechanism. However, we put emphasis on a different approach to visualization. In this approach, visualization tools are associated with statistical estimators or inference procedures. For example, we first estimate a regression function and then try to visualize and describe the properties of this regression function estimate. The distinction between the visualization of the raw data and the visualization of the estimator is not clear-cut when nonparametric function estimation is used. In fact, nonparametric function estimation can be seen as a part of exploratory data analysis.

The SiZer is an example of a tool that combines visualization and inference; see Chaudhuri & Marron (1999). This methodology combines formal testing for the existence of modes with SiZer maps to find out whether a mode of a density estimate or a regression function estimate is really there.

Semiparametric function estimates are often easier to visualize than nonparametric function estimates. For example, in a single index model the regression function estimate is a composition of a linear function and a univariate function. Thus in a single index model we need only to visualize the coefficients of the linear function and a one-dimensional function. The ease of visualization gives motivation to study semiparametric methods.

CART, as presented in Breiman, Friedman, Olshen & Stone (1984), is an example of an estimation method whose popularity is due not only to its statistical properties but also to the fact that it is defined in terms of a binary tree that directly provides a visualization of the estimator. Even when it is possible to find estimators with better statistical properties than CART, the possibility of visualization gives motivation to use CART.

Visualization of nonparametric function estimates, such as kernel estimates, is challenging. For the visualization of completely nonparametric estimates, we can use level set tree-based methods, as presented in Klemelä (2009). Level set tree-based methods have found interest also in topological data analysis and in scientific visualization, and these methods have their origin in the concept of a Reeb graph, defined originally in Reeb (1946).

In density estimation we are often interested in the mode structure of the density, defined as the number of local extremes, the largeness of the local extremes, and the location of the local extremes. The local extremes of a density function are related to the areas of concentration of the probability mass. In regression function estimation we are also interested in the mode structure. The local maxima of a regression function are related to the regions of the space of the explanatory variables where the response variable takes the largest values. The antimode structure is equally important to describe. The antimode structure means the number of local minima, the size of the local minima, and the location of the local minima. The local minima of a regression function are related to the areas of the space of the explanatory variables where the response variable takes the smallest values.

The mode structure of a regression function does not give complete information about the properties of the regression function. In regression analysis we are interested in the effects of the explanatory variables on the response variable and in the interaction between the explanatory variables. The effect of an explanatory variable can be formalized with the concept of a partial effect. The partial effect of an explanatory variable is the partial derivative of the regression function with respect to this variable. Nearly constant partial effects indicate that the regression function is close to a linear function, since the partial derivatives of a linear function are constants. The local maxima of a partial effect correspond to the areas in the space of the explanatory variables where the increase of the expected value of the response variable, resulting from an increase of the value of the explanatory variable, is the largest. We can use level set trees of partial effects to visualize the mode structure and the antimode structure of the partial effects, and thus to visualize the effects and the interactions of the explanatory variables.

I.4 LITERATURE

We mention some of the books that have been used in the preparation of this book. Härdle (1990) covers nonparametric regression with an emphasis on kernel regression, discussing smoothing parameter selection, giving confidence bands, and providing various econometric examples. Hastie, Tibshirani & Friedman (2001) describe high-dimensional linear and nonlinear classification and regression methods, giving many examples from biometry and machine learning. Györfi, Kohler, Krzyzak & Walk (2002) cover asymptotic theory of kernel regression, nearest-neighbor regression, empirical risk minimization, and orthogonal series methods, and they also include a treatment of time series prediction. Ruppert, Wand & Carroll (2003) view nonparametric regression as an extension of parametric regression and treat them together. Härdle, Müller, Sperlich & Werwatz (2004) explain single index models, generalized partial linear models, additive models, and several nonparametric regression function estimators, giving econometric examples. Wooldridge (2005) provides an asymptotic theory of linear regression, including instrumental variables and panel data. Fan & Yao (2005) study nonlinear time series and use nonparametric function estimation in time series prediction and explanation. Wasserman (2005) provides information on nonparametric regression and density estimation with confidence intervals and bootstrap confidence intervals. Horowitz (2009) covers semiparametric models and discusses the identifiability and asymptotic distributions. Spokoiny (2010) introduces local parametric methods into nonparametric estimation.

Bouchaud & Potters (2003) have developed nonparametric techniques for financial analysis. Franke, Härdle & Hafner (2004) discuss statistical analysis of financial markets, with the emphasis on parametric methods. Ruppert (2004) is a textbook suitable for statistics students interested in quantitative finance, and this book discusses statistical tools related to classical financial models. Malevergne & Sornette (2005) have analyzed financial data with nonparametric methods. Li & Racine (2007) consider various non- and semiparametric regression models, presenting asymptotic distribution theory and the theory of smoothing parameter selection, directed towards econometric applications.

PART I

METHODS OF REGRESSION AND CLASSIFICATION

CHAPTER 1

OVERVIEW OF REGRESSION AND CLASSIFICATION

1.1 REGRESSION

In regression analysis we are interested in prediction or in inferring causal relationships. We try to predict the value of a response variable given the values of explanatory variables, or we try to deduce the causal influence of the explanatory variables on the response variable. The inference of a causal relationship is important when we want to change the values of an explanatory variable in order to get an optimal value for the response variable. For example, we may want to know the influence of education on the employment status of a worker in order to choose the best education. On the other hand, prediction is applied also in cases where we are not able to, or do not wish to, change the values of the explanatory variables. For example, in volatility prediction it is reasonable to use any variables that have predictive relevance, even if these variables do not have any causal relationship to volatility.

Both in prediction and in estimation of causal influence, it is useful to estimate the conditional expectation

E(Y | X)

of the response variable Y ∈ R given the explanatory variables X ∈ Rd. The choice of the explanatory variables and the method of estimation can depend on the purpose of the research. In prediction an explanatory variable can be any variable that has predictive relevance, whereas in the estimation of a causal influence the explanatory variables are determined by the scientific theory about the causal relationship. For the purpose of causal inference, it is reasonable to choose an estimation method that can help to find the partial effect of a given explanatory variable on the response variable. The partial effect is defined in Section 1.1.3.

In linear regression the regression function estimate is a linear function:

(1.1)   f̂(x) = β̂0 + β̂1x1 + ··· + β̂dxd.

A different type of linearity occurs if the estimator can be written as

(1.2)   f̂(x) = Σ_{i=1}^{n} Wi(x) Yi,

where the weights Wi(x) may depend on the explanatory variables X1, …, Xn but not on the responses Y1, …, Yn.

In addition to the estimation of the conditional expectation of the response variable given the explanatory variables, we can also consider the estimation of the conditional median of the response variable given the explanatory variables, or the estimation of other conditional quantiles, which is called quantile regression. Furthermore, we will consider estimation of the conditional variance, as well as estimation of the conditional density and the conditional distribution function of the response variable given the explanatory variables.

In regression analysis the response variable can take any real value or any value in a given interval, but we also consider classification. In classification the response variable can take only a finite number of distinct values and the interest lies in the prediction of the values of the response variable.

1.1.1 Random Design and Fixed Design

Random Design Regression In random design regression the data are a sequence of n pairs

(1.3)   (x1, y1), …, (xn, yn),

which are modeled as realizations of the random vectors

(1.4)   (X1, Y1), …, (Xn, Yn),

identically distributed copies of (X, Y), where X ∈ Rd and Y ∈ R. However, sometimes we do not distinguish notationally between a random variable and its realization, and the notation of (1.4) is used also in place of notation (1.3), to denote a realization of the random vectors and not the random vectors themselves.

In regression analysis we typically want to estimate the conditional expectation f(x) = E(Y | X = x).

Fixed Design Regression In fixed design regression the data are a sequence (x1, y1), …, (xn, yn).

Now the design points are not chosen by a random mechanism; they are chosen by the conductor of the experiment. Typical examples are time series data, where xi is the time when the observation yi is recorded, and spatial data, where xi is the location where the observation yi is made. Time series data are discussed in Section 1.1.9.

We model the data as a sequence of random variables

In the fixed design regression we typically do not assume that the data are identically distributed. For example, we may assume that

1.1.2 Mean Regression

The regression function is typically defined as a conditional expectation. Besides the expectation and the conditional expectation, the median and the conditional median can also be used to characterize the center of a distribution and thus to predict and explain with the help of explanatory variables. We also mention the mode (the maximum of the density function) as a third characterization of the center of a distribution, although the mode is typically not used in regression analysis.

Expectation and Conditional Expectation When the data

(X1, Y1), …, (Xn, Yn)

are a sequence of identically distributed random variables, we can use the data to estimate the regression function, defined as the conditional expectation of Y given X:

(1.5)   f(x) = E(Y | X = x),   x ∈ Rd.

The mean of a random variable Y ∈ R with a continuous distribution can be defined by

(1.6)   E(Y) = ∫ y fY(y) dy,

where fY : R → R is the density function of Y. The regression function has been defined in (1.5) as the conditional mean of Y, and the conditional expectation can be defined in terms of the conditional density as

E(Y | X = x) = ∫ y fY|X(y | x) dy,

where the conditional density can be defined as

(1.7)   fY|X(y | x) = fX,Y(x, y) / fX(x),   when fX(x) > 0.

Figure 1.1 Mean regression. (a) A scatter plot of regression data. (b) A contour plot of the estimated joint density of the explanatory variable and the response variable. The linear regression function estimate is shown in red and the kernel regression estimate in blue.
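A figure of this kind can be sketched in R roughly as follows; this is an illustrative reconstruction with simulated data, not the code that produced Figure 1.1, and MASS::kde2d, lm, and ksmooth stand in for the density and regression estimators used in the book.

    library(MASS)                                   # for kde2d
    set.seed(1)
    x <- rnorm(300)
    y <- sin(x) + rnorm(300, sd = 0.5)
    plot(x, y, col = "gray")                        # (a) scatter plot of the data
    dens <- kde2d(x, y, n = 50)                     # estimated joint density of (X, Y)
    contour(dens, add = TRUE)                       # (b) contour plot of the joint density
    abline(lm(y ~ x), col = "red")                  # linear regression estimate (red)
    lines(ksmooth(x, y, kernel = "normal", bandwidth = 1), col = "blue")  # kernel estimate (blue)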

Median and Conditional Median The median can be defined, in the case of a continuous distribution function of a random variable Y ∈ R, as the number median(Y) ∈ R satisfying

P(Y ≤ median(Y)) = 1/2.

In general, covering also the case of discrete distributions, we can define the median uniquely as the generalized inverse of the distribution function:

(1.8)   median(Y) = FY⁻¹(1/2),   where FY⁻¹(p) = inf{y ∈ R : FY(y) ≥ p}.

The conditional median is defined using the conditional distribution of Y given X:

(1.9)

(1.10)

where Y(1) ≤ ··· ≤ Y(n) is the ordered sample and [x] is the largest integer smaller than or equal to x.

Mode and Conditional Mode The mode is defined as an argument maximizing the density function of a random variable:

(1.11)   mode(Y) = argmaxy∈R fY(y),

where fY : R → R is the density function of Y. The density fY can have several local maxima, and the use of the mode seems to be interesting only in cases where the density function is unimodal (has one local maximum). The conditional mode is defined as an argument maximizing the conditional density:

1.1.3 Partial Effects and Derivative Estimation

so that the partial effect is a function of x1 which is different for each x2, …, xd. Single index models are studied in Section 4.1.

The partial elasticity of X1 is defined as

We can use the visualization of partial effects as a tool to visualize regression functions. In Section 7.4 we show how level set trees can be used to visualize the mode structure of functions. The mode structure of a function means the number, the largeness, and the location of the local maxima of the function. Analogously, level set trees can be used to visualize the antimode structure of a function, where the antimode structure means the number, the largeness, and the location of the local minima of the function. Local maxima and minima are important characteristics of a regression function. However, we need to know more about a regression function than just the mode structure or antimode structure. Partial effects are a useful tool to convey additional important information about a regression function. If the partial effect is flat for each variable, then we know that the regression function is close to a linear function. When we visualize the mode structure of the partial effect of variable X1, we get information about whether the variable X1 causes the expected value of the response variable to increase in several locations (the number of local maxima of the partial effect), how much an increase of the value of the variable X1 increases the expected value of the response variable Y (the largeness of the local maxima of the partial effect), and where the influence of the explanatory variable X1 is the largest (the location of the local maxima of the partial effect). Analogous conclusions can be made by visualizing the antimode structure of the partial effect.

We present two methods for the estimation of partial effects. The first method is to use the partial derivatives of a kernel regression function estimator, and this method is presented in Section 3.2.9. The second method is to use a local linear estimator, and this method is presented in Section 5.2.1.
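A rough R sketch of the first method is given below: the partial effect of variable j is approximated by a finite-difference partial derivative of a kernel regression estimate. This is an assumed illustration; the book's own estimators in Sections 3.2.9 and 5.2.1 are defined differently and in more detail.

    kernel_regression <- function(x, X, Y, h) {
      w <- dnorm(sqrt(rowSums(sweep(X, 2, x)^2)) / h)
      sum(w * Y) / sum(w)
    }
    # Finite-difference approximation of the partial derivative with respect
    # to the j:th explanatory variable at the point x.
    partial_effect <- function(x, j, X, Y, h, eps = 0.01) {
      e <- rep(0, length(x)); e[j] <- eps
      (kernel_regression(x + e, X, Y, h) - kernel_regression(x - e, X, Y, h)) / (2 * eps)
    }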

1.1.4 Variance Regression

The mean regression gives information about the center of the conditional distribution, and with the variance regression we get information about the dispersion and the heaviness of the tails of the conditional distribution. Variance is a classical measure of dispersion and risk, which is used, for example, in the Markowitz theory of portfolio selection. Partial moments are risk measures that generalize the variance.

Variance and Conditional Variance The variance of a random variable Y is defined by

(1.12)   Var(Y) = E(Y − EY)2.

The standard deviation of Y is the square root of the variance of Y. The conditional variance of a random variable Y is equal to

(1.13)   Var(Y | X = x) = E[(Y − E(Y | X = x))2 | X = x]

(1.14)   = E(Y2 | X = x) − [E(Y | X = x)]2.

The conditional standard deviation of Y is the square root of the conditional variance. The sample variance is defined by

where Y1, …, Yn is a sample of random variables having the same distribution as Y.

and estimate the conditional variance from the data (X1, 12), …, (Xn, n2).

The theory of variance estimation is often given in the fixed design case, but the results can be extended to random design regression by conditioning on the design variables. Let us write the heteroskedastic fixed design regression model

(1.15)   Yi = f(xi) + σ(xi) εi,   i = 1, …, n,

where the errors εi have mean zero and unit variance.

Variance Estimation with Homoskedastic Noise Let us consider the fixed design regression model

Yi = f(xi) + σεi,   i = 1, …, n,

where σ > 0 and the errors εi have mean zero and unit variance. Spokoiny (2002) showed that for twice differentiable regression functions f, the optimal rate for the estimation of σ2 is n^(−1/2) for d ≤ 8, and otherwise the optimal rate is n^(−4/d). We can first estimate the mean function f by an estimator f̂ and then use the residual-based estimator

σ̂2 = (1/n) Σ_{i=1}^{n} (Yi − f̂(xi))2.

These types of estimators were studied by Müller & Stadtmüller (1987), Hall & Carroll (1989), Hall & Marron (1990), and Neumann (1994). Local polynomial estimators were studied by Ruppert, Wand, Holst & Hössjer (1997) and Fan & Yao (1998). A difference-based estimator was studied by von Neumann (1941). He used the estimator

σ̂2 = [1/(2(n − 1))] Σ_{i=2}^{n} (Yi − Yi−1)2,

where it is assumed that x1, …, xn ∈ R and x1 < ··· < xn. The estimator was studied and modified in various ways in Rice (1984), Gasser, Sroka & Jennen-Steinmetz (1986), Hall, Kay & Titterington (1990), Hall, Kay & Titterington (1991), Thompson, Kay & Titterington (1991), and Munk, Bissantz, Wagner & Freitag (2005).
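As a sketch, a difference-based estimator of the von Neumann type can be computed in a few lines of R; the simulated data and the normalizing factor 1/[2(n − 1)] follow the standard form of such estimators and are assumptions of this illustration.

    set.seed(1)
    n <- 200
    x <- sort(runif(n))
    y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)        # smooth mean plus noise
    # Successive differences remove the smooth mean approximately, so their
    # squares estimate 2 * sigma^2.
    sigma2_hat <- sum(diff(y)^2) / (2 * (n - 1))
    sigma2_hat                                       # should be close to 0.3^2 = 0.09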

(1.16)   Yt = σt εt,

(1.17)   σt2 = Var(Yt | Ft−1),

where Ft−1 is the sigma-algebra generated by the variables Yt−1, Yt−2, …. In a conditional heteroskedasticity model the main interest is in predicting the value of the random variable σt2, which is thus related to estimating the conditional variance. The statistical problem is to predict σt2 using a finite number of past observations Y1, …, Yt−1. Special cases of conditional heteroskedasticity models are the ARCH model discussed in Section 2.5.2 and the GARCH model discussed in Section 3.9.2.
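A simple prediction of σt2 can be sketched with an exponentially weighted moving average of past squared observations, which is closely related to the GARCH(1,1) prediction; the value lambda = 0.94 is a common ad hoc choice and an assumption of this sketch, not a recommendation from the book.

    ewma_volatility <- function(y, lambda = 0.94) {
      s2 <- numeric(length(y))
      s2[1] <- var(y)                                # initial value
      for (t in 2:length(y))
        s2[t] <- lambda * s2[t - 1] + (1 - lambda) * y[t - 1]^2
      sqrt(s2)                                       # predicted conditional standard deviations
    }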

and the lower partial moment is defined as

(1.18)

The square root of the lower semivariance can be used to replace the standard deviation in the definition of the Sharpe ratio or in the Markowitz criterion. We can define conditional versions of partial moments by changing the expectations to conditional expectations.

1.1.5 Covariance and Correlation Regression

The covariance of random variables Y and Z is defined by Cov(Y, Z) = E[(Y − EY)(Z − EZ)].

The sample covariance is defined by

The correlation is defined by Cor(Y, Z) = Cov(Y, Z) / [sd(Y) sd(Z)],

where sd(Y) and sd(Z) are the standard deviations of Y and Z. The conditional correlation is defined by

(1.19)

where

We can write

(1.20)

where

Thus we have two approaches to the estimation of conditional correlation.

and the autocorrelation is defined by

(1.21)

(1.22)

1.1.6 Quantile Regression

A quantile generalizes the median. In quantile regression a conditional quantile is estimated. Quantiles can be used to measure the value at risk (VaR). The expected shortfall is a related measure of dispersion and risk.

Quantile and Conditional Quantile The pth quantile is defined as

(1.23)   Qp(Y) = FY⁻¹(p) = inf{y ∈ R : FY(y) ≥ p},

and thus it holds that

(1.24)   P(Y ≤ Qp(Y)) ≥ p,

where 0 < p < 1. Conditional quantile estimation has been considered in Koenker (2005) and Koenker & Bassett (1978).

Estimation of a Quantile and a Conditional Quantile Estimation of quantiles is closely related to the estimation of the distribution function. It is usually possible to derive a method for the estimation of a quantile or a conditional quantile if we have a method for the estimation of a distribution function or a conditional distribution function.

Empirical Quantile Let us define the empirical distribution function, based on the data Y1, …, Yn, as

(1.25)   F̂Y(y) = (1/n) Σ_{i=1}^{n} I(−∞,y](Yi),   y ∈ R.

Now we can define an estimate of the quantile by

(1.26)   Q̂p(Y) = F̂Y⁻¹(p) = inf{y ∈ R : F̂Y(y) ≥ p},

where 0 < p < 1. Now it holds that

(1.27)   Q̂p(Y) = Y(⌈np⌉),

where the ordered sample is denoted by Y(1) ≤ Y(2) ≤ ··· ≤ Y(n). A third description of the empirical estimator of the quantile is given by the following steps:

Standard Deviation-Based Quantile Estimators We can also use an estimate of the standard deviation to derive an estimate for a quantile. Namely, consider the location-scale model

Y = μ + σε,

where μ ∈ R, σ > 0, and ε is a random variable with a continuous distribution function F. Now

Qp(Y) = μ + σ F⁻¹(p).

Thus, for a known F, we get from the estimates μ̂ of μ and σ̂ of σ the estimate

(1.28)   Q̂p(Y) = μ̂ + σ̂ F⁻¹(p).

Standard Deviation-Based Conditional Quantile Estimators To get an estimate for a conditional quantile in the heteroskedastic fixed design model (1.15), we can use

(1.29)   Q̂p(Y | x) = f̂(x) + σ̂(x) F⁻¹(p).

Similarly, in the conditional heteroskedasticity model (1.16) we can use

(1.30)   Q̂p(Yt | Ft−1) = σ̂t F⁻¹(p).

In Section 2.5.1 and in Section 3.11.3 we apply three quantile estimators that are based on standard deviation estimates.
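For example, a standard-deviation-based conditional quantile estimate in the spirit of (1.29) can be sketched in R as follows, combining kernel estimates of the conditional mean and conditional standard deviation with a quantile of an assumed N(0, 1) error distribution; the Gaussian kernel and the normality assumption are illustrative choices, not the book's estimator.

    cond_quantile <- function(x0, x, y, p, h = 0.2) {
      k  <- dnorm((x - x0) / h)                # kernel weights
      m  <- sum(k * y) / sum(k)                # conditional mean estimate
      m2 <- sum(k * y^2) / sum(k)
      s  <- sqrt(max(m2 - m^2, 0))             # conditional standard deviation estimate
      m + s * qnorm(p)                         # location-scale quantile estimate
    }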

Expected Shortfall The expected shortfall is a measure of risk which aggregates all quantiles in the right tail (or in the left tail). The expected shortfall for the right tail is defined as

ESp(Y) = [1/(1 − p)] ∫_{p}^{1} Qu(Y) du,   0 < p < 1.

When Y has a continuous distribution function, then

(1.34)   ESp(Y) = E(Y | Y ≥ Qp(Y));

see McNeil, Frey & Embrechts (2005, lemma 2.16). We have defined the loss in (1.86) as the negative of the change in the value of the portfolio, and thus risk management wants to control the right tail of the loss distribution. However, we can define the expected shortfall for the left tail as

(1.35)

When Y has a continuous distribution function, then

This expression shows that in the case of a continuous distribution function, p ESp(Y) is equal to the expectation taken only over the left tail, when the left tail is defined as the region to the left of a quantile of the distribution.

The expected shortfall can be estimated from the data Y1, …, Yn in the case where the expected shortfall is given in (1.34) by using

Let us consider the location-scale model

Y = μ + σε,

where μ ∈ R, σ > 0, and ε is a random variable with a continuous distribution. Now

ESp(Y) = μ + σ ESp(ε).

Thus the estimate for the expected shortfall can be obtained as

ÊSp(Y) = μ̂ + σ̂ ESp(ε),

where μ̂ is an estimate of μ and σ̂ is an estimate of σ.

If ε ~ N(0, 1) and the expected shortfall is defined for the right tail as in (1.34), then

ESp(ε) = ϕ(Φ⁻¹(p)) / (1 − p),

where ϕ is the density function of the standard normal distribution and Φ is the distribution function of the standard normal distribution. If ε ~ tυ, where tυ is the t-distribution with υ degrees of freedom, and the expected shortfall is defined for the right tail as in (1.34), then

ESp(ε) = [gυ(tυ⁻¹(p)) / (1 − p)] · [υ + (tυ⁻¹(p))2] / (υ − 1),

where gυ is the density function of the t-distribution with υ degrees of freedom and tυ is the distribution function of the t-distribution with υ degrees of freedom.
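The following R sketch compares an empirical right-tail average with the closed-form expected shortfall of the standard normal distribution; the simulated data and the level p = 0.95 are illustrative assumptions.

    set.seed(1)
    y <- rnorm(10000)
    p <- 0.95
    q_hat   <- quantile(y, p)                  # empirical p-quantile
    es_emp  <- mean(y[y >= q_hat])             # average of the right-tail observations
    es_norm <- dnorm(qnorm(p)) / (1 - p)       # expected shortfall of N(0, 1), right tail
    c(es_emp, es_norm)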

1.1.7 Approximation of the Response Variable

We have defined the regression function in (1.5) as the conditional expectation of the response variable. The conditional expectation can be viewed as an approximation of the response variable Y ∈ R with the help of the explanatory random variables X1, …, Xd ∈ R. The approximation is a random variable f(X1, …, Xd) ∈ R, where f : Rd → R is a fixed function. This viewpoint leads to generalizations. The best approximation of the response variable can be defined using various loss functions ρ : R → R. The best approximation is f(X1, …, Xd), where f is defined as

(1.36)   f = argmin_g E ρ(Y − g(X1, …, Xd)),

where the minimization is over a suitable class of functions g : Rd → R. Since f is defined in terms of the unknown distribution of (X, Y), we have to estimate f using statistical data available from the distribution of (X, Y).

Examples of Loss Functions We give examples of different choices of the loss function ρ and of the class of functions over which the minimization in (1.36) is done.

Estimation Using Loss Function If a regression function can be characterized as a minimizer of a loss function, then we can use empirical risk minimization with this loss function to define an estimator for the regression function. Empirical risk minimization is discussed in Chapter 5.

f̂ = argmin_f (1/n) Σ_{i=1}^{n} ρ(Yi − f(Xi)), where the minimization is over a class of functions f : Rd → R. For example, this class could be the class of linear functions.

Estimation of quantiles and conditional quantiles can also be done using empirical risk minimization, with the loss function chosen so that the quantile is the minimizer of the expected loss. The resulting estimator of the pth conditional quantile minimizes the empirical risk over a class of functions f : Rd → R. A further idea, which we will discuss in Section 5.2, is to define an estimator for the conditional quantile using local empirical risk.
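Separately from the local empirical risk just mentioned, a global empirical risk minimization estimator of a linear conditional quantile function can be sketched in R with the check loss ρp(u) = u(p − I(u < 0)); the use of this particular loss and of optim for the minimization are assumptions of the sketch, not a description of the book's implementation.

    check_loss <- function(u, p) u * (p - (u < 0))       # the check loss rho_p(u)
    set.seed(1)
    x <- runif(200)
    y <- 1 + 2 * x + rnorm(200, sd = 0.5 + x)            # heteroskedastic data
    p <- 0.9
    risk <- function(beta) mean(check_loss(y - beta[1] - beta[2] * x, p))
    fit <- optim(c(0, 0), risk)                          # minimize the empirical risk
    fit$par                                              # estimated intercept and slope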

1.1.8 Conditional Distribution and Density

Instead of estimating only conditional expectation, conditional variance, or conditional quantile, we can try to estimate the complete conditional distribution by estimating the conditional distribution function or the conditional density function.

Conditional Distribution Function The distribution function of a random variable Y ∈ R is defined as FY(y) = P(Y ≤ y), y ∈ R.

The conditional distribution function is defined as

FY|X(y | x) = P(Y ≤ y | X = x),

where Y ∈ R is a scalar random variable and X ∈ Rd is a random vector. We have

(1.42)   FY|X(y | x) = E(I(−∞,y](Y) | X = x),

and thus the estimation of the conditional distribution function can be considered as a regression problem, where the conditional expectation of the random variable I(−∞,y](Y) is estimated. The random variable I(−∞,y](Y) takes only the values 0 and 1. The unconditional distribution function can be estimated with the empirical distribution function, which is defined for the data Y1, …, Yn as

(1.43)   F̂Y(y) = #{i ∈ {1, …, n} : Yi ≤ y} / n,

where #A means the cardinality of set A. The conditional distribution function estimation is considered in Section 3.7, where local averaging estimators are defined.
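Following (1.42), a local averaging estimate of the conditional distribution function can be sketched in R as a kernel regression of the indicator variable; this is an illustrative sketch and not the estimator of Section 3.7.

    cond_cdf <- function(y0, x0, x, y, h = 0.2) {
      k <- dnorm((x - x0) / h)                 # kernel weights
      sum(k * (y <= y0)) / sum(k)              # local average of the indicator I(Y <= y0)
    }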

Conditional Density The conditional density function is defined as

fY|X(y | x) = fX,Y(x, y) / fX(x)

for y ∈ R, where fX,Y : Rd+1 → R is the joint density of (X, Y) and fX : Rd → R is the density of X. We mention three ways to estimate the conditional density.

First, we can replace the density of (X, Y) and the density of X with their estimators f̂X,Y and f̂X and define

f̂Y|X(y | x) = f̂X,Y(x, y) / f̂X(x)

for f̂X(x) > 0. This approach is close to the approach used in Section 3.6, where local averaging estimators of the conditional density are defined.
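For a one-dimensional explanatory variable, this first approach can be sketched in R with product kernel density estimates; the Gaussian kernels and bandwidths are illustrative assumptions.

    cond_density <- function(y0, x0, x, y, hx = 0.2, hy = 0.2) {
      f_xy <- mean(dnorm((x - x0) / hx) * dnorm((y - y0) / hy)) / (hx * hy)  # estimate of f_{X,Y}(x0, y0)
      f_x  <- mean(dnorm((x - x0) / hx)) / hx                                # estimate of f_X(x0)
      f_xy / f_x
    }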

Second, empirical risk minimization can be used in the estimation of the conditional density, as explained in Section 5.1.3.

Third, sometimes it is reasonable to assume that the conditional density has the form

(1.44)   fY|X(y | x) = fg(x)(y),

where fθ, θ ∈ A ⊂ Rk, is a family of density functions and g : Rd → A, where k ≥ 1. Then the estimation of the conditional density reduces to the estimation of the "regression function" g. The mean regression is a special case of this approach when the distribution of errors is known: Assume that

1.1.9 Time Series Data

Regression data are a sequence (X1, Y1), …, (Xn, Yn) of identically distributed copies of (X, Y), where X ∈ Rd is the explanatory variable and Y ∈ R is the response variable, as we wrote in (1.4). However, we can use regression methods with time series data

State–Space Prediction In the state–space prediction an autoregression parameter k ≥ 1 is chosen and we denote

(1.45)

We define the regression function, as previously, by

(1.46)

Let

be a d-dimensional vector time series. Definition (1.45) generalizes to the setting of vector time series. Define

(1.47)

The regression function is now defined on the higher-dimensional space of dimension kd.
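The construction of the regression data in state-space prediction can be sketched in R as follows for a scalar time series; the simulated AR(1) series, the choice k = 3, and the kernel predictor are illustrative assumptions.

    set.seed(1)
    k <- 3
    z <- as.numeric(arima.sim(list(ar = 0.7), n = 500))   # a scalar time series
    E <- embed(z, k + 1)            # row t: (z_t, z_{t-1}, ..., z_{t-k})
    Y <- E[, 1]                     # response: the current value
    X <- E[, -1]                    # explanatory variables: the k previous values
    predict_next <- function(x, X, Y, h = 1) {
      w <- dnorm(sqrt(rowSums(sweep(X, 2, x)^2)) / h)
      sum(w * Y) / sum(w)           # kernel regression prediction
    }
    predict_next(z[length(z) - (0:(k - 1))], X, Y)         # one-step prediction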

We can predict and explain without an autoregression parameter k, taking into account all the previous observations and not just the k last observations. However, this approach does not fit into the standard regression approach. Let Z1, …, ZT ∈ R be a scalar time series and define

Time–Space Prediction In time–space prediction the time parameter is taken as the explanatory variable, in contrast to (1.45), where the previous observations in the time series are taken as the explanatory variables. We denote

(1.48)

The obtained regression model is a fixed design regression model, as described in Section 1.1.1.

Time–space prediction can be used when the time series can be modeled as a nonstationary time series of signal with additive noise:

(1.49)   Zi = μi + σi εi,   i = 1, …, T,

where μi ∈ R is the deterministic signal, σi > 0 are nonrandom values, and the noise εi is stationary with mean zero and unit variance. For statistical estimation and asymptotic analysis we can use a slightly different model

(1.50)   Zi = μ(i/T) + σ(i/T) εi,   i = 1, …, T, where μ : [0, 1] → R and σ : [0, 1] → (0, ∞).
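A time-space prediction of the signal in model (1.49) can be sketched in R with a kernel smoother over the time index; the simulated signal and the bandwidth are illustrative assumptions.

    set.seed(1)
    TT <- 400
    mu <- sin(2 * pi * (1:TT) / TT)                  # deterministic signal
    z  <- mu + rnorm(TT, sd = 0.3)                   # observed time series
    fit <- ksmooth(1:TT, z, kernel = "normal", bandwidth = 30)
    plot(1:TT, z, col = "gray", xlab = "time", ylab = "z")
    lines(fit, col = "blue")                         # estimated signal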

1.1.10 Stochastic Control

We consider two types of stochastic control problems. The first type of stochastic control problem appears in option pricing and hedging, and the second type appears in portfolio selection. The connection of these stochastic control problems to portfolio selection and to option pricing and hedging is explained in Section 1.5.3 and in Section 1.5.4, respectively.

Option-Pricing-Type Stochastic Control Consider the time series

is minimized. The optimal coefficients at time t0 are defined by

(1.51)

where the minimization is done over coefficients βt at time t and over coefficient αt0 at time t0.

Note that at time t0 the coefficients βt0+1, …, βT−1 are nuisance coefficients since they are chosen at later times, and at time t0 we use them only to calculate the optimal values αt0o and βt0o. Then, at time t0 + 1 we choose parameters αt0+1 and βt0+1 and parameters βt0+2, …, βT−1 are nuisance parameters at time t0 + 1.

Note the difference to the usual least squares problem. Namely, in the usual least squares problem we solve the problem

at time T − 1. That is, all coefficients are chosen at the same time T − 1, and at that time all values Xt0, …, XT−1 are known. This problem appears, for example, in the linear autoregression, where we minimize the expected squared error

at time T − 1. In the one-step case the stochastic control and the usual least squares problem are identical, because in the one-step problem we minimize

at time t0 = T − 1.

where

The connection of this type of stochastic control problem to option pricing is explained in Section 1.5.4.

Portfolio-Selection-Type Stochastic Control Consider the time series

is maximized, where u : R → R. The optimal coefficients at time t0 are defined by

(1.52)

where the maximization is done over vector βt at time t. The connection of this type of stochastic control problem to portfolio selection is explained in Section 1.5.3, see (1.97).

1.1.11 Instrumental Variables

The method of instrumental variables is used to estimate causal relationships when it is not possible to make controlled experiments. There are three classical examples of the cases where a need for instrumental variables arises: when there are relevant explanatory variables which are not observed (omitted variables), when the explanatory variables are subject to measurement errors, or when the response variable has a causal influence on one of the explanatory variables (reverse causation).

The method of instrumental variables can be used when we want to estimate structural function g : Rd → R in the model

(1.53)

where Y ∈ R, X ∈ Rd, and

(1.54)

We give two examples of model (1.53). The first example explains how an omitted variable can lead to (1.53). The second example explains how an error in the explanatory variable can lead to (1.53).

Omitted Variable As an example of a case where model (1.53) can arise, consider the case where X is a variable indicating the type of the treatment a patient receives:

and Y is a variable measuring the health of the patient after receiving the treatment. This example is modeled after McClellan, McNeil & Newhouse (1994). We want to estimate the causal influence of X on Y. Let us denote by W the random variable measuring the health of a patient at the time the patient receives the treatment. The variable W also influences Y. In this example W also affects X, because the decision about the treatment a patient receives is partially based on the health condition of the patient (if the patient is weak, he will not receive a treatment that is physiologically demanding). Using the usual regression methods and observations of X and Y would give a biased estimate of the causal influence of X on Y. (If patients with a weak condition receive treatment A more often, then the estimate would give a pessimistic estimate of the effect of treatment A.)

We assume an additive model

Errors in Variables in a Linear Model As an example of model (1.53), consider the case where the linear model

Thus the observed values Xi are contaminated with additive errors. We assume that

(1.55)

and

(1.56)

We can write the observed response variables as

and the new error term is denoted by

to get the new linear model

(1.57)

The fact that E(U | X) ≠ 0 follows from Cov(X, U) ≠ 0. We have that

because

Estimation of the Structural Function We give a linear instrumental variables estimator in (2.24). This estimator can be used to estimate parameters α and β in (1.57). The linear instrumental variable estimator is

where

Hall & Horowitz (2005) approach the estimation of g(x) in the model (1.53) by deriving an operator equation for g. From (1.54) we obtain

where the operator K is defined as

1.2 DISCRETE RESPONSE VARIABLE