Smoothing and Regression
Description

A comprehensive introduction to a wide variety of univariate and multivariate smoothing techniques for regression.

Smoothing and Regression: Approaches, Computation, and Application bridges the many gaps that exist among competing univariate and multivariate smoothing techniques. It introduces, describes, and in some cases compares a large number of the latest and most advanced techniques for regression modeling. Unlike many other volumes on this topic, which are highly technical and specialized, this book discusses all methods in light of both computational efficiency and their applicability for real data analysis. Using examples of applications from the biosciences, environmental sciences, engineering, and economics, as well as medical research and marketing, this volume addresses the theory, computation, and application of each approach. A number of the techniques discussed, such as smoothing under shape restrictions or of dependent data, are presented for the first time in book form. Special features of this book include:

* Comprehensive coverage of smoothing and regression with software hints and applications from a wide variety of disciplines
* A unified, easy-to-follow format
* Contributions from more than 25 leading researchers from around the world
* More than 150 illustrations, also covering new graphical techniques important for exploratory data analysis and visualization of high-dimensional problems
* Extensive end-of-chapter references

For professionals and aspiring professionals in statistics, applied mathematics, computer science, and econometrics, as well as for researchers in the applied and social sciences, Smoothing and Regression is a unique and important new resource destined to become one of the most frequently consulted references in the field.




Contents

Cover

Half Title page

Title page

Copyright page

Contributors

Foreword

Preface

Chapter 1: Spline Regression

1.1 Introduction

1.2 General form of the Estimator

1.3 The Linear Smoothing Spline

1.4 Large-Sample Efficiency

1.5 Bayesian Motivation

1.6 Extensions and Implementations

References

Chapter 2: Variance Estimation and Smoothing-Parameter Selection for Spline Regression

2.1 Introduction and Some Definitions

2.2 Interpretation of the Smoothing Parameter

2.3 Quantifying the Complexity of a Smoothing Spline

2.4 Estimation of σ²

2.5 Determination of λ

2.6 Estimation of τ²

2.7 Final Remarks

References

Chapter 3: Kernel Regression

3.1 Introduction

3.2 The Nadaraya–Watson Kernel Regression Estimate

3.3 Pointwise Bias Properties of the Nadaraya–Watson Estimate

3.4 Pointwise Variance Properties of the Nadaraya–Watson Estimate

3.5 Trade-off Between Bias and Variance: The Mean Squared Error

3.6 Global Results: Mean Integrated Squared Error Properties

3.7 L∞ Convergence Properties of the Nadaraya–Watson Estimate

3.8 Complementary Bibliography

References

Chapter 4: Variance Estimation and Bandwidth Selection for Kernel Regression

4.1 Introduction

4.2 Nonparametric Variance Estimators

4.3 Bandwidth Choice for Kernel Regression Estimators

References

Chapter 5: Spline and Kernel Regression Under Shape Restrictions

5.1 Introduction

5.2 Description of the Main Methods

5.3 A Comparative View

5.4 Examples

5.5 Software Hints

References

Chapter 6: Spline and Kernel Regression for Dependent Data

6.1 Introduction

6.2 Approaches for a Known Autocorrelation Function

6.3 Approaches for an Unknown Autocorrelation Function

6.4 A Bayesian Approach to Smoothing Dependent Data

6.5 Applications of Smoothing Dependent Data

References

Chapter 7: Wavelets for Regression and Other Statistical Problems

7.1 Introduction

7.2 Wavelet Expansions

7.3 The Discrete Wavelet Transform in S

7.4 Wavelet Shrinkage

7.5 Estimators for Data With Correlated Noise

7.6 Implementation of the Wavelet Transform

7.7 How To Obtain and Install the Wavelet Software

References

Chapter 8: Smoothing Methods for Discrete Data

8.1 Introduction

8.2 Smoothing Contingency Tables

8.3 Smoothing Approaches to Categorical Regression

8.4 Conclusion

References

Chapter 9: Local Polynomial Fitting

9.1 Introduction

9.2 Properties of Local Polynomial Fitting

9.3 Choice of Bandwidth

9.4 Choice of the Degree

9.5 Local Modeling

9.6 Some More Applications

References

Chapter 10: Additive and Generalized Additive Models

10.1 Introduction

10.2 The Additive Model

10.3 Generalized Additive Models

10.4 Alternating Conditional Expectations, Additivity, and Variance Stabilization

10.5 Smoothing Parameter and Bandwidth Determination

10.6 Model Diagnostics

10.7 New Developments

References

Chapter 11: Multivariate Spline Regression

11.1 Introduction

11.2 Smoothing Splines as Bayes Estimates

11.3 ANOVA Decomposition on Product Domains

11.4 Tensor Product Splines

11.5 Computation

11.6 Bayesian Confidence Intervals

11.7 Software

11.8 Cosine Diagnostics

11.9 Partial Splines

11.10 Thin-Plate Splines

11.11 Non-Gaussian Regression

References

Chapter 12: Multivariate and Semiparametric Kernel Regression

12.1 Introduction

12.2 Multidimensional Smoothing with Kernels

12.3 Semiparametric Generalized Regression Models

12.4 Practical Application and Software Hints

References

Chapter 13: Spatial-Process Estimates as Smoothers

13.1 Introduction

13.2 Thin-Plate Splines

13.3 Spatial-Process Estimates

13.4 Ridge-Regression Estimates and Shrinkage

13.5 A Response-Surface Example

13.6 Predicting Ambient Ozone

13.7 Future Directions

References

Chapter 14: Resampling Methods for Nonparametric Regression

14.1 Introduction

14.2 The Idea of Bootstrap

14.3 Bootstrap in Nonparametric Regression

14.4 Bootstrap Confidence Intervals and Bands

14.5 Bootstrap-Bandwidth Choice

14.6 Bootstrap Tests in Nonparametrics

14.7 Bootstrap Inference on the Shape of a Curve

14.8 Extensions

References

Chapter 15: Multidimensional Smoothing and Visualization

15.1 Introduction

15.2 Data-Point Visualization

15.3 Functional Visualization in One and Two Variables

15.4 Averaged Shifted Histograms

15.5 Functional Visualization in Three Variables and Beyond

15.6 Visualization of Regression Functions

15.7 Summary

References

Chapter 16: Projection Pursuit Regression

16.1 Introduction

16.2 The Basic PPR Algorithm

16.3 Quality of Approximation

16.4 Number of Terms to Choose

16.5 Interpretable PPR

16.6 Convergence Rates

16.7 Modifications

16.8 PPR and Neural Networks

16.9 Optimization Methods for PPR and Neural Networks

16.10 The Implementation of PPR in S-PLUS, R, and XploRe

16.11 An Example

16.12 Conclusion

References

Chapter 17: Sliced Inverse Regression

17.1 Introduction

17.2 The Idea

17.3 Statistical Properties

17.4 The Unknown Dimensionality

17.5 Slicing Strategies

17.6 Implementation

17.7 Modifications

17.8 An Example

References

Chapter 18: Dynamic and Semiparametric Models

18.1 Introduction

18.2 Linear Dynamic Models and Optimal Smoothing for Time Series Data

18.3 Non-Gaussian Observation Models

18.4 Generalized Additive and Varying Coefficient Models

18.5 Conclusions

References

Chapter 19: Nonparametric Bayesian Bivariate Surface Estimation

19.1 Introduction

19.2 Bayesian Model and Subset Selection

19.3 Bivariate Surface Estimation

19.4 Robust Surface Estimation

19.5 Surface Estimation for Time Series Data

19.6 Alternative Bases and Model Mixing

References

Index

Smoothing and Regression


Copyright © 2000 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-mail: [email protected].

Library of Congress Cataloging-in-Publication Data:

Smoothing and regression : approaches, computation, and application / edited by Michael G. Schimek.
p. cm.
“A Wiley-Interscience Publication.”
Includes bibliographical references.
ISBN 0-471-17946-9 (alk. paper)
1. Smoothing (Statistics) 2. Nonparametric statistics. 3. Regression analysis. I. Schimek, Michael G.
QA278.S64 2000
519.5′36 – dc21    99-22017

To my mother and the memory of my father

Contributors

Michel Delecroix, ENSAI, Rennes, France

Randall L. Eubank, Texas A&M University, College Station, Texas, USA

Ludwig Fahrmeir, Ludwig-Maximilians-Universität München, Munich, Germany

Jianqing Fan, University of North Carolina, Chapel Hill, North Carolina, USA and The Chinese University of Hong Kong, Hong Kong, PRC

Irene Gijbels, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

Janet Grassmann, SONEM, Würzburg, Germany

Chong Gu, Purdue University, West Lafayette, Indiana, USA

Wolfgang Härdle, Humboldt-Universität zu Berlin, Berlin, Germany

Eva Herrmann, Technische Universität Darmstadt, Darmstadt, Germany

Sigbert Klinke, Humboldt-Universität zu Berlin, Berlin, Germany

Leonhard Knorr-Held, Ludwig-Maximilians-Universität München, Munich, Germany

Robert Kohn, Australian Graduate School of Management, Sydney, Australia

Thomas T. Kötter, SAP AG, Walldorf, Germany

Angelika van der Linde, Edinburgh University, Edinburgh, UK

Enno Mammen, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany

Marlene Müller, Humboldt-Universität zu Berlin, Berlin, Germany

Guy P. Nason, University of Bristol, Bristol, UK

Douglas W. Nychka, National Center for Atmospheric Research, Boulder, Colorado, USA

Pascal Sarda, Université Paul Sabatier, Toulouse, France

Michael G. Schimek, Karl-Franzens-Universität Graz, Graz and Universität Wien, Vienna, Austria

David W. Scott, Rice University, Houston, Texas, USA

Bernard W. Silverman, University of Bristol, Bristol, UK

Jeffrey S. Simonoff, New York University, New York, USA

Michael Smith, University of Sydney, Sydney, Australia

Christine Thomas-Agnan, Université de Toulouse I, Toulouse, France

Berwin A. Turlach, The University of Western Australia, Perth, Australia

Gerhard Tutz, Ludwig-Maximilians-Universität München, Munich, Germany

Philippe Vieu, Université Paul Sabatier, Toulouse, France

Paul Yau, Australian Graduate School of Management, Sydney, Australia

Foreword

It is an honor to be asked to write a foreword for this volume, and it is my pleasure to congratulate the editor and the authors for a job well done.

This book is unique in that it brings together in one place a variety of points of view regarding nonparametric regression, smoothing and statistical modeling. These methods, as the editor notes, are becoming ubiquitous throughout the scientific, economic, environmental and social enterprises as ever increasing computational power and data collection ability become available.

The historical overviews here are important in tying together the different threads of the theory, which are widely scattered in the literature over time and in various journals. Many of the papers bring the reader up to date on the state of the art in the approaches discussed. Much useful computational and practical information, for example, discussion of useful software, is provided.

As I am sure the editor and authors will agree, this is a fun area to work in. There are many interesting and elegant theoretical questions, many interesting numerical and computational questions, and, finally, many interesting areas of application, where the statistician has the opportunity and satisfaction of working on “real world” scientific problems and contributing to the study of important scientific questions in a variety of areas.

The reader will probably agree with me that there is no one uniquely best method for all situations, but that substantive estimation and modeling problems may require the examination of several different approaches. Nevertheless there is abundant theory to give us guidance under various assumptions, although the last word has not been said on theoretical developments. This is particularly true in the multivariate case where data with very complex structure, as in environmental and demographic medical studies, is collected.

This book provides an important addition to the reading list of a modern nonparametric regression course. The extensive reference lists provide an excellent historical perspective, and a prime starting point for new researchers in the field. It will be a handy reference source for people wishing to obtain an overview of the various techniques, and pointers to practical issues in implementing the various methods. In addition, it is a fine summary of the state of the art as of the creation of the book.

Grace Wahba
Madison, December 1999

Preface

The idea for this volume can be traced back to the COMPSTAT Satellite Meeting on Smoothing, which I had the pleasure of organizing and which took place in August 1994 in Semmering, Austria. During this workshop, it became clear that smoothing techniques are still a driving force in the development of nonparametric regression. On the other hand, they have become so diversified that it is really hard to keep up with all of the developments in modern regression analysis. In going from univariate to multivariate problems, the difference between competing methods is even more pronounced. During the past few years, important new approaches have emerged, such as the marginal integration method for the estimation of additive models. Markov chain Monte Carlo techniques have taken on a leading role in the Bayesian modeling of complicated data structures (dynamic, temporal, or spatial). Wavelets have become an important issue. Semiparametric regression is attracting more and more interest, and sliced inverse regression is offering new perspectives. Furthermore, all of these fascinating developments have been going on during the planning and writing of this book. We incorporated as much information on them as possible into this volume. References were last updated in late 1999.

What was the motivation to start an enterprise like this? There is certainly no lack of monographs in the field of smoothing techniques and nonparametric regression. Most of them are highly technical, some of them emphasize computational aspects or software products, and others embark on specific applications, but what they all have in common is that they specialize in one approach, say spline smoothing or local polynomial fitting. This volume attempts to bridge the gaps between these contributions by describing and occasionally comparing competing univariate and multivariate smoothing techniques. In addition, some of the subjects discussed in this book have not yet been covered in textbooks or monographs, such as smoothing under shape restrictions or of dependent data, or resampling methods for nonparametric regression, or vector generalized additive models.

What all of these topics have in common is that they are computationally demanding, some even requiring dynamic high-resolution graphics. Hence, in addition to defining and explaining them, it is only natural to discuss their numerical and software-related aspects; without the possibility of an appropriate implementation, such theoretical developments would be of limited value. Thus, in addition to explanations of their formal statistical background, all methods are discussed in the light of both computational efficiency and applicability for the analysis of real data. Smoothing techniques in regression are increasingly applied in the biosciences, in the environmental sciences, and in the fields of engineering and economics. They are also about to become relevant for many other fields, such as medical research and marketing. Last but not least, great efforts have been made in presenting graphics, which are of the utmost importance for any exploratory data analysis and the visualization of high-dimensional problems. Hence, color plates were also produced for this volume.

As a consequence, each regression approach included in the book is discussed from the following points of view: background theory, problems of computation and implementation, and examples of practical application.

The level of presentation is approximately the same throughout the volume, although some methods differ from others in terms of necessary technicalities, which are kept to a minimum to the greatest possible extent. Hence, the book addresses a wide audience: graduate students of statistics, applied mathematics, engineering, computer science and econometrics; scholars in the aforementioned fields; and applied researchers in these and other areas, such as biosciences, environmental sciences, medicine, psychometrics, and marketing. The wide range of examples in the volume is intended to further diversify the scope of application of nonparametric and semiparametric regression models and related techniques.

The emphasis of the volume is on multivariate regression problems. However, there are also introductory chapters that guide the reader through the most important univariate approaches, especially those of smoothing splines and of kernels. Other aspects of great importance are smoothing parameter choice, bandwidth selection, variance estimation, smoothing under functional constraints, and smoothing of autocorrelated data. A further topic of current interest is wavelet methods for regression. Some of these topics are essential for the understanding of multivariate regression problems.

The motivation behind the chapters on multivariate regression problems is to provide a state-of-the-art introduction to the large number of different approaches that are already established or are promising for the future of practical data analysis. Among these approaches are smoothing methods for discrete data (e.g., from contingency tables), local polynomial fitting, additive and generalized additive models (including alternating conditional expectations and additivity and variance stabilization), multivariate spline regression (e.g., thin-plate splines and interaction splines), multivariate and semiparametric kernel regression, spatial-process estimates as smoothers, resampling methods for nonparametric regression, multidimensional smoothing and visualization, projection pursuit regression, sliced inverse regression, nonparametric Bayesian dynamic and semiparametric models, and, finally, nonparametric Bayesian surface estimation. Competing algorithms are discussed throughout the book and are followed by software hints and examples using simulated as well as real data.

Due to the vast number of different techniques to be covered, the preparation of this volume demanded that experts in these very areas work together. As the book’s editor, I designed it that way from the very beginning. To achieve this ambitious goal, I brought together researchers from many different countries and schools of thought, most of whom would otherwise not be likely to write a book together. All agreed with me that such a project is only feasible with a maximum of exchange among the contributors. So authors commented on each other’s contributions, and the final product is a unique piece of work with the same notation throughout. A crucial requirement was also that all authors obey a certain style of presentation and restrict themselves to a prespecified level of mathematical argumentation, avoiding the popular theorem–proof structure of statistics monographs. Proofs are either cited or given in an intuitive manner. The extensive list of references following each chapter should allow readers who desire further details to track down all of these technicalities. The specific format and the comprehensiveness of this volume are exactly what make it special in the field of smoothing and regression.

I am most grateful to the enthusiastic support of all of the people who have helped to shape the book as it now stands. In addition to the authors and Stephen H. Quigley from John Wiley & Sons, Inc., there are many others I have to thank for stimulating discussions and for their most valuable reviews of chapters. In particular, I would like to name Dennis R. Cook, Dennis D. Cox, Naihua Duan, Sylvia Frühwirth–Schnatter, Peter Hall, Jeffrey D. Hart, M. C. Jones, James S. Marron, Jean D. Opsomer, Jürgen Pilz, James O. Ramsay, Burghart Seifert, Neil Shepherd, Stefan Sperlich, Joan G. Staniswalis, Alexander Tsybakov, Matthew P. Wand, and Thomas W. Yee.

Further, I wish to mention two of my former academic teachers, Prof. Leopold Schmetterer at the University of Vienna, Austria, who first introduced me to nonparametric concepts in statistics, and Prof. Bernard W. Silverman at the University of Bath, UK, who confronted me with smoothing techniques and computational statistics. Since my encounters with these individuals, I have been engaged in related problems, with a special emphasis on computing and applications in the biosciences and medicine.

Last but not least, I want to express my sincerest thanks to my mother, Ingrid, and my father, the late Prof. Herbert Toni Schimek, for their continuing support during my academic education. My father, who was a well-known Austrian designer and artist, not only shaped my eye for graphical detail but also stimulated my interest in science and technology. Both aspects turned out to be relevant to my later academic interests, as partly revealed in this book. Another, more direct contribution of my father is the ex libris shown on page xvi, printed from a copperplate he engraved for me in 1979.

Michael G. Schimek
Graz, November 1999

CHAPTER 1

Spline Regression

Randall L. Eubank

Texas A&M University, College Station, Texas, USA

1.1 INTRODUCTION

Penalized least-squares regression and spline smoothing provide flexible data-fitting methodologies that have gained in popularity in recent years. While the basic spline-smoothing concept traces its origin back to Whittaker (1923), much of the modern development of smoothing splines and their variants is due to Grace Wahba. An introduction to her work, along with references to her more important papers, can be found in the Wahba (1990) monograph. Here, we give an overview of the work of Wahba and others as applied to the problem of univariate, nonparametric regression analysis.

Let us begin with an examination of the motivation behind spline smoothing. For this purpose, suppose that responses y1, …, yn have been observed at (non-stochastic) design points t1 < · · · < tn following the regression model

$$y_i = f(t_i) + \epsilon_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

where f(·) is an unknown regression function and ∊1, …, ∊n are zero-mean, uncorrelated random errors. The problem to be considered is the estimation of f from the observed data.
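To fix ideas, here is a minimal R sketch that simulates data from a model of the form (1.1). The particular regression function, design, and error level are illustrative choices, not taken from the chapter.

## Simulate from y_i = f(t_i) + eps_i with a hypothetical f
set.seed(1)
n  <- 100
ti <- (2 * seq_len(n) - 1) / (2 * n)      # design points in (0, 1)
f  <- function(t) sin(2 * pi * t^2)       # illustrative "unknown" f
y  <- f(ti) + rnorm(n, sd = 0.2)          # zero-mean, uncorrelated errors
plot(ti, y); lines(ti, f(ti), lty = 2)    # data with the true curve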

To understand why linear regression fails in certain cases, it is useful to reconsider the Taylor-expansion argument that is often advanced as a motivation for the method. The basic premise of this argument is that if f has two derivatives, then for a point t that is close to some fixed point t0, we have

$$f(t) = f(t_0) + f'(t_0)(t - t_0) + O\bigl(|t - t_0|^2\bigr)$$

as a result of Taylor’s Theorem. Thus, for t close to t0, f follows an approximate linear model with intercept f(t0) − f′(t0)t0 and slope f′(t0). A global (i.e., over the entire design) linear approximation for f will therefore be satisfactory only when the slope of the regression function, f′(t0), is essentially constant and the error term O(|t − t0|²) is uniformly small. The condition for a constant slope is clearly violated in Figure 1.1, which explains why simple linear regression has difficulties in fitting this data set. It now seems clear that to successfully fit data in general, we must alter our original estimation prescription and consider minimization of the residual sum of squares $\mathrm{RSS}(g) = \sum_{i=1}^n (y_i - g(t_i))^2$ over functions g with slopes that vary over the design.

FIGURE 1.1 Linear regression and interpolation for a simulated data set.

The two “extreme” fits to the data in Figure 1.1 suggest that we need some compromise between fits with constant slopes and fits with completely flexible slopes. One way to accomplish this is to penalize functions whose slopes vary too rapidly. The rate of change in the slope of a function g is given by g″, and hence an overall measure of the change in slope of a potential fitted function is provided by

$$\int_{t_1}^{t_n} g''(t)^2\,dt.$$

Thus, a modified version of our original estimation criterion that incorporates a penalty for rapid changes in slope is

$$\mathrm{RSS}(g) + \lambda \int_{t_1}^{t_n} g''(t)^2\,dt, \qquad \lambda > 0. \tag{1.2}$$

If n ≥ 2, then it can be shown that there is a unique, computable minimizer fλ for criterion (1.2), called a cubic smoothing spline. The specific form of fλ and the reason for this particular name for the estimator are discussed more fully in the next section. For now, it suffices to say that the estimator is linear in the sense that for each estimation point t there are computable constants h1(t), …, hn(t) such that

$$f_\lambda(t) = \sum_{i=1}^{n} h_i(t)\,y_i.$$

While use of the cubic smoothing-spline estimator may solve the problem of allowing for fits with variable slope, it also creates a new problem, namely, the determination of an appropriate value of the smoothing parameter λ for a given set of data. We would not expect the same value of λ to work for every data set. Indeed, if we agree that the “best” choice of λ is a value that minimizes the squared-error loss

$$L(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \bigl(f_\lambda(t_i) - f(t_i)\bigr)^2,$$

then it is clear that a good choice of the smoothing parameter depends both on the unknown true regression curve and on the inherent variability of the estimator. Since L(λ) is unknown, we must estimate it in order to produce a data-driven choice for λ. One such estimator can be obtained from the generalized cross-validation (GCV) criterion of Craven and Wahba (1979), defined by

$$\mathrm{GCV}(\lambda) = \frac{n^{-1}\sum_{i=1}^{n}\bigl(y_i - f_\lambda(t_i)\bigr)^2}{\Bigl(1 - n^{-1}\sum_{i=1}^{n} h_i(t_i)\Bigr)^2},$$

which provides an estimator of L(λ) + σ², where σ² is the error variance. It has been shown by Li (1986) that, under certain conditions, the minimizer $\hat\lambda$ of the GCV criterion satisfies $L(\hat\lambda)/\inf_{\lambda > 0} L(\lambda) \to 1$ as n → ∞. This indicates that, at least in terms of the estimator’s loss performance, one does as well asymptotically with the GCV choice of λ as could be done with the unknown, best possible value of the smoothing parameter.
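As a computational aside, R’s smooth.spline() minimizes the GCV criterion when called with cv = FALSE. The sketch below (using the same illustrative simulated data as above) profiles the GCV score over a grid of smoothing levels and then lets the function choose λ itself; spar is R’s monotone reparameterization of λ.

## GCV-based smoothing-parameter selection (a sketch)
set.seed(1)
n  <- 100
ti <- (2 * seq_len(n) - 1) / (2 * n)
y  <- sin(2 * pi * ti^2) + rnorm(n, sd = 0.2)

## GCV profile: cv.crit holds the GCV score when cv = FALSE
spars <- seq(0.2, 1.2, by = 0.05)
gcv   <- sapply(spars, function(s)
  smooth.spline(ti, y, spar = s, cv = FALSE)$cv.crit)
plot(spars, gcv, type = "b")              # analogue of Figure 1.2

## Let smooth.spline() minimize GCV over lambda itself
fit <- smooth.spline(ti, y, cv = FALSE)
c(spar = fit$spar, lambda = fit$lambda, df = fit$df)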

FIGURE 1.2 GCV and loss for a simulated data set.

FIGURE 1.3 Smoothing spline fit to a simulated data set.

In the next section, we discuss the general form and computation of a smoothing-spline estimator. Section 1.3 then focuses on a simple, special case wherein it is easy to demonstrate the connection between smoothing splines and other nonparametric smoothers, such as Fourier series and kernel estimators. The large-sample frequentist properties of smoothing splines are discussed in Section 1.4 and Bayesian properties of the estimators and their frequentist implications are the subject of Section 1.5. In Section 1.6, we conclude with a discussion of how the penalized least-squares estimation paradigm that produced smoothing splines can be generalized to provide estimators for other problems.

1.2 GENERAL FORM OF THE ESTIMATOR

Our basic smoothing criterion, described in Equation (1.2), can be extended in several possible ways to produce other useful estimators of f. More generally, we could consider the estimation of f by a 2mth-order smoothing spline fλ obtained by minimizing

$$\sum_{i=1}^{n} w_i\bigl(y_i - g(t_i)\bigr)^2 + \lambda \int_{t_1}^{t_n} \bigl(g^{(m)}(t)\bigr)^2\,dt, \tag{1.3}$$

where w1, …, wn are positive weights and m is a positive integer. Equation (1.3) is minimized over all functions g having m − 1 absolutely continuous derivatives and square integrable mth derivatives. The weights w1, w2, …, wn in (1.3) can be used to place additional emphasis on the residuals from fitting the data in some important region of the design space, to adjust for heteroskedastic errors, or to produce an estimator when there are replicate samples (see, e.g. Eubank, 1988, 207ff).

The minimizer fλ of (1.3) turns out to be a natural spline of order 2m with knots at the design points and, hence, can be computed by penalized least squares in a natural-spline basis. Writing X for the basis matrix evaluated at the design points, W = diag(w1, …, wn), and Ω for the corresponding penalty matrix, the vector of fitted values fλ = (fλ(t1), …, fλ(tn))ᵀ has the ridge-type form

$$\mathbf{f}_\lambda = X\bigl(X^T W X + \lambda\Omega\bigr)^{-1} X^T W\,\mathbf{y}. \tag{1.4}$$

To select a good value of λ from the data, one can again use the GCV criterion, this time with the fλ obtained from Equation (1.4). The residual sum of squares in our earlier form of the criterion is now replaced with the weighted residual sum of squares from Equation (1.3), and the values of hi(ti) are taken to be the diagonal elements of the matrix X(XᵀWX + λΩ)⁻¹XᵀW. Efficient, O(n) algorithms for computing fλ and, hence, the GCV criterion can be found in Hutchinson and de Hoog (1985, 1987).
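A minimal R sketch of the generic ridge-type solve in (1.4) follows. The function name and its inputs are hypothetical placeholders (a basis matrix X, weight matrix W, and penalty matrix Ω would have to be supplied), and production code would instead use the O(n) banded algorithms cited above.

## Generic penalized weighted least squares, as in (1.4) (a sketch)
penalized_fit <- function(X, W, Omega, y, lambda) {
  A    <- crossprod(X, W %*% X) + lambda * Omega   # X'WX + lambda * Omega
  beta <- solve(A, crossprod(X, W %*% y))          # coefficient vector
  H    <- X %*% solve(A, t(X) %*% W)               # smoother ("hat") matrix
  list(fitted = drop(X %*% beta), lev = diag(H))   # fits and the h_i(t_i)
}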

FIGURE 1.4 Smoothing spline fit to assay data.

1.3 THE LINEAR SMOOTHING SPLINE

A difficulty in understanding how smoothing splines process data stems from the absence of an easily interpretable closed form for the estimator. There is, however, one special case where (1.4) can be solved explicitly to provide an estimator that admits simple interpretations. This case and its consequences are the subject of this section.

For m = 1, unit weights, and the uniform design ti = (2i − 1)/(2n), i = 1, …, n, the minimizer of (1.3), called the linear smoothing spline, has the explicit form

$$f_\lambda(t) = \bar y + \sum_{j=1}^{n-1} \frac{\tilde y_j}{1 + \lambda\gamma_j}\,\cos(j\pi t), \tag{1.5}$$

where $\bar y = n^{-1}\sum_{i=1}^{n} y_i$ is the response average,

$$\tilde y_j = \frac{2}{n}\sum_{i=1}^{n} y_i \cos(j\pi t_i) \tag{1.6}$$

is the jth sample cosine Fourier coefficient, and

$$\gamma_j = \bigl[2n \sin\bigl(j\pi/(2n)\bigr)\bigr]^2. \tag{1.7}$$
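The damped cosine-series form (1.5)–(1.7) can be implemented directly. The following R sketch (function name hypothetical; the uniform design above is assumed) evaluates fλ on a grid.

## Linear smoothing spline via (1.5)-(1.7) (a sketch)
linear_spline <- function(y, lambda, tgrid) {
  n  <- length(y)
  ti <- (2 * seq_len(n) - 1) / (2 * n)
  j  <- seq_len(n - 1)
  ytilde <- sapply(j, function(k) 2 * mean(y * cos(k * pi * ti)))   # (1.6)
  gam    <- (2 * n * sin(j * pi / (2 * n)))^2                       # (1.7)
  damp   <- ytilde / (1 + lambda * gam)                             # shrinkage
  mean(y) + sapply(tgrid, function(t) sum(damp * cos(j * pi * t)))  # (1.5)
}

## example: fhat <- linear_spline(y, lambda = 1e-4, tgrid = seq(0, 1, 0.01))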

The oscillation properties of the functions cos(jπt) for increasing frequency j, along with the orthogonality properties

$$\sum_{i=1}^{n} \cos(j\pi t_i)\cos(k\pi t_i) = \frac{n}{2}\,\delta_{jk}, \qquad 1 \le j, k \le n-1,$$

and

$$\sum_{i=1}^{n} \cos(j\pi t_i) = 0, \qquad 1 \le j \le n-1,$$

allow the linear smoothing spline to be viewed as a damped cosine-series estimator: the sample Fourier coefficients at high frequencies are shrunk most heavily, since the damping factors (1 + λγj)⁻¹ decrease with j.

The interpretation of the linear smoothing spline as a type of damped Fourier-series estimator can be extended to general smoothing splines through an appropriate choice of the natural-spline basis functions. It has been shown by Demmler and Reinsch (1975) that there is a natural-spline basis with oscillation properties similar to those for the cosine functions under which a smoothing spline admits a representation similar to that for the linear case. In fact, (1.6)–(1.7) are actually just a special case of the general Demmler–Reinsch representation.

Returning again to the linear smoothing spline, we see that by rearranging terms in (1.5) we can write $f_\lambda(t) = n^{-1}\sum_{i=1}^{n} W_\lambda(t, t_i)\,y_i$, where

$$W_\lambda(t, s) = 1 + 2\sum_{j=1}^{n-1} \frac{\cos(j\pi t)\cos(j\pi s)}{1 + \lambda\gamma_j}.$$

If we approximate γj by (jπ)² in this expression and replace the sum by an integral, we can argue heuristically that

$$f_\lambda(t) \approx \frac{1}{n\lambda^{1/2}} \sum_{i=1}^{n} K\!\left(\frac{t - t_i}{\lambda^{1/2}}\right) y_i, \qquad K(u) = \tfrac{1}{2}e^{-|u|}; \tag{1.8}$$

that is, in the interior of the design the linear smoothing spline acts like a kernel estimator with bandwidth λ^{1/2} and exponential kernel K.

It is actually not difficult to make the argument leading to Equation (1.8) precise. One first uses the MacLaurin series for sin x to obtain approximation bounds, such as 0.4jπ ≤ 2n sin(jπ/(2n)) ≤ jπ and |2n sin(jπ/(2n)) − jπ| ≤ (jπ)³/(24n²) for 1 ≤ j ≤ n − 1, and then combines these bounds with identity 1.445.2 on page 40 of Gradshteyn and Ryzhik (1980).
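As a numerical check on the heuristic (1.8), the following R sketch (helper name hypothetical, uniform design assumed) implements the exponential-kernel smoother; at interior points of (0, 1) its output can be compared with the exact series form linear_spline() from the previous sketch.

## Equivalent-kernel approximation (1.8) to the linear smoothing spline
equiv_kernel <- function(y, lambda, tgrid) {
  n  <- length(y)
  ti <- (2 * seq_len(n) - 1) / (2 * n)
  h  <- sqrt(lambda)                                  # bandwidth lambda^{1/2}
  sapply(tgrid, function(t)
    sum(exp(-abs(t - ti) / h) / 2 * y) / (n * h))     # K(u) = exp(-|u|)/2
}

## e.g., compare equiv_kernel(y, 1e-4, 0.5) with linear_spline(y, 1e-4, 0.5)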

The connection between spline smoothing and kernel smoothing is not unique to the linear case. If we think of the ti as being n-tiles of some continuous density p, then the general smoothing spline obtained from (1.3) admits a similar kernel approximation, with bandwidth [λ/p(·)]^{1/(2m)} and kernel K obtained as the solution of a particular 2mth-order differential equation (see Messer, 1991; Nychka, 1995; and Silverman, 1984). The kernel is of order 2m in the sense that $\int K(u)\,du = 1$, $\int u^j K(u)\,du = 0$ for j = 1, …, 2m − 1, and $\int u^{2m} K(u)\,du \ne 0$. However, this approximation is valid only in the interior of [t1, tn], and the approximate kernel degrades to order m near t1 and tn.

1.4 LARGE-SAMPLE EFFICIENCY

In this section, we discuss the large-sample properties of spline smoothers. Perhaps the strongest asymptotic results for smoothing splines can be found in Cox (1983), Rice and Rosenblatt (1983), and Speckman (1985). However, much of this work again traces its origins to seminal work by Grace Wahba, such as Wahba (1975).

The basic premise behind a large portion of the asymptotic analysis conducted in nonparametric regression involves assessment of an estimator’s large-sample efficiency in terms of some type of performance measure. A standard choice for a performance criterion is the average squared-error risk, which has the form

$$R(\lambda) = \frac{1}{n}\sum_{i=1}^{n} E\bigl(f_\lambda(t_i) - f(t_i)\bigr)^2 = V(\lambda) + B^2(\lambda),$$

where

$$V(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \operatorname{Var}\bigl(f_\lambda(t_i)\bigr)$$

is the variance part of the risk and

$$B^2(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \bigl(E f_\lambda(t_i) - f(t_i)\bigr)^2$$

is the average squared bias. Arguments in Speckman (1985) can be used to show that for large n and small λ the variance term V(λ) is well approximated by C/(nλ^{1/(2m)}), where C is a known constant that depends only on m and the design, and that $B^2(\lambda) \le \lambda \int \bigl(f^{(m)}(t)\bigr)^2\,dt$ (see also Eubank, 1988, Section 6.3.2). These equations give a large-sample upper bound for R(λ) of the form

$$R(\lambda) \le \frac{C}{n\lambda^{1/(2m)}} + \lambda \int \bigl(f^{(m)}(t)\bigr)^2\,dt,$$

from which we see that any choice of λ decaying at the rate n^{−2m/(2m+1)} will ensure an optimal rate of decay for the smoothing-spline risk.
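The rate can be seen by balancing the two terms of this bound; in outline, treating $J_m(f) = \int (f^{(m)}(t))^2\,dt$ as a positive constant:

% Balancing variance and squared-bias terms in the risk bound
\frac{C}{n\lambda^{1/(2m)}} \asymp \lambda
\;\Longrightarrow\;
\lambda^{(2m+1)/(2m)} \asymp n^{-1}
\;\Longrightarrow\;
\lambda \asymp n^{-2m/(2m+1)},
\qquad
R(\lambda) \asymp n^{-2m/(2m+1)} .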

More exact expressions for the risk require an in-depth analysis of the properties of B²(λ), rather than just an upper bound on its size. This analysis tends to be somewhat complicated by its dependence on the boundary behavior of the regression function. To be somewhat more precise, it turns out that B²(λ) can decay to zero at a rate substantially faster than order λ if f satisfies some subset (or all) of the natural boundary conditions

$$f^{(j)}(t_1) = f^{(j)}(t_n) = 0, \qquad j = m, \ldots, 2m - 1, \tag{1.9}$$

that are necessarily satisfied by the natural-spline basis functions used to construct fλ. Arguments by Rice and Rosenblatt (1983) and Utreras (1987) have the consequence that if f has 2m derivatives and satisfies (1.9), then B²(λ) decays at the faster rate O(λ²), so that the risk can achieve the rate n^{−4m/(4m+1)}.

Returning to the kernel approximation for fλ discussed in Section 1.3, we can observe that the approximate form C/(nλ^{1/(2m)}) for V(λ) is exactly what we would expect from a kernel estimator of order 2m with bandwidth λ^{1/(2m)}. However, even if f has 2m derivatives, the bias term B²(λ) decays at a rate of only O(λ), rather than at the O(λ²) rate we would expect from a 2mth-order kernel estimator, unless the natural boundary conditions are satisfied. The reason for this is that while smoothing splines behave like 2mth-order kernel estimators in the interior of [t1, tn], their parallel of the kernel-estimator boundary correction is only of order m. Thus, for regression functions with more than the expected m derivatives that do not satisfy (1.9), the bias for fλ is larger near t1 and tn than in the interior of the estimation interval. This entails that the bias part of the risk and, hence, the optimal level of smoothing tend to be dominated by the boundary behavior of f in such cases. From a data-analytic standpoint, this tendency has the consequence that, at least to the extent that the GCV estimator of λ estimates the minimizer of R(λ), information from the boundary region may have an inordinately large influence on data-driven choices for the smoothing parameter. To compensate for this effect, the estimator may be boundary corrected along the lines of proposals in Eubank and Speckman (1991) and Oehlert (1992).

For the linear smoothing spline, the kernel approximation (1.8) suggests the more exact pointwise expressions

$$E f_\lambda(t) - f(t) = \lambda f''(t) + o(\lambda),$$

provided that f has a Hölder-continuous second derivative, and

$$\operatorname{Var}\bigl(f_\lambda(t)\bigr) = \frac{\sigma^2}{4n\lambda^{1/2}} + o\bigl((n\lambda^{1/2})^{-1}\bigr).$$

These expressions are valid for t in any fixed interval that does not contain the boundary. They clearly show that the linear smoothing spline behaves like a second-order kernel smoother for estimation in the interior of [t1, tn].

1.5 BAYESIAN MOTIVATION

The form of the smoothing-spline estimator that follows from (1.4) often leads parametric regression analysts to remark about possible connections between spline smoothing and ridge regression. One reason for the formal similarity of these two estimators derives from their common motivation as posterior means for certain Bayesian regression models. We explore the connection between Bayesian regresssion and spline smoothing in this section.

There are several elegant Bayesian regression models for which a smoothing spline is found to be an optimal estimator of the regression function (see, e.g., Kohn and Ansley, 1988; Wahba, 1978; Shiller, 1984; and van der Linde, 1993). Here, we deal only with the simple Bayesian model of Silverman (1985) and suppose that instead of using model (1.1), we collected our data from

$$y_i = F(t_i) + \epsilon_i, \qquad i = 1, \ldots, n,$$

where the ∊i are now independent N(0, σ²) random errors and F is a random function, independent of the errors, whose (formally improper) prior density is proportional to exp{−(λ/(2σ²)) ∫ F″(t)² dt}. Under this model, the posterior mean of F(t) given the data is exactly the cubic smoothing spline fλ(t).

The connection between Bayesian regression and spline smoothing has considerable utility, since it provides a justification for the use of diagnostic and interval-estimation techniques that parallel those used in frequentist, normal-theory linear regression. For example, it is easy to show that, under our Bayesian model, the ith residual yi − fλ(ti) has mean zero and variance σ²(1 − hi(ti)), just as in ordinary least-squares regression. This suggests that one might conduct residual analysis of a smoothing-spline fit by following a linear-regression-type paradigm involving the examination of the “standardized” residuals

$$\frac{y_i - f_\lambda(t_i)}{\sigma\sqrt{1 - h_i(t_i)}}, \qquad i = 1, \ldots, n,$$

and other related measures. It can actually be shown (see Craven and Wahba, 1979) that, for a smoothing-spline estimator,

$$y_i - f_\lambda^{(i)}(t_i) = \frac{y_i - f_\lambda(t_i)}{1 - h_i(t_i)},$$

where $f_\lambda^{(i)}(t_i)$ is the smoothing-spline fit at ti that is computed from the data set with the observation for (ti, yi) deleted. This result can then be combined with the Bayesian residual variance identity to produce various types of deleted-observation influence diagnostics. Examples of these diagnostics are discussed in Eubank (1988, Section 5.5.5).

The Bayesian model also leads to interval estimates. Since the posterior distribution of F(ti) is normal with mean fλ(ti) and variance σ²hi(ti), one obtains intervals of the form

$$f_\lambda(t_i) \pm Z_{\alpha/2}\,\sigma\sqrt{h_i(t_i)}, \tag{1.10}$$

where Zα/2 is the 100(1 − α/2)th percentage point of the standard normal distribution. Thus, (1.10) provides one way of producing interval estimators to accompany the smoothing-spline point estimator.
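In R, the leverages hi(ti) from smooth.spline() (component lev) make these diagnostics and intervals straightforward to sketch. The residual-variance estimate below is one common, illustrative choice, not a prescription from the chapter.

## Residual diagnostics and approximate Bayesian intervals (a sketch)
set.seed(1)
n  <- 100
ti <- (2 * seq_len(n) - 1) / (2 * n)
y  <- sin(2 * pi * ti^2) + rnorm(n, sd = 0.2)    # illustrative data, as before

fit  <- smooth.spline(ti, y, cv = FALSE)         # GCV-chosen lambda
fhat <- fit$y                                    # fitted values at the t_i
h    <- fit$lev                                  # leverages h_i(t_i)
sig2 <- sum((y - fhat)^2) / (n - fit$df)         # rough variance estimate

r_std <- (y - fhat) / sqrt(sig2 * (1 - h))       # "standardized" residuals
r_loo <- (y - fhat) / (1 - h)                    # deleted residuals, via (leaving-out-one)

z    <- qnorm(0.975)                             # alpha = 0.05 in (1.10)
band <- cbind(lower = fhat - z * sqrt(sig2 * h),
              upper = fhat + z * sqrt(sig2 * h))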

The derivation of the intervals (1.10) from a Bayesian perspective has prompted a number of investigations into the empirical and theoretical frequentist properties of these intervals under model (1.1). One of the most complete studies of this nature is that of Nychka (1988), who has shown that, under certain restrictions and with an optimally chosen level of smoothing, we have

$$\frac{1}{n}\sum_{i=1}^{n} P\Bigl(|f(t_i) - f_\lambda(t_i)| \le Z_{\alpha/2}\,\sigma\sqrt{h_i(t_i)}\Bigr) \approx 1 - \alpha. \tag{1.11}$$

This states that under model (1.1) the average coverage for the Bayesian intervals across the design should be approximately the same as their nominal, pointwise coverage. While this condition is encouraging, it tells us very little about the pointwise coverage properties of any particular interval, since these properties can be unacceptably far below or above the nominal level and still satisfy (1.11).