Written by the leading expert in the field, this text reviews the major new developments in envelope models and methods. An Introduction to Envelopes provides an overview of the theory and methods of envelopes, a class of procedures for increasing efficiency in multivariate analyses without altering traditional objectives. The author offers a balance between foundations and methodology by integrating illustrative examples that show how envelopes can be used in practice. He discusses how to use envelopes to target selected coefficients and explores predictor envelopes and their connection with partial least squares regression. The book reveals the potential for envelope methodology to improve estimation of a multivariate mean. The text also describes how envelopes can be used in generalized linear models and in regressions with a matrix-valued response, and it reviews work on sparse and Bayesian response envelopes. In addition, the text explores relationships between envelopes and other dimension reduction methods, including canonical correlations, reduced-rank regression, supervised singular value decomposition, sufficient dimension reduction, principal components, and principal fitted components. This important resource:
* Offers a text written by the leading expert in this field
* Describes groundbreaking work that puts the focus on this burgeoning area of study
* Covers the important new developments in the field and highlights the most important directions
* Discusses the underlying mathematics and linear algebra
* Includes an online companion site with both R and Matlab support
Written for researchers and graduate students in multivariate analysis and dimension reduction, as well as practitioners interested in statistical methodology, An Introduction to Envelopes offers the first book on the theory and methods of envelopes.
Page count: 486
Year of publication: 2018
Cover
Dedication
Preface
Outline
Computing
Acknowledgments
Notation and Definitions
Chapter 1: Response Envelopes
1.1 The Multivariate Linear Model
1.2 Envelope Model for Response Reduction
1.3 Illustrations
1.4 More on the Envelope Model
1.5 Maximum Likelihood Estimation
1.6 Asymptotic Distributions
1.7 Fitted Values and Predictions
1.8 Testing the Responses
1.9 Nonnormal Errors
1.10 Selecting the Envelope Dimension
1.11 Bootstrap and Uncertainty in the Envelope Dimension
Chapter 2: Illustrative Analyses Using Response Envelopes
2.1 Wheat Protein: Full Data
2.2 Berkeley Guidance Study
2.3 Banknotes
2.4 Egyptian Skulls
2.5 Australian Institute of Sport: Response Envelopes
2.6 Air Pollution
2.7 Multivariate Bioassay
2.8 Brain Volumes
2.9 Reducing Lead Levels in Children
Chapter 3: Partial Response Envelopes
3.1 Partial Envelope Model
3.2 Estimation
3.3 Illustrations
3.4 Partial Envelopes for Prediction
3.5 Reducing Part of the Response
Chapter 4: Predictor Envelopes
4.1 Model Formulations
4.2 SIMPLS
4.3 Likelihood‐Based Predictor Envelopes
4.4 Illustrations
4.5 Simultaneous Predictor–Response Envelopes
Chapter 5: Enveloping Multivariate Means
5.1 Enveloping a Single Mean
5.2 Enveloping Multiple Means with Heteroscedastic Errors
5.3 Extension to Heteroscedastic Regressions
Chapter 6: Envelope Algorithms
6.1 Likelihood‐Based Envelope Estimation
6.2 Starting Values
6.3 A Non‐Grassmann Algorithm for Estimating the Envelope
6.4 Sequential Likelihood‐Based Envelope Estimation
6.5 Sequential Moment‐Based Envelope Estimation
Chapter 7: Envelope Extensions
7.1 Envelopes for Vector‐Valued Parameters
7.2 Envelopes for Matrix‐Valued Parameters
7.3 Envelopes for Matrix‐Valued Responses
7.4 Spatial Envelopes
7.5 Sparse Response Envelopes
7.6 Bayesian Response Envelopes
Chapter 8: Inner and Scaled Envelopes
8.1 Inner Envelopes
8.2 Scaled Response Envelopes
8.3 Scaled Predictor Envelopes
Chapter 9: Connections and Adaptations
9.1 Canonical Correlations
9.2 Reduced‐Rank Regression
9.3 Supervised Singular Value Decomposition
9.4 Sufficient Dimension Reduction
9.5 Sliced Inverse Regression
9.6 Dimension Reduction for the Conditional Mean
9.7 Functional Envelopes for SDR
9.8 Comparing Covariance Matrices
9.9 Principal Components
9.10 Principal Fitted Components
Appendix A: Envelope Algebra
A.1 Invariant and Reducing Subspaces
A.2 M‐Envelopes
A.3 Relationships Between Envelopes
A.4 Kronecker Products, vec and vech
A.5 Commutation, Expansion, and Contraction Matrices
A.6 Derivatives
A.7 Miscellaneous Results
A.8 Matrix Normal Distribution
A.9 Literature Notes
Appendix B: Proofs for Envelope Algorithms
B.1 The 1D Algorithm
B.2 Sequential Moment‐Based Algorithm
Appendix C: Grassmann Manifold Optimization
C.1 Gradient Algorithm
C.2 Construction of
C.3 Construction of
C.4 Starting and Stopping
Bibliography
Author Index
Subject Index
End User License Agreement
List of Tables
Chapter 1
Table 1.1 Cattle data: Estimated coefficients for the standard model and the envelope model.
Table 1.2 Wheat protein data: Bootstrap and asymptotic standard errors (SEs) of the six elements of $\hat{\boldsymbol{\beta}}$ under the standard and envelope models for the wheat protein data with six responses.
Table 1.3 Cattle data: Distributions of the selected envelope dimension from BIC based on various sampling scenarios.
Table 1.4 Cattle data: Ratios of standard errors of the elements of $\hat{\boldsymbol{\beta}}$ to the bootstrap standard errors from the weighted envelope fit and the envelope fits.
Chapter 2
Table 2.1 Wheat protein data: Coefficient estimates and their asymptotic standard errors from the envelope model and the standard model fitted to the complete data.
Table 2.2 Berkeley guidance study: Bootstrap and estimated asymptotic standard errors of the coefficient estimates under the standard model (SM) and envelope model (EM) for two bivariate regressions from the Berkeley data. BSM and BEM designate the bootstrap standard errors for the standard and envelope models based on 200 bootstrap samples.
Table 2.3 Banknote data: Envelope and ordinary least squares coefficient estimates.
Table 2.4 Egyptian skull data: Coefficient estimates and standard error ratios.
Table 2.5 Egyptian skull data: Coefficients and absolute $Z$-scores in parentheses.
Table 2.6 Egyptian skull data: Estimated covariance matrices from fits of the multivariate linear model and the envelope model.
Table 2.7 Australian Institute of Sport: Coefficient estimates with standard errors in parentheses from the envelope fit.
Table 2.8 Ozone data: Estimated coefficients from fits of the standard multivariate linear model (1.1) and the envelope model (1.20).
Table 2.9 Ozone data: Standard error ratios, standard errors from the standard model fit divided by the corresponding standard errors from the envelope model fit.
Table 2.10 Ozone data: Estimated coefficients from separate envelope fits.
Table 2.11 Ozone data: Estimated residual covariance matrices from fits of the multivariate linear model and the envelope model.
Chapter 3
Table 3.1 Cattle data: Estimated coefficients for the partial envelope model.
Table 3.2 Men's urine: Ratios of coefficient estimates to their standard errors ($Z$-scores) for the ordinary least squares (OLS) fit and the partial envelope fit.
Table 3.3 Pulp fibers: Fitted and predicted values.
Chapter 4
Table 4.1 Expository data: Absolute $Z$-scores, estimates divided by their asymptotic standard errors, for the coefficients estimated by using ordinary least squares (OLS) and the envelope method (EM).
Table 4.2 Australian Institute of Sport: Coefficient estimates and their standard errors from the envelope model and the standard model.
Table 4.3 Wheat protein data: Coefficient estimates and their standard errors from the envelope model, the standard model, and SIMPLS.
Table 4.4 Mussels' muscles: Fits of the standard model and the envelope model.
Table 4.5 Mussels' muscles: Estimated covariance matrices from the standard and envelope fits.
Chapter 5
Table 5.1 Minneapolis schools: Envelope analysis of the square root of the percent of sixth grade students scoring above and below average.
Table 5.2 Minneapolis schools: Estimates of the envelope dimension determined by using likelihood ratio tests sequentially for various test levels on the Minneapolis school data with four untransformed responses.
Table 5.3 Minneapolis schools: Summary of an envelope analysis of the percentage of fourth and sixth grade students scoring above and below average. Column headings are as in Table 5.1.
Chapter 8
Table 8.1 Race times: Absolute $Z$-scores and estimated coefficients from the fit of the inner response envelope model.
Table 8.2 Race times: Estimated scales, absolute $Z$-scores for the coefficients, and estimated coefficients from the fit of the scaled response envelope.
List of Figures
Chapter 1
Figure 1.1 Schematic representation of an envelope.
Figure 1.2 Graphical illustration of a relatively uncomplicated scenario. The axes are centered responses.
Figure 1.3 Graphical illustration of envelope estimation. (a) Standard analysis; (b) envelope analysis.
Figure 1.4 Wheat protein data with the estimated envelope superimposed.
Figure 1.5 Cattle data: Profile plots of weight for 60 cows over 10 time periods. Colors designate the two treatments. (a) Profile plots for individual cows by treatment. (b) Profile plots of average weight by treatment.
Figure 1.6 Cattle data: Profile plot of fitted weights from the envelope model. Colors designate the two treatments.
Figure 1.7 Cattle data: Scatterplot of week 12 weight versus week 14 weight with treatments marked.
Figure 1.8 Cattle data: Simulation results based on the fit of the cattle data. The horizontal axis denotes the simulation sample size, expressed in terms of the sample size for the original data.
Figure 1.9 Illustrations of the potential bias that can result from underestimating the envelope dimension. The context is the same as that for Figure 1.3, with the marked directions denoting eigenvectors. (a) Small bias; (b) large bias.
Figure 1.10 Cattle data: Fitted profile plots from envelope models at two envelope dimensions, shown in panels (a) and (b).
Figure 1.11 Cattle data: Mean of the immaterial variation for each treatment and time.
Chapter 2
Figure 2.1 Berkeley guidance study: Envelope construction for the regression of height at ages 13 and 14, plotted on the horizontal and vertical axes, on a gender indicator. Blue triangles represent males and red circles represent females. The marked lines denote the estimated envelope and its orthogonal complement.
Figure 2.2 Berkeley guidance study: Envelope construction for the regression of height at ages 17 and 18, plotted on the horizontal and vertical axes, on a gender indicator. Plot symbols are as in Figure 2.1.
Figure 2.3 Banknote data: Plot of the first two envelope variates marked by status. Black – genuine; red – counterfeit.
Figure 2.4 Egyptian skull data: Boxplots of the responses versus year.
Figure 2.5 Australian Institute of Sport: Envelope construction for the regression on gender. Blue exes represent females, red circles represent males, and the two black dots represent the gender means. The marked lines denote the estimated envelope and its orthogonal complement.
Figure 2.6 Ozone data: Added variable plots for two predictors from the fit. The lines on the plots represent the linear ordinary least squares fits of the vertical axis variable on the horizontal axis variable.
Figure 2.7 Ozone data: Scatterplot illustrating the immaterial variation.
Figure 2.8 Multivariate bioassay: Scatterplot matrix of the six responses in the rabbit assay. The subscripts indicate the hour at which the response was measured.
Figure 2.9 Multivariate bioassay: Scatterplot from an envelope analysis of the rabbit assay.
Figure 2.10 Brain volumes: Scatterplot of the canonical variate versus age from an envelope analysis.
Figure 2.11 Blood lead levels in children: An ex represents the chelating agent, and an open circle represents the placebo. (a) Week zero versus week one lead levels. (b) Week four versus week six lead levels.
Chapter 3
Figure 3.1 Schematic representation of a partial envelope with three possibilities.
Figure 3.2 Men's urine: Added variable plot. The line is the ordinary least squares fit of the plotted data.
Chapter 4
Figure 4.1 Australian Institute of Sport: Scatterplot of the predictors hematocrit and hemoglobin, along with the estimated envelope, from the regression with red cell count as the response.
Figure 4.2 Wheat protein data: Scatterplot of the predictors at the third and fourth wavelength, along with the estimated envelope, from the regression with protein content as the response.
Figure 4.3 Mussels' muscles: Scatterplot of the response and the four predictors. Three relatively outlying points are indicated with +.
Chapter 5
Figure 5.1 Graphical illustration of enveloping a population mean.
Figure 5.2 Minneapolis schools: The square roots of the percentages of sixth graders scoring above and below average, plotted against each other. The ellipses are contours of the sample covariance matrix with its eigenvectors shown, and the red ex marks the mean of the data.
Figure 5.3 Minneapolis school data.
Figure 5.4 Schematic representation of heteroscedastic enveloping with projection paths A and B. The envelope and its orthogonal complement are represented by solid and dashed diagonal lines.
Figure 5.5 Cattle data: Profile plot of the fitted mean weights from a heteroscedastic envelope.
Chapter 7
Figure 7.1 Graphical structure of the aster model for simulated data. The top layer corresponds to survival; these random variables are Bernoulli. The middle layer corresponds to whether or not an individual reproduced; these random variables are also Bernoulli. The bottom layer corresponds to offspring count; these random variables are zero‐truncated Poisson.
Figure 7.2 Contour plots of the ratios of the bootstrapped standard errors for the maximum likelihood estimator to the bootstrapped standard errors for the corresponding envelope estimator. The point in blue in the top plot corresponds to the highest estimated expected Darwinian fitness value, which is essentially the same using the envelope estimator and the maximum likelihood estimator.
Figure 7.3 Cattle data: Sparse fitted weights.
Chapter 8
Figure 8.1 Schematic representation of an inner envelope.
Figure 8.2 Race times: Profile plot of 10 split times for 80 participants in a 100 km race. (a) Profile plots for individual participants; (b) profile plots of average split times plus and minus one standard deviation.
Figure 8.3 Fitted race times.
Figure 8.4 Schematic illustration of how rescaling the response can affect an envelope analysis. (a) Original distributions; (b) rescaled distributions.
Figure 8.5 Schematic illustration of how rescaling the predictors can affect an envelope analysis (adapted from Cook and Su 2016). (a) Original predictor distribution; (b) rescaled distribution.
Chapter 9
Figure 9.1 Mussels' muscles: SIR summary plot of the response versus the estimated reduction.
Figure 9.2 Plots of the first two CORE predictors. (a) Banknote data: genuine and counterfeit notes marked. (b) Birds–planes–cars data: blue, birds; black, planes; red, cars.
Figure 9.3 Schematic representation showing why principal components work in isotropic models. (a) Circular contour; (b) elliptical contours.
Figure 9.4 Plots of the first two principal components with a fixed number of observations, a varying number of variables, and unbounded signal. Each of panels (a)–(d) was constructed from one simulated dataset.
Figure 9.5 Plots of the first two principal components with a fixed number of observations, a varying number of predictors, and bounded signal. Each of panels (a)–(d) was constructed from one simulated dataset.
Established by Walter A. Shewhart and Samuel S. Wilks
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels
The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state‐of‐the‐art developments in the field and classical methods.
Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches.
This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.
A complete list of titles in this series can be found at http://www.wiley.com/go/wsps
R. Dennis Cook
School of Statistics, University of Minnesota, U.S.A.
This edition first published 2018
© 2018 John Wiley & Sons, Inc
Edition History
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of R. Dennis Cook to be identified as the author of the material in this work has been asserted in accordance with law.
Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Cook, R. Dennis, author.
Title: An introduction to envelopes : dimension reduction for efficient estimation in multivariate statistics / R. Dennis Cook.
Description: 1st edition. | Hoboken, NJ : John Wiley & Sons, 2018. | Series: Wiley series in probability and statistics |
Identifiers: LCCN 2018023695 (print) | LCCN 2018036057 (ebook) | ISBN 9781119422952 (Adobe PDF) | ISBN 9781119422969 (ePub) | ISBN 9781119422938 (hardcover)
Subjects: LCSH: Multivariate analysis. | Dimension reduction (Statistics)
Classification: LCC QA278 (ebook) | LCC QA278 .C648 2018 (print) | DDC 519.5/35 – dc23
LC record available at https://lccn.loc.gov/2018023695
Cover Design: Wiley
Cover Image: © teekid/iStockphoto and Courtesy of R. Dennis Cook
For Sandra
Sister and Artist
1949–2017
… the objective of statistical methods is the reduction of data. A quantity of data… is to be replaced by relatively few quantities which shall adequately represent… the relevant information contained in the original data.
Since the number of independent facts supplied in the data is usually far greater than the number of facts sought, much of the information supplied by an actual sample is irrelevant. It is the object of the statistical process employed in the reduction of data to exclude this irrelevant information, and to isolate the whole of the relevant information contained in the data.
Fisher (1922): On the Mathematical Foundations of Theoretical Statistics.
Dimension reduction has always been a leitmotif of statistical thought, as illustrated by the title of a nineteenth century article by Edgeworth (1884) – “On the reduction of observations” – and by the foundations of theoretical statistics established by Fisher (1922). Dimension reduction interpreted loosely is today a huge area. Many ad hoc methods have been developed by the Computer Science and Machine Learning communities (Burges, 2009), and it seems that new methods are being proposed every day. The material in this book reflects a subset of the dimension reduction literature that can be motivated in a manner that is reminiscent of Fisher's original motivation of sufficiency.
Sufficient dimension reduction for regression strives to find a low‐dimensional set of proxy predictors that can be used in place of the full set of predictors without loss of regression information and without a prespecified parametric model. It has expanded substantially over the past three decades, and today it is a widely recognized statistical paradigm that can be particularly helpful in the initial stages of an analysis, prior to model selection. However, the formal paradigm typically stops at the estimation of the proxy predictors, leaving the analyst to carry on with other context‐dependent methods, and it has little to offer when a traditional model is available.
Envelopes, the topic of this book, can be seen as a descendant of sufficient dimension reduction that can be applied in model‐based contexts without altering traditional objectives, and they also have the potential to offer advances in model‐free contexts. Most importantly, envelope methods can result in massive efficiency gains relative to standard methods, gains that are equivalent to increasing the sample size many times over.
We begin in Chapter 1 by using response envelopes to improve estimative efficiency in the context of the usual multivariate (multiresponse) linear model, following a brief review of the model itself. This chapter was written to emphasize the fundamental rationale underlying envelopes, providing arguments that hopefully appeal to intuition along with selected mathematical details. These foundations are then adapted to other contexts in subsequent chapters. I have tried to achieve a useful balance between foundations and methodology by integrating illustrative examples throughout the book, but particularly in the first few chapters. Most proofs are provided in the appendices, or the reader is referred to the literature. But proofs are presented occasionally in the main text when they seem useful to show how various ideas integrate to produce the desired results. Some of the essential development in Chapter 1 is revisited in subsequent chapters because some readers may prefer to skip around, and a reminder from time to time could be useful to show how arguments carry from one context to another.
Still in the context of response envelopes, Chapter 2 consists of a number of small examples to show the potential advantages of envelopes and to illustrate how they might be used in practice, and in Chapter 3, we discuss how to use envelopes to target selected coefficients. Chapter 4 is devoted to predictor envelopes and their connection with partial least squares regression, and Chapter 5 shows the potential for envelope methodology to improve estimation of a multivariate mean. We turn to objective function optimization in Chapter 6 where we discuss several algorithms, including relatively slow likelihood‐based methods and relatively fast moment‐based methods. The discussion to this point is in terms of the multivariate linear model introduced in Chapter 1. In Chapter 7, we sketch how envelope methodology can be applied to improve an asymptotically normal vector or matrix estimator. We also discuss how envelopes can be used in regressions with a matrix‐valued response and sketch work on sparse and Bayesian response envelopes. Inner and scaled envelopes, which are discussed in Chapter 8, are envelope forms that allow for efficiency gains in ways that are different from that introduced in Chapter 1. We discuss in Chapter 9 the relationships between envelopes and other dimension reduction methods, including canonical correlations, reduced‐rank regression, supervised singular value decomposition, sufficient dimension reduction, principal components, and principal fitted components.
R or MatLab computer programs are available for nearly all of the methods discussed in this book and many have been implemented in both languages. Most of the methods are available in integrated packages, but some come as standalone programs. These programs and packages are not discussed in this book, but descriptions of and links to them can be found at z.umn.edu/envelopes. This format will allow for updates and links to developments following this book.
Earlier versions of this book were used as lecture notes for a one‐semester course at the University of Minnesota. Most students who attended this course had background in mathematical statistics and multivariate analysis. Students in this course and my collaborators contributed to the ideas and flavor of the book. In particular, I would like to thank Austin Brown, Jami Cook, Shanshan Ding, Daniel Eck, Liliana Forzani, Ming Gao, Inga Helland, Zhihua Su, Xin (Henry) Zhang, and Xinyu Zhang for many helpful discussions. Daniel Eck wrote the first draft of the discussion of Aster models in Section 7.1.2, and Liliana Forzani and Zhihua Su collaborated on Appendix C. Zhihua Su and her students wrote nearly all of the R and MatLab programs for implementing envelopes. Much of this book reflects the many stimulating conversations I had with Bing Li and Francesca Chiaromonte during the genesis of envelopes.
Nearly all of my work on envelopes was supported by grants from the National Science Foundation's Division of Mathematical Sciences.
R. Dennis Cook
St. Paul, Minnesota
December 2017
For positive integers $a$ and $b$, $\mathbb{R}^{a \times b}$ stands for the class of all real matrices of dimension $a \times b$, and $\mathbb{S}^{a \times a}$ denotes the class of all symmetric $a \times a$ matrices. For $\mathbf{A} \in \mathbb{R}^{a \times b}$, $\mathbf{A}^{\dagger}$ indicates the Moore–Penrose inverse of $\mathbf{A}$. Vectors and matrices will typically be written in bold face, while scalars are not bold. The $a \times a$ identity matrix is denoted as $\mathbf{I}_a$, and $\mathbf{1}_a$ denotes the $a \times 1$ vector of 1's. The $(i,j)$th element of a matrix $\mathbf{A}$ is often denoted as $a_{ij}$.
For $\mathbf{A} \in \mathbb{S}^{a \times a}$, the notation $\mathbf{A} > 0$ means that $\mathbf{A}$ is positive definite, $\mathbf{A} \geq 0$ means that $\mathbf{A}$ is positive semi‐definite, $|\mathbf{A}|$ denotes the determinant of $\mathbf{A}$, and $|\mathbf{A}|_{0}$ denotes the product of the nonzero eigenvalues of $\mathbf{A}$. The spectral norm of $\mathbf{A}$ is denoted as $\|\mathbf{A}\|$.
The vector operator $\operatorname{vec}$ stacks the columns of the argument matrix. On the symmetric matrices, we use the related vector‐half operator $\operatorname{vech}$, which stacks only the unique part of each column that lies on or below the diagonal. The operators $\operatorname{vec}$ and $\operatorname{vech}$ are related through a contraction matrix $\mathbf{C}_r$ and an expansion matrix $\mathbf{E}_r$, which are defined so that $\operatorname{vech}(\mathbf{A}) = \mathbf{C}_r \operatorname{vec}(\mathbf{A})$ and $\operatorname{vec}(\mathbf{A}) = \mathbf{E}_r \operatorname{vech}(\mathbf{A})$ for any $\mathbf{A} \in \mathbb{S}^{r \times r}$. These relations uniquely define $\mathbf{C}_r$ and $\mathbf{E}_r$ and imply $\mathbf{C}_r \mathbf{E}_r = \mathbf{I}$. The commutation matrix that maps $\operatorname{vec}(\mathbf{A})$ to $\operatorname{vec}(\mathbf{A}^T)$ is denoted by $\mathbf{K}_{rc}$: for $\mathbf{A} \in \mathbb{R}^{r \times c}$, $\mathbf{K}_{rc}\operatorname{vec}(\mathbf{A}) = \operatorname{vec}(\mathbf{A}^T)$. For further background on these operators, see Appendix A.5, Henderson and Searle (1979) and Magnus and Neudecker (1979).
Let $\mathbf{A} = (a_{ij}) \in \mathbb{R}^{r \times c}$ and $\mathbf{B} \in \mathbb{R}^{m \times n}$. The Kronecker product of two matrices can be defined block‐wise as $\mathbf{A} \otimes \mathbf{B} = (a_{ij}\mathbf{B})$, $i = 1, \dots, r$, $j = 1, \dots, c$, so that $\mathbf{A} \otimes \mathbf{B} \in \mathbb{R}^{rm \times cn}$.
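These operators are easy to check numerically. The following minimal R sketch (illustrative only; it is not part of the book's companion software, and the function names vec, vech, and commutation are ad hoc) implements the operators and verifies the commutation and Kronecker identities:

```r
## Illustrative implementations of vec, vech, and the commutation matrix.
vec  <- function(A) as.vector(A)                 # stack the columns of A
vech <- function(A) A[lower.tri(A, diag = TRUE)] # elements on or below the diagonal

## Commutation matrix K_rc with K %*% vec(A) = vec(t(A)) for A in R^{r x c},
## built column by column from the single-entry basis matrices E_ij.
commutation <- function(r, c) {
  K <- matrix(0, r * c, r * c)
  for (j in seq_len(c)) for (i in seq_len(r)) {
    E <- matrix(0, r, c); E[i, j] <- 1
    K[, (j - 1) * r + i] <- vec(t(E))
  }
  K
}

A <- matrix(rnorm(6), 2, 3)
K <- commutation(2, 3)
all.equal(as.vector(K %*% vec(A)), vec(t(A)))    # TRUE

## Kronecker identity vec(A %*% B %*% C) = (t(C) %x% A) %*% vec(B).
B <- matrix(rnorm(12), 3, 4); C <- matrix(rnorm(8), 4, 2)
all.equal(vec(A %*% B %*% C), as.vector((t(C) %x% A) %*% vec(B)))  # TRUE
```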
We often denote the eigenvalues of $\mathbf{A} \in \mathbb{S}^{r \times r}$ as $\lambda_{1}(\mathbf{A}) \geq \cdots \geq \lambda_{r}(\mathbf{A})$ with corresponding ordered eigenvectors $\ell_{1}(\mathbf{A}), \dots, \ell_{r}(\mathbf{A})$. The arguments to $\lambda_{i}$ and $\ell_{i}$ may be suppressed when they are expected to be clear from context.
A block diagonal matrix with diagonal blocks $\mathbf{A}_{1}, \dots, \mathbf{A}_{k}$ is represented as $\operatorname{bdiag}(\mathbf{A}_{1}, \dots, \mathbf{A}_{k})$. The direct sum of two matrices $\mathbf{A}$ and $\mathbf{B}$ is defined as the block diagonal matrix $\mathbf{A} \oplus \mathbf{B} = \operatorname{bdiag}(\mathbf{A}, \mathbf{B})$.
Subspaces.
For $\mathbf{A} \in \mathbb{R}^{r \times c}$ and a subspace $\mathcal{S} \subseteq \mathbb{R}^{c}$, $\mathbf{A}\mathcal{S} = \{\mathbf{A}\mathbf{s} : \mathbf{s} \in \mathcal{S}\}$. For $\mathbf{A} \in \mathbb{R}^{r \times c}$, $\operatorname{span}(\mathbf{A})$ denotes the subspace of $\mathbb{R}^{r}$ spanned by the columns of $\mathbf{A}$. We occasionally use $\mathcal{A}$ as shorthand for $\operatorname{span}(\mathbf{A})$ when $\mathbf{A}$ has been defined. Subscripts will also be used to name commonly occurring subspaces. A basis matrix for a subspace $\mathcal{S}$ is any matrix whose columns form a basis for $\mathcal{S}$. A semi‐orthogonal matrix $\mathbf{A} \in \mathbb{R}^{r \times c}$, $r > c$, has orthonormal columns, $\mathbf{A}^T\mathbf{A} = \mathbf{I}_{c}$. We will frequently refer to semi‐orthogonal basis matrices.
Let $\mathbf{A} \in \mathbb{S}^{r \times r}$ and $\mathbf{M} \in \mathbb{S}^{r \times r}$ with $\mathbf{M} > 0$. Then the span of the first $u$ eigenvectors of $\mathbf{A}$ relative to $\mathbf{M}$, that is, of the generalized eigenvectors solving $\mathbf{A}\mathbf{v} = \lambda\mathbf{M}\mathbf{v}$, equals $\mathbf{M}^{-1/2}$ times the span of the first $u$ eigenvectors of $\mathbf{M}^{-1/2}\mathbf{A}\mathbf{M}^{-1/2}$, as can be seen by substituting $\mathbf{v} = \mathbf{M}^{-1/2}\mathbf{w}$.
A sum of subspaces $\mathcal{S}_{1}, \mathcal{S}_{2}$ of $\mathbb{R}^{r}$ is defined as $\mathcal{S}_{1} + \mathcal{S}_{2} = \{\mathbf{s}_{1} + \mathbf{s}_{2} : \mathbf{s}_{1} \in \mathcal{S}_{1},\ \mathbf{s}_{2} \in \mathcal{S}_{2}\}$. We use $\mathcal{S}_{1} \subset \mathcal{S}_{2}$ to indicate that the subspace $\mathcal{S}_{1}$ is a proper subset of $\mathcal{S}_{2}$, while $\mathcal{S}_{1} \subseteq \mathcal{S}_{2}$ allows $\mathcal{S}_{1} = \mathcal{S}_{2}$.
For a positive definite matrix $\boldsymbol{\Delta} \in \mathbb{S}^{r \times r}$, the inner product in $\mathbb{R}^{r}$ defined by $\langle \mathbf{a}, \mathbf{b} \rangle_{\boldsymbol{\Delta}} = \mathbf{a}^T\boldsymbol{\Delta}\mathbf{b}$ is referred to as the $\boldsymbol{\Delta}$ inner product; when $\boldsymbol{\Delta} = \mathbf{I}_{r}$, it reduces to the usual inner product.
Envelopes, which were introduced by Cook et al. (2007) and developed for the multivariate linear model by Cook et al. (2010), encompass a class of methods for increasing efficiency in multivariate analyses without altering traditional objectives. They serve to reshape classical methods by exploiting response–predictor relationships that affect the accuracy of the results but are not recognized by classical methods. Multivariate data are often modeled by combining a selected structural component to be estimated with an error component that accounts for the remaining unexplained variation. Capturing the desired signal, and only that signal, in the structural component can be an elusive task; in an effort to avoid missing important information, there may be a tendency to overparameterize, leading to overfitting and relatively soft inferences and interpretations. Envelopes are essentially a type of targeted dimension reduction: they operate by enveloping the signal, thereby accounting for extraneous variation that might otherwise be present in the structural component, and this can result in substantial gains in efficiency.
In this chapter, we consider multivariate (multiresponse) linear regression allowing for the presence of "immaterial variation" (described herein) in the response vector. The possibility of such variation being present in the predictors is considered in Chapter 4, where we develop a connection with partial least squares regression. Section 1.1 contains a very brief review of the multivariate linear model, with an emphasis on aspects that will play a role in later developments. Additional background is available from Muirhead (2005). The envelope model for response reduction is introduced in Section 1.2. Introductory illustrations are given in Section 1.3 to provide intuition, to set the tone for later developments, and to provide running examples. In later sections, we discuss additional properties of the envelope model, maximum likelihood estimation, and the asymptotic variance of the envelope estimator of the coefficient matrix. Most of the technical material used in this chapter is taken from Cook et al. (2010). Some algebraic details are presented without justification; the missing development is given in Appendix A, which covers the linear algebra of envelopes.
Consider the multivariate regression of a response vector $\mathbf{Y} \in \mathbb{R}^{r}$ on a vector of nonstochastic predictors $\mathbf{X} \in \mathbb{R}^{p}$. The standard linear model for describing a sample $(\mathbf{Y}_i, \mathbf{X}_i)$, $i = 1, \dots, n$, can be represented in vector form as
$$\mathbf{Y}_i = \boldsymbol{\alpha} + \boldsymbol{\beta}\mathbf{X}_i + \boldsymbol{\varepsilon}_i, \quad i = 1, \dots, n, \tag{1.1}$$
where the predictors are centered in the sample, $\sum_{i=1}^{n}\mathbf{X}_i = 0$, the error vectors $\boldsymbol{\varepsilon}_i$ are independently and identically distributed normal vectors with mean 0 and covariance matrix $\boldsymbol{\Sigma} > 0$, $\boldsymbol{\alpha} \in \mathbb{R}^{r}$ is an unknown vector of intercepts, and $\boldsymbol{\beta} \in \mathbb{R}^{r \times p}$ is an unknown matrix of regression coefficients. Centering the predictors facilitates discussion and presentation of some results, but is technically unnecessary. If $\mathbf{X}$ is stochastic, so $\mathbf{X}$ and $\mathbf{Y}$ have a joint distribution, we still condition on the observed values of $\mathbf{X}$ since the predictors are ancillary under model (1.1). The normality requirement for $\boldsymbol{\varepsilon}$ is not essential, as discussed in Section 1.9 and in later chapters.
Let $\mathbb{X} \in \mathbb{R}^{n \times p}$ denote the centered matrix with rows $\mathbf{X}_i^T$, let $\mathbb{Y} \in \mathbb{R}^{n \times r}$ denote the uncentered matrix with rows $\mathbf{Y}_i^T$, and let $\mathbb{Y}_c$ denote the matrix with rows $(\mathbf{Y}_i - \bar{\mathbf{Y}})^T$, $i = 1, \dots, n$. Also, let $\mathbf{S}_{\mathbf{X}} = \mathbb{X}^T\mathbb{X}/n$ and $\mathbf{S}_{\mathbf{X},\mathbf{Y}} = \mathbb{X}^T\mathbb{Y}_c/n$.

Then the maximum likelihood estimator of $\boldsymbol{\alpha}$ is $\hat{\boldsymbol{\alpha}} = \bar{\mathbf{Y}}$, and the maximum likelihood estimator of $\boldsymbol{\beta}$, which is also the ordinary least squares estimator, is
$$\hat{\boldsymbol{\beta}} = \mathbb{Y}^T\mathbb{X}(\mathbb{X}^T\mathbb{X})^{-1} = \mathbb{Y}_c^T\mathbb{X}(\mathbb{X}^T\mathbb{X})^{-1}, \tag{1.2}$$
where the second equality follows because the predictors are centered. To see this result, let $\hat{\mathbf{Y}}_i$ and $\mathbf{R}_i = \mathbf{Y}_i - \hat{\mathbf{Y}}_i$ denote the $i$th vectors of fitted values and residuals. The log‐likelihood for model (1.1) is
$$L(\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\Sigma}) = -\frac{nr}{2}\log(2\pi) - \frac{n}{2}\log|\boldsymbol{\Sigma}| - \frac{1}{2}\sum_{i=1}^{n}(\mathbf{Y}_i - \boldsymbol{\alpha} - \boldsymbol{\beta}\mathbf{X}_i)^T\boldsymbol{\Sigma}^{-1}(\mathbf{Y}_i - \boldsymbol{\alpha} - \boldsymbol{\beta}\mathbf{X}_i).$$
Because the predictors are centered, $L$ is maximized over $\boldsymbol{\alpha}$ by setting $\hat{\boldsymbol{\alpha}} = \bar{\mathbf{Y}}$, leaving the partially maximized log‐likelihood
$$L(\boldsymbol{\beta}, \boldsymbol{\Sigma}) = -\frac{nr}{2}\log(2\pi) - \frac{n}{2}\log|\boldsymbol{\Sigma}| - \frac{1}{2}\sum_{i=1}^{n}(\mathbf{Y}_i - \bar{\mathbf{Y}} - \boldsymbol{\beta}\mathbf{X}_i)^T\boldsymbol{\Sigma}^{-1}(\mathbf{Y}_i - \bar{\mathbf{Y}} - \boldsymbol{\beta}\mathbf{X}_i),$$
which is maximized over $\boldsymbol{\beta}$ at (1.2). It follows that the maximum likelihood estimator of $\boldsymbol{\Sigma}$ is the residual sample covariance matrix $\mathbf{S}_{\mathbf{Y}|\mathbf{X}} = n^{-1}\sum_{i=1}^{n}\mathbf{R}_i\mathbf{R}_i^T$ and that the fully maximized log‐likelihood is
$$\hat{L} = -\frac{nr}{2}\log(2\pi) - \frac{nr}{2} - \frac{n}{2}\log|\mathbf{S}_{\mathbf{Y}|\mathbf{X}}|.$$
We notice from (1.2) that $\hat{\boldsymbol{\beta}}$ can be constructed by doing $r$ separate univariate linear regressions, one for each element of $\mathbf{Y}$ on $\mathbf{X}$. The coefficients from the $j$th regression then form the $j$th row of $\hat{\boldsymbol{\beta}}$, $j = 1, \dots, r$. The stochastic relationships among the elements of $\mathbf{Y}$ are not used in forming these estimators. However, as will be seen later, relationships among the elements of $\mathbf{Y}$ play a central role in envelope estimation. Standard inference on $\beta_{jk}$, the $(j,k)$th element of $\boldsymbol{\beta}$, under model (1.1) is the same as inference obtained under the univariate linear regression of $Y_j$, the $j$th element of $\mathbf{Y}$, on $\mathbf{X}$. Model (1.1) becomes operational as a multivariate construction when inferring simultaneously about elements in different rows of $\boldsymbol{\beta}$ or when predicting elements of $\mathbf{Y}$ jointly.
The sample covariance matrices of $\mathbf{Y}$, $\hat{\mathbf{Y}}$, and $\mathbf{R}$ can be expressed as
$$\mathbf{S}_{\mathbf{Y}} = \mathbb{Y}_c^T\mathbb{Y}_c/n, \qquad \mathbf{S}_{\hat{\mathbf{Y}}} = \mathbb{Y}_c^T\mathbf{P}_{\mathbb{X}}\mathbb{Y}_c/n, \qquad \mathbf{S}_{\mathbf{Y}|\mathbf{X}} = \mathbb{Y}_c^T(\mathbf{I}_n - \mathbf{P}_{\mathbb{X}})\mathbb{Y}_c/n,$$
so that $\mathbf{S}_{\mathbf{Y}} = \mathbf{S}_{\hat{\mathbf{Y}}} + \mathbf{S}_{\mathbf{Y}|\mathbf{X}}$, where $\mathbf{P}_{\mathbb{X}} = \mathbb{X}(\mathbb{X}^T\mathbb{X})^{-1}\mathbb{X}^T$ is nonstochastic and denotes the projection onto the column space of $\mathbb{X}$, $\mathbf{S}_{\hat{\mathbf{Y}}}$ is the sample covariance matrix of the fitted vectors $\hat{\mathbf{Y}}_i$, and $\mathbf{S}_{\mathbf{Y}|\mathbf{X}}$ is the sample covariance matrix of the residuals $\mathbf{R}_i$.
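As a concrete check on these formulas, the following hedged R sketch (illustrative only, with arbitrary simulated parameter values rather than any of the book's data or companion code) simulates from model (1.1), computes $\hat{\boldsymbol{\beta}}$ both from (1.2) and from $r$ separate univariate fits, and verifies the decomposition $\mathbf{S}_{\mathbf{Y}} = \mathbf{S}_{\hat{\mathbf{Y}}} + \mathbf{S}_{\mathbf{Y}|\mathbf{X}}$:

```r
## Simulate from model (1.1) with arbitrary illustrative parameter values.
set.seed(1)
n <- 100; r <- 3; p <- 2
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE) # centered predictors
beta  <- matrix(c(1, 0.5, -1, 0.2, 0, 1.5), r, p)                    # arbitrary true beta
alpha <- c(1, 2, 3)
Y <- rep(1, n) %o% alpha + X %*% t(beta) + matrix(rnorm(n * r), n, r)

bhat <- t(Y) %*% X %*% solve(t(X) %*% X)       # r x p matrix formula (1.2)
fit  <- lm(Y ~ X)                              # r separate univariate regressions
all.equal(unname(t(coef(fit)[-1, ])), unname(bhat))                  # TRUE

Yc    <- scale(Y, center = TRUE, scale = FALSE)
P     <- X %*% solve(t(X) %*% X) %*% t(X)      # projection onto span(X)
SY    <- crossprod(Yc) / n
SYfit <- t(Yc) %*% P %*% Yc / n                # covariance of the fitted vectors
SYgX  <- t(Yc) %*% (diag(n) - P) %*% Yc / n    # residual covariance, MLE of Sigma
all.equal(SY, SYfit + SYgX)                    # TRUE
```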
We will occasionally encounter the standardized version of $\hat{\boldsymbol{\beta}}$,
$$\hat{\boldsymbol{\beta}}_{\mathrm{std}} = \mathbf{S}_{\mathbf{Y}}^{-1/2}\,\hat{\boldsymbol{\beta}}\,\mathbf{S}_{\mathbf{X}}^{1/2},$$
which corresponds to the estimated coefficient matrix from the ordinary least squares fit of the standardized responses $\mathbf{S}_{\mathbf{Y}}^{-1/2}(\mathbf{Y}_i - \bar{\mathbf{Y}})$ on the standardized predictors $\mathbf{S}_{\mathbf{X}}^{-1/2}\mathbf{X}_i$.
The joint distribution of the elements of $\hat{\boldsymbol{\beta}}$ can be found by using the $\operatorname{vec}$ operator to stack the columns of $\hat{\boldsymbol{\beta}}$, where $\otimes$ denotes the Kronecker product. Since $\hat{\boldsymbol{\beta}}$ is a linear function of the normally distributed errors, $\operatorname{vec}(\hat{\boldsymbol{\beta}})$ is normally distributed with mean $\operatorname{vec}(\boldsymbol{\beta})$ and variance
$$\operatorname{var}\{\operatorname{vec}(\hat{\boldsymbol{\beta}})\} = (\mathbb{X}^T\mathbb{X})^{-1} \otimes \boldsymbol{\Sigma}.$$
The covariance matrix can also be represented in terms of $\operatorname{vec}(\hat{\boldsymbol{\beta}}^T)$ by using the commutation matrix $\mathbf{K}_{rp}$ to convert $\operatorname{vec}(\hat{\boldsymbol{\beta}})$ to $\operatorname{vec}(\hat{\boldsymbol{\beta}}^T)$: $\operatorname{vec}(\hat{\boldsymbol{\beta}}^T) = \mathbf{K}_{rp}\operatorname{vec}(\hat{\boldsymbol{\beta}})$ and
$$\operatorname{var}\{\operatorname{vec}(\hat{\boldsymbol{\beta}}^T)\} = \mathbf{K}_{rp}\{(\mathbb{X}^T\mathbb{X})^{-1} \otimes \boldsymbol{\Sigma}\}\mathbf{K}_{rp}^T = \boldsymbol{\Sigma} \otimes (\mathbb{X}^T\mathbb{X})^{-1}.$$
Background on the commutation matrix and related operators is available in Appendix A. The variance is typically estimated by substituting the residual covariance matrix $\mathbf{S}_{\mathbf{Y}|\mathbf{X}}$ for $\boldsymbol{\Sigma}$,
$$\widehat{\operatorname{var}}\{\operatorname{vec}(\hat{\boldsymbol{\beta}})\} = (\mathbb{X}^T\mathbb{X})^{-1} \otimes \mathbf{S}_{\mathbf{Y}|\mathbf{X}}.$$
Let $\mathbf{e}_j \in \mathbb{R}^{r}$ denote the indicator vector with a 1 in the $j$th position and 0s elsewhere. Then the covariance matrix for the $j$th row of $\hat{\boldsymbol{\beta}}$ is
$$\operatorname{var}(\hat{\boldsymbol{\beta}}^T\mathbf{e}_j) = \sigma_{jj}(\mathbb{X}^T\mathbb{X})^{-1},$$
where $\sigma_{jj}$ is the $j$th diagonal element of $\boldsymbol{\Sigma}$. We see from this that the covariance matrix for the $j$th row of $\hat{\boldsymbol{\beta}}$ is the same as that from the marginal linear regression of $Y_j$ on $\mathbf{X}$. We refer to the estimator $\hat{\beta}_{jk}$ divided by its standard error as a $Z$‐score:
$$Z_{jk} = \frac{\hat{\beta}_{jk}}{\operatorname{se}(\hat{\beta}_{jk})}.$$
This statistic will be used from time to time for assessing the magnitude of $\beta_{jk}$, sometimes converting to a $p$‐value using the standard normal distribution.
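Continuing the simulated example above, a short sketch of the estimated Kronecker-form variance and the resulting $Z$-scores (here the MLE $\mathbf{S}_{\mathbf{Y}|\mathbf{X}}$ with divisor $n$ is used for $\boldsymbol{\Sigma}$; a degrees-of-freedom-corrected version could be substituted):

```r
## Estimated variance of vec(bhat) and the implied Z-scores,
## reusing X, SYgX, bhat, r, and p from the simulation sketch above.
V  <- solve(t(X) %*% X) %x% SYgX      # rp x rp estimated var of vec(bhat)
se <- matrix(sqrt(diag(V)), r, p)     # reshaped to match the layout of bhat
Z  <- bhat / se                       # Z-scores for the individual coefficients
round(Z, 2)
```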
We will occasionally encounter a conditional variate of the form $\mathbf{A}^T\mathbf{Y} \mid \mathbf{B}^T\mathbf{Y}$, where $\mathbf{Y}$ is a normal vector with mean $\boldsymbol{\mu}$ and variance $\boldsymbol{\Sigma} > 0$, and $\mathbf{B}$ is a nonstochastic matrix with full column rank. The mean and variance of this conditional form are as follows:
$$E(\mathbf{A}^T\mathbf{Y} \mid \mathbf{B}^T\mathbf{Y}) = \mathbf{A}^T\boldsymbol{\mu} + \mathbf{A}^T\boldsymbol{\Sigma}\mathbf{B}(\mathbf{B}^T\boldsymbol{\Sigma}\mathbf{B})^{-1}\mathbf{B}^T(\mathbf{Y} - \boldsymbol{\mu}),$$
$$\operatorname{var}(\mathbf{A}^T\mathbf{Y} \mid \mathbf{B}^T\mathbf{Y}) = \mathbf{A}^T\boldsymbol{\Sigma}\mathbf{A} - \mathbf{A}^T\boldsymbol{\Sigma}\mathbf{B}(\mathbf{B}^T\boldsymbol{\Sigma}\mathbf{B})^{-1}\mathbf{B}^T\boldsymbol{\Sigma}\mathbf{A}.$$
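A small helper function makes these moments concrete; cond_moments is an ad hoc name for this sketch, not a function from the book's companion packages:

```r
## Conditional moments of A'Y | B'Y = b for Y ~ N(mu, Sigma), computed
## directly from the formulas above. A and B are matrices with r rows.
cond_moments <- function(A, B, mu, Sigma, b) {
  W <- t(A) %*% Sigma %*% B %*% solve(t(B) %*% Sigma %*% B)
  list(mean = t(A) %*% mu + W %*% (b - t(B) %*% mu),
       var  = t(A) %*% Sigma %*% A - W %*% t(B) %*% Sigma %*% A)
}
```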
The usual log‐likelihood ratio statistic for testing that $\boldsymbol{\beta} = 0$ is
$$\Lambda = n\log\frac{|\mathbf{S}_{\mathbf{Y}}|}{|\mathbf{S}_{\mathbf{Y}|\mathbf{X}}|},$$
which is asymptotically distributed under the null hypothesis as a chi‐square random variable with $pr$ degrees of freedom. We will occasionally use this statistic in illustrations to assess the presence of any detectable dependence of $\mathbf{Y}$ on $\mathbf{X}$. This statistic is sometimes reported with an adjustment that is useful when $n$ is not large relative to $p$ and $r$ (Muirhead 2005, Section 10.5.2).
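In the running simulated example, the statistic and its approximate $p$-value can be computed as follows (a sketch using the quantities defined earlier):

```r
## Likelihood ratio test of beta = 0; Lambda is asymptotically chi-squared
## with p*r degrees of freedom under the null hypothesis.
Lambda <- n * log(det(SY) / det(SYgX))
pchisq(Lambda, df = p * r, lower.tail = FALSE)   # approximate p-value
```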
The Fisher information for $(\operatorname{vec}^T(\boldsymbol{\beta}), \operatorname{vech}^T(\boldsymbol{\Sigma}))^T$ in model (1.1) is
$$\mathbf{J} = \begin{pmatrix} \boldsymbol{\Sigma}_{\mathbf{X}} \otimes \boldsymbol{\Sigma}^{-1} & 0 \\ 0 & \tfrac{1}{2}\mathbf{E}_r^T(\boldsymbol{\Sigma}^{-1} \otimes \boldsymbol{\Sigma}^{-1})\mathbf{E}_r \end{pmatrix},$$
where $\mathbf{E}_r$ is the expansion matrix that satisfies $\operatorname{vec}(\mathbf{A}) = \mathbf{E}_r\operatorname{vech}(\mathbf{A})$ for $\mathbf{A} \in \mathbb{S}^{r \times r}$, and $\boldsymbol{\Sigma}_{\mathbf{X}} = \lim_{n\to\infty} \mathbb{X}^T\mathbb{X}/n > 0$. It follows from standard likelihood theory that $\sqrt{n}\{\operatorname{vec}(\hat{\boldsymbol{\beta}}) - \operatorname{vec}(\boldsymbol{\beta})\}$ is asymptotically normal with mean 0 and variance given by the upper left block of $\mathbf{J}^{-1}$,
$$\operatorname{avar}\{\sqrt{n}\operatorname{vec}(\hat{\boldsymbol{\beta}})\} = \boldsymbol{\Sigma}_{\mathbf{X}}^{-1} \otimes \boldsymbol{\Sigma}.$$
Asymptotic normality also holds without normal errors but with some technical conditions: if the errors have finite fourth moments and $\mathbb{X}^T\mathbb{X}/n \to \boldsymbol{\Sigma}_{\mathbf{X}} > 0$, then $\sqrt{n}\{\operatorname{vec}(\hat{\boldsymbol{\beta}}) - \operatorname{vec}(\boldsymbol{\beta})\}$ converges in distribution to a normal vector with mean 0 (e.g. Su and Cook 2012, Theorem 2).
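This suggests a simple residual bootstrap as an alternative to the normal-theory standard errors. The sketch below is illustrative (the number of resamples is an arbitrary choice) and reuses the simulated example:

```r
## Residual bootstrap for the standard errors of bhat; it does not rely
## on normal errors. Reuses X, Yc, bhat, n, r, and p from the sketch above.
Bboot <- 200
res   <- Yc - X %*% t(bhat)                  # centered residuals
boot  <- replicate(Bboot, {
  Ystar <- X %*% t(bhat) + res[sample(n, replace = TRUE), ]
  as.vector(t(Ystar) %*% X %*% solve(t(X) %*% X))   # vec of the bootstrap bhat
})
matrix(apply(boot, 1, sd), r, p)             # bootstrap SEs, arranged as bhat
```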
A subset of the predictors may occasionally be of special interest in multivariate regression. Partition $\mathbf{X}$ into two sets of predictors $\mathbf{X}_1 \in \mathbb{R}^{p_1}$ and $\mathbf{X}_2 \in \mathbb{R}^{p_2}$, $p_1 + p_2 = p$, and conformably partition the columns of $\boldsymbol{\beta}$ into $\boldsymbol{\beta}_1 \in \mathbb{R}^{r \times p_1}$ and $\boldsymbol{\beta}_2 \in \mathbb{R}^{r \times p_2}$. Then model (1.1) can be rewritten as
$$\mathbf{Y}_i = \boldsymbol{\alpha} + \boldsymbol{\beta}_1\mathbf{X}_{1,i} + \boldsymbol{\beta}_2\mathbf{X}_{2,i} + \boldsymbol{\varepsilon}_i, \tag{1.16}$$
where $\boldsymbol{\beta}_1$ holds the coefficients of interest. We next reparameterize this model to force the new predictors to be uncorrelated in the sample and to focus attention on $\boldsymbol{\beta}_1$.
Recalling that the predictors are centered, let $\mathbf{R}_{1|2,i}$ denote a typical residual from the ordinary least squares fit of $\mathbf{X}_1$ on $\mathbf{X}_2$, and let $\boldsymbol{\beta}_2^*$ denote the corresponding reparameterized coefficient matrix. Then the partitioned model can be reexpressed as
$$\mathbf{Y}_i = \boldsymbol{\alpha}^* + \boldsymbol{\beta}_1\mathbf{R}_{1|2,i} + \boldsymbol{\beta}_2^*\mathbf{X}_{2,i} + \boldsymbol{\varepsilon}_i. \tag{1.17}$$
In this version of the partitioned model, the parameter $\boldsymbol{\beta}_1$ is the same as that in (1.16), while $\boldsymbol{\beta}_2^* \ne \boldsymbol{\beta}_2$ unless $\mathbf{X}_1$ and $\mathbf{X}_2$ are uncorrelated in the sample. The predictors $\mathbf{R}_{1|2}$ and $\mathbf{X}_2$ in (1.17) are uncorrelated in the sample, and consequently the maximum likelihood estimator of $\boldsymbol{\beta}_1$ is obtained by regressing $\mathbf{Y}$ on $\mathbf{R}_{1|2}$. The maximum likelihood estimator of $\boldsymbol{\beta}_1$ can also be obtained by regressing $\mathbf{R}_{\mathbf{Y}|2}$, the residuals from the regression of $\mathbf{Y}$ on $\mathbf{X}_2$, on $\mathbf{R}_{1|2}$. A plot of $\mathbf{R}_{\mathbf{Y}|2}$ versus $\mathbf{R}_{1|2}$ is called an added variable plot (Cook and Weisberg 1982). These plots are often used in univariate linear regression ($r = 1$) as general graphical diagnostics for visualizing how hard the data are working to fit individual coefficients.
Added variable plots and the partitioned forms (1.16) and (1.17) of the multivariate linear model will be used in this book from time to time, particularly in Chapter 3.
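For the simulated example, an added variable plot can be constructed directly from two residual regressions; this sketch uses base R graphics and the quantities defined earlier:

```r
## Added variable plot for response 1 and predictor 1: plot residuals of
## Y[, 1] given X[, 2] against residuals of X[, 1] given X[, 2]. By the
## reasoning above, the slope of the fitted line equals bhat[1, 1].
r_y <- resid(lm(Y[, 1] ~ X[, -1]))
r_x <- resid(lm(X[, 1] ~ X[, -1]))
plot(r_x, r_y, xlab = "X1 given X2", ylab = "Y1 given X2")
abline(lm(r_y ~ r_x))
```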
Because the elements of $\mathbf{X}$ and $\boldsymbol{\beta}$ are unconstrained, model (1.1) allows each coordinate of $\mathbf{Y}$ to have a different linear regression on $\mathbf{X}$. It could be necessary in some applications to restrict the elements of