A hands-on approach to statistical inference that addresses the latest developments in this ever-growing field.

This clear and accessible book for beginning graduate students offers a practical and detailed approach to the field of statistical inference, providing complete derivations of results, discussions, and MATLAB programs for computation. It emphasizes the relevance of the material, builds intuition, and keeps a view towards modern statistical inference.

In addition to the classic subjects of mathematical statistics, topics include an intuitive presentation of the (single and double) bootstrap for confidence interval calculations; shrinkage estimation; tail (maximal moment) estimation; and a variety of methods of point estimation besides maximum likelihood, including the use of characteristic functions and indirect inference. Practical examples of all methods are given. Estimation issues associated with discrete mixtures of normal distributions, and their solutions, are developed in detail. Much emphasis throughout is on non-Gaussian distributions, including details on working with the stable Paretian distribution and fast calculation of the noncentral Student's t. An entire chapter is dedicated to optimization, covering the development of Hessian-based methods as well as heuristic/genetic algorithms that do not require continuity, with MATLAB code provided.

The book includes both theory and nontechnical discussion, along with substantial references to the literature, with an emphasis on alternative, more modern approaches. The recent literature on the misuse of hypothesis testing and p-values for model selection is discussed, and alternative model selection methods are emphasized, though hypothesis testing of distributional assumptions is covered in detail, notably for the normal distribution.

Presented in three parts (Essential Concepts in Statistics; Further Fundamental Concepts in Statistics; and Additional Topics), Fundamental Statistical Inference: A Computational Approach offers comprehensive chapters on: Introducing Point and Interval Estimation; Goodness of Fit and Hypothesis Testing; Likelihood; Numerical Optimization; Methods of Point Estimation; Q-Q Plots and Distribution Testing; Unbiased Point Estimation and Bias Reduction; Analytic Interval Estimation; Inference in a Heavy-Tailed Context; and The Method of Indirect Inference. An appendix, A Review of Fundamental Concepts in Probability Theory, keeps the book self-contained and covers advanced subjects such as saddlepoint approximations, expected shortfall in finance, calculation with the stable Paretian distribution, and convergence theorems and proofs.
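To give a flavor of the computational, simulation-based style the book advocates, the following minimal MATLAB sketch (illustrative only, and not taken from the book's listings) computes a single parametric bootstrap percentile confidence interval for the success probability in the i.i.d. geometric model, the setting of Chapter 1. It assumes the Statistics Toolbox functions geornd and quantile are available; all parameter values are arbitrary choices for the example.

% Minimal sketch (not from the book): single parametric bootstrap
% percentile c.i. for the success probability p of i.i.d. geometric data.
n = 50; p_true = 0.3; B = 1000; alpha = 0.10;   % arbitrary example values
x = geornd(p_true, n, 1) + 1;      % simulated data with support {1, 2, ...}
p_hat = 1 / mean(x);               % m.l.e. of p for the geometric model
p_boot = zeros(B, 1);
for b = 1:B
    xb = geornd(p_hat, n, 1) + 1;  % resample from the fitted model
    p_boot(b) = 1 / mean(xb);      % re-estimate on each bootstrap sample
end
ci = quantile(p_boot, [alpha/2, 1 - alpha/2])   % nominal 90% percentile c.i.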
Page count: 974
Year of publication: 2018
Cover
Preface
Part I: Essential Concepts in Statistics
Chapter 1: Introducing Point and Interval Estimation
1.1 Point Estimation
1.2 Interval Estimation via Simulation
1.3 Interval Estimation via the Bootstrap
1.4 Bootstrap Confidence Intervals in the Geometric Model
1.5 Problems
Chapter 2: Goodness of Fit and Hypothesis Testing
2.1 Empirical Cumulative Distribution Function
2.2 Comparing Parametric and Nonparametric Methods
2.3 Kolmogorov–Smirnov Distance and Hypothesis Testing
2.4 Testing Normality with KD and AD
2.5 Testing Normality with … and …
2.6 Testing the Stable Paretian Distributional Assumption: First Attempt
2.7 Two-Sample Kolmogorov Test
2.8 More on (Moron?) Hypothesis Testing
2.9 Problems
Chapter 3: Likelihood
3.1 Introduction
3.2 Cramér–Rao Lower Bound
3.3 Model Selection
3.4 Problems
Chapter 4: Numerical Optimization
4.1 Root Finding
4.2 Approximating the Distribution of the Maximum Likelihood Estimator
4.3 General Numerical Likelihood Maximization
4.4 Evolutionary Algorithms
4.5 Problems
Chapter 5: Methods of Point Estimation
5.1 Univariate Mixed Normal Distribution
5.2 Alternative Point Estimation Methodologies
5.3 Comparison of Methods
5.4 A Primer on Shrinkage Estimation
5.5 Problems
Part II: Further Fundamental Concepts in Statistics
Chapter 6: Q-Q Plots and Distribution Testing
6.1 P-P Plots and Q-Q Plots
6.2 Null Bands
6.3 Q-Q Test
6.4 Further P-P and Q-Q Type Plots
6.5 Further Tests for Composite Normality
6.6 Combining Tests and Power Envelopes
6.7 Details of a Failed Attempt
6.8 Problems
Chapter 7: Unbiased Point Estimation and Bias Reduction
7.1 Sufficiency
7.2 Completeness and the Uniformly Minimum Variance Unbiased Estimator
7.3 An Example with i.i.d. Geometric Data
7.4 Methods of Bias Reduction
7.5 Problems
Chapter 8: Analytic Interval Estimation
8.1 Definitions
8.2 Pivotal Method
8.3 Intervals Associated with Normal Samples
8.4 Cumulative Distribution Function Inversion
8.5 Application of the Nonparametric Bootstrap
8.6 Problems
Part III: Additional Topics
Chapter 9: Inference in a Heavy-Tailed Context
9.1 Estimating the Maximally Existing Moment
9.2 A Primer on Tail Estimation
9.3 Noncentral Student's t Estimation
9.4 Asymmetric Stable Paretian Estimation
9.5 Testing the Stable Paretian Distribution
Chapter 10: The Method of Indirect Inference
10.1 Introduction
10.2 Application to the Laplace Distribution
10.3 Application to Randomized Response
10.4 Application to the Stable Paretian Distribution
10.5 Problems
Appendix A: Review of Fundamental Concepts in Probability Theory
A.1 Combinatorics and Special Functions
A.2 Basic Probability and Conditioning
A.3 Univariate Random Variables
A.4 Multivariate Random Variables
A.5 Continuous Univariate Random Variables
A.6 Conditional Random Variables
A.7 Generating Functions and Inversion Formulas
A.8 Value at Risk and Expected Shortfall
A.9 Jacobian Transformations
A.10 Sums and Other Functions
A.11 Saddlepoint Approximations
A.12 Order Statistics
A.13 The Multivariate Normal Distribution
A.14 Noncentral Distributions
A.15 Inequalities and Convergence
A.16 The Stable Paretian Distribution
A.17 Problems
A.18 Solutions
References
Index
End User License Agreement
Chapter 1
Table 1.1 Comparison of three point estimators for the geometric model
Chapter 2
Table 2.1 Power of the KD and AD tests for the mixture of geometric distributions example
Table 2.2 Cutoff values for the KD and AD composite tests of normality, as a function of sample size n and significance level α, to four significant digits, based on simulation with 10 million replications
Chapter 3
Table 3.1 Mean squared error values for five models. The true model is Student's t, column T, with five degrees of freedom
Table 3.2 Mean squared error values for five models. True model is NormLap, column L, with …
Table 3.3 Mean squared error values for five models. True model is symmetric stable, column S, with …
Chapter 5
Table 5.1 Empirical coverage of one-at-a-time 95% c.i.s of mixed normal models (5.28) (left) and (5.4) (right) based on … observations
Table 5.2 Similar to Table 5.1 but for the experiment 2 and contaminated models
Table 5.3 The time required to estimate 100 of the contaminated model data sets (5.7), each with n = 100, on a standard 3.2 GHz PC, and given in seconds unless otherwise specified. All methods using the generic optimizer are based on a convergence tolerance of …, while the EM algorithm used a convergence tolerance of …. The calculation of the direct m.l.e. is just denoted by m.l.e., whereas EM indicates the use of the EM algorithm, and q-B.e. denotes the quasi-Bayesian estimator with shrinkage prior and strength w = 4
Table 5.4 Actual coverage of nominal one-at-a-time c.i.s based on the bootstrap, for four models and two estimation methods
Table 5.5 Similar to Table 5.4 but using the qB(1) and qB(4) estimation methods
Chapter 6
Table 6.1 Coefficients in regression (6.3)
Table 6.2 Comparison of power for various normal tests of size 0.05, using the two-component mixed normal distribution as the alternative, obtained via simulation with 1 million replications for each model, and based on two sample sizes, … and …. Model #0 is the normal distribution, used to serve as a check on the size. The entry with the highest power for each alternative model and sample size appears in bold. Entries with power lower than the nominal/actual value of 0.05, indicating a biased test, are given in italic
Table 6.3 Correlation between tests for normality under the null, using sample size …, based on 1 million replications. For the χ² test, nine bins were used
Table 6.4 The power of various tests for Laplace, against the Gaussian alternative, for … and …
Chapter 7
Table 7.1 … as a function of …, for Problem 16
Chapter 8
Table 8.1 Comparison of lengths of 95% c.i.s for … in the normal model with sample size …
Table 8.2 Quantiles and lengths for the equal-tail (et) and minimal-length (min) 95% c.i.s for …
Table 8.3 Accuracy of the 95% c.i. of … that conditions on the observed sum. EC is the empirical coverage proportion, with lo and hi denoting the endpoints of the asymptotic 95% c.i. for EC
Chapter 9
Table 9.1 Actual sizes of the nominal 5% and 1% … test (9.11) for data of length … with tail index …. The entries in italics (for … and 1000) make use of the adjustment procedure via multiplicative factor …. The remaining entries do not use the adjustment procedure
Table 9.2 Actual sizes of the nominal 5% and 1% … test (9.15) for data of length … with tail index …
Table 9.3 Actual sizes of the nominal 5% and 1% … test (9.16) for data of length … with tail index …. The rows labeled … and … indicate the use of the true value of … instead of using … and linear interpolation into the constructed table of …-values
Table 9.4 Actual sizes of tests, for i.i.d. symmetric stable data with tail index …, for sample size …, using the nominal size of 5%
Table 9.5 Power against the Student's t alternative, for degrees-of-freedom values … and sample size …, using the nominal size of 5%
Table 9.6 Power against the mixed normal alternative, with p.d.f. …, for second component scale values …, using the nominal size of 5%
Table 9.7 Power against the NIG alternative, with p.d.f. (9.17), using …, …, …, and shape values …, for the nominal size of 5%
Table 9.8 Power against the GAt alternative using … and … (and …, …, though the location and scale terms are irrelevant for power considerations), and shape values …, for nominal size of 5%
Table 9.9 Actual sizes of the … and ALHADI nominal 5% tests, as designed for symmetric stable data but applied to asymmetric stable data, using 10,000 replications, and based on … and …, using sample size …, and ignoring the asymmetry
Table 9.10 Similar to Table 9.9, showing actual size for a nominal size of 5%, again based on sample size … and using 10,000 replications, but accounting for asymmetry by having applied transform (9.19). Also shown are the actual sizes of the combined test … and l.r.t.
Table 9.11 For … and nominal size 5%, power values against asymmetric alternatives of the …, ALHADI, combined …, and l.r.t. (9.18) tests, using transform (9.19); and, in the last row, l.r.t. (9.21). The left panels show the power for the noncentral Student's t based on … degrees of freedom and noncentrality (asymmetry) parameters …. The center panels use the asymmetric NIG (9.17) with NIG shape parameters … and …. The rightmost column is for the IHS distribution (9.20) with … and …
Appendix A
Figure A.1 (a) True … expected shortfall of a standard skew normal random variable as a function of asymmetry parameter … (solid) and its s.p.a. based on (A.114) (dashed). (b) The relative percentage error of the s.p.a. based on (A.114) (denoted SPA1) and that of the less accurate of two second-order s.p.a.s (denoted SPA2) developed in Broda et al. (2017).
Figure A.2 The vertical axis is …, and the horizontal axis is …. This graphically verifies that …, where … is the region above the line … (left plot), … is the region indicated by horizontal lines, and … is the region indicated by vertical lines.
Figure A.3 The function …, where … is given in (A.291).
Figure A.4 (a) Asymmetric stable p.d.f. for … and two values of …. (b) Discrepancy between the two computation methods using the … case.
Figure A.5 Expected shortfall for … as a function of …, using … (solid) and … (dashed).
Figure A.6 (a) The exact expected shortfall (solid lines) and its saddlepoint approximation (dashed lines) as a function of … for an … random variable (truncated at … for visibility reasons), and three values of …. (b) The relative percentage error of the s.p.a., shown up to …. The relative percentage error is symmetric about ….
Figure A.7 Exact and s.p.a. skew normal p.d.f. with ….
Figure A.8 (a) The true (solid) and second-order s.p.a. (dashed) of the convolution of two independent skew normal r.v.s with …, …, …, and …, …, and …. (b) The relative percentage error of the first (solid) and second-order (dashed) s.p.a.
Figure A.9 Venn diagram for events such that …, …, and ….
Figure A.10 Mass functions (A.344) (solid lines) and kernel density estimates of the simulated density (dashed lines) based on 10,000 replications, for (from left to right) …, …, and ….
Figure A.11 Theoretical density and kernel density from simulation, as well as fitted beta, for … and two values of ….
Chapter 1
Figure 1.1 Distribution of point estimators … (a), … (b), and … (c) using output from the program in Listing 1.1 with … and …, based on simulation with 10,000 replications.
Figure 1.2 Histogram of point estimator … for … and four values of …, based on simulation with 1 million replications.
Figure 1.3 The m.s.e. of estimators … (lines) and … (lines with circles) for parameter … in the geometric model, as a function of …, for three sample sizes, obtained by simulation with 100,000 replications.
Figure 1.4 Simulations of … for …, …, for … and … (a) and … (b), based on 10,000 replications.
Figure 1.5 Mapping between nominal and actual coverage probabilities for c.i.s of the success parameter in the i.i.d. Bernoulli model, based on the (single) parametric bootstrap, each computed via simulation with 100,000 replications.
Figure 1.6 (a) Actual coverage, based on simulation with 10,000 replications, of nominal 90% c.i.s using the (single) nonparametric bootstrap (with …). Graph is truncated at 0.6, with the actual coverage for … and … and … being about …. (b) Same but using the modified c.i. in (1.5) and (1.6).
Figure 1.7 Same as Figure 1.6(b), but using different numbers of bootstrap replications.
Figure 1.8 (a) Actual coverage of nominal 90% c.i.s using the double bootstrap (truncated at 0.3), based on 1000 replications. (b) Same but using the modified c.i. in (1.5) and (1.6) applied to each simulated data set and to each bootstrap sample in the outer bootstrap loop.
Figure 1.9 Similar to Figure 1.5 (mapping between nominal and actual coverage probabilities for c.i.s of the success parameter in the i.i.d. Bernoulli model) except that, instead of using the single bootstrap for the c.i.s, this uses the analytic method. In (b), actual coverage for a given … is identical to that for ….
Figure 1.10 Actual coverage of nominal 90% c.i.s using the double bootstrap with inner loop replaced by the analytic c.i., and having used the modification (1.5) and (1.6) in the outer bootstrap loop. (a) uses …; (b) uses ….
Figure 1.11 Mapping between nominal and actual coverage probabilities for c.i.s of the success parameter … in the i.i.d. geometric model, using the parametric bootstrap. Based on 100,000 replications and ….
Figure 1.12 Same as Figure 1.11, also with … and …, but with the nonparametric bootstrap (NPB).
Figure 1.13 (a) Actual coverage of the three types of c.i.s (lines), along with the true nominal coverage, …, from (1.7), as dark circles. (b) The average length of the c.i.s.
Chapter 2
Figure 2.1 The true distribution, obtained via simulation with 10,000 replications, of the Kolmogorov–Smirnov goodness-of-fit test statistic, and its asymptotic distribution (2.7) for the standard (location-zero, scale-one) normal (a) and Cauchy (b) distributions.
Figure 2.2 … in (2.15) versus … based on simulation.
Figure 2.3 (a) The e.c.d.f. (solid) based on 50 observations and true c.d.f. (dotted) of an … distribution. (b) Same, but adds horizontal 95% error bounds obtained by simulation of order statistics using the true … model. The horizontal line at the 34th order statistic just serves as a reminder that the bounds are to be understood horizontally.
Figure 2.4 The same e.c.d.f. (solid) and true … c.d.f. (dotted) as in Figure 2.3 but with 95% nonparametric bootstrap c.i.s (a) and 95% asymptotic c.i.s (b).
Figure 2.5 (a) The e.c.d.f. (solid) based on 50 observations and true c.d.f. (dashed) of a … distribution. (b) Same, but adds vertical 95% error bounds (these are not c.i.s) obtained by simulation of order statistics using the true … model.
Figure 2.6 The same e.c.d.f. (solid) and true … c.d.f. (dotted) as in Figure 2.5 but with 95% nonparametric bootstrap c.i.s (a) and 95% asymptotic c.i.s (b).
Figure 2.7 The m.s.e. comparison of the three estimators: nonparametric (solid), parametric using the m.l.e. … (dashed), and parametric using the efficient estimator … (dash-dotted), as a function of …, using sample size …, for estimating the probability of getting pregnant within (up to, and including) 4 (top) and 8 (bottom) months, where in the graphics moi stands for “month of interest.” Based on simulation with 100,000 replications.
Figure 2.8 Comparison of behavior of correctly specified (top) and misspecified (bottom) fitted c.d.f.s.
Figure 2.9 Top: Actual coverage of 95% parametric (solid) and nonparametric (dashed) bootstrap c.i.s for … as a function of sample size … under study A (left) and study B (right). Bottom: Same, but for 90% c.i.s.
Figure 2.10 Distribution of the KD statistic for the i.i.d. … model with … observations, with marked cutoff values …, … and ….
Figure 2.11 The actual acceptance probability … (vertical axis) versus the nominal probabilities … (solid line), for the KD statistic and the geometric pregnancy example, for … and … (a) and … (b). The dashed line indicates the case when nominal and actual are equal.
Figure 2.12 (Top) Power of the KD test for normality, using significance level …, for three different sample sizes, and the Student's t alternative (left) and skew normal alternative (right), based on 1 million replications. (Middle) Same, but for the AD test. (Bottom) Same, but power of the … (lines without circles) and … (lines with circles) tests for normality. The … and … power curves for the Student's t alternative are graphically indistinguishable.
Figure 2.13 Actual size of the four tests, for nominal size 0.05, based on 10,000 replications.
Figure 2.14 (a) Boxplots of … resulting when estimating all four parameters of the stable model, but with the data generated as Student's t with various degrees of freedom. (b) Power of the proposed set of tests against a Student's t alternative, for various degrees of freedom, and based on 10,000 replications.
Figure 2.15 The true distribution, obtained via simulation with 10,000 replications, of the Kolmogorov–Smirnov two-sample goodness-of-fit test statistic, and its asymptotic distribution (2.7) for the normal (a) and Cauchy (b) distributions.
Chapter 3
Figure 3.1 Standardized log-likelihoods (solid) and quadratic approximation (3.4) (dashed) for the Poisson with … (a) and … (b), with solid and dashed vertical lines showing the m.l.e. and true parameter, respectively.
Figure 3.2 Percentage error of (3.26) when using the Laplace approximation to the … function, for … (a) and … (b), for sample sizes …, and … (lines from top to bottom). The …-axis indicates the value of … in (3.26).
Figure 3.3 (a) Density of … for … and …, and …. (b) Bias of … as given in (3.25) and (3.28), for … (solid), … (dashed), and … (dash-dotted), as a function of …. There is no graphical difference when using the Laplace approximation for the … function instead of its exact values.
Figure 3.4 (a) The traditional Mahalanobis distances (3.33) based on the m.l.e.s … and … for the 1945 observations of the returns on the components of the DJIA 30 index. Fifteen percent of the observations lie above the cutoff line. (b) Similar, but having used the robust Mahalanobis distance (3.34) based on the mean vector and covariance matrix from the m.c.d. method, resulting in 33% of observations above the cutoff line.
Figure 3.5 Kernel density estimate using 10,000 replications of the coefficient of variation based on … and … (solid) and the asymptotic normal distribution (dashed), for … (a) and … (b).
Chapter 4
Figure 4.1 Kernel density estimates of the m.l.e. of scale parameter … based on … i.i.d. Cauchy observations, …, 50, and 100. The larger … is, the more mass is centered around the true value of ….
Figure 4.2 Estimation results for the location parameter of a Cauchy model and illustration of a likelihood with multiple roots.
Figure 4.3 Comparison of the m.l.e. of … and the median of Cauchy samples with … (a) and … (b).
Figure 4.4 The m.s.e. of … versus … as an estimator of the location parameter … of Student's t data with known scale 1 and degrees of freedom 1 (a), 3 (b), 10 (c), and 50 (d), based on a sample size of … observations. The vertical axis was truncated to improve appearance. The dashed line in each plot is the m.s.e. of ….
Figure 4.5 The top left panel plots … versus … for …, each obtained via simulation using 25,000 replications. The top right is the same, but using a log scale. The bottom panels show the least squares residuals for the linear (left) and quadratic (right) fits for ….
Figure 4.6 Same as the top right panel in Figure 4.5 but for three additional sample sizes.
Figure 4.7 Daily returns for the NASDAQ index.
Figure 4.8 (a) Kernel density (solid) and fitted Student's t density (dashed) of the NASDAQ returns. (b) Simulation results of the m.l.e. for the Student's t model, based on … observations and true parameter values taken to be the m.l.e. of the … model for the NASDAQ returns. The boxplots show their differences.
Figure 4.9 Convergence of the method of iterating on the score functions (a), method of steepest descent (b), and the BFGS algorithm (c), for the log-likelihood of a … sample of size 100. The number of iterations required to arrive at the m.l.e. of … with the same accuracy was 56, 16, and 11, respectively.
Figure 4.10 (a) Kernel density (dashed) and fitted GAt density of the NASDAQ returns. (b) Same, but showing only the left tail and including the Student's t fit (dash-dotted).
Figure 4.11 Evolution of the DE population over time for selected iteration states, showing (from left to right, top to bottom) iterations 1, 10, 20, 30, 40, and 46.
Figure 4.12 Evolution of the CMAES population over time for selected iteration states, showing (from left to right, top to bottom) iterations 1, 5, 10, 15, 20, and 28.
Chapter 5
Figure 5.1 (a) Mixed normal density with parameters (5.4) shown as the solid line. The two components (multiplied by their respective mixture weights) are shown as dashed lines. (b) Simulated data set from model (5.4) using … and seed value 50, illustrating the possibility of an outlier.
Figure 5.2 (a) Realization of model (5.4) with …. (b) Fitted models (5.10), both being from local maxima of the likelihood.
Figure 5.3 Using showcase model (5.4) with … and the data set shown in Figure 5.1b, the plots show the true density (thick solid line; the same as in Figure 5.1a) and 100 fitted densities (thin lines), each having used a different (randomly chosen) starting value. The box constraint numbers … are given in (5.11), and increasingly place more restrictions on the allowable parameter space.
Figure 5.4 (a) Of the 100 fitted densities shown in Figure 5.3, this shows the one corresponding to the highest likelihood, for each of the four constraints. (b) Same, but from the densities shown in Figure 5.5(a)–(d), which are based on the …-rounded likelihood for ….
Figure 5.5 Same as Figure 5.3 but using the …-rounded likelihood from (3.1) with … (a–d) and … (e–h).
Figure 5.6 (a) Same as Figure 5.3a but having used 1000 instead of 100 fitted densities. (b) Same, but having used the EM algorithm (which implicitly imposes the same constraints on the … and … as does our constraint 0) with 1000 fitted densities.
Figure 5.7 Comparison of log total m.s.e. for … (leftmost boxplot in all six panels) and … from (5.19), using shrinkage form (5.20), for … (left) and … (right), for a grid of values of …, dictating the strength of the shrinkage. The top panels are for the showcase constellation (5.4) with … observations. The middle and bottom panels correspond to (5.21) and (5.22), respectively. The simulation is based on 1000 replications. The horizontal dashed lines show the median m.l.e. value of … from (5.6). The other dashed line traces the mean of ….
Figure 5.8 Comparison of total m.s.e. for … (leftmost boxplot in both panels) and … using, for the latter, prior (5.27) with varying strength …. The simulation is based on 1000 replications. The horizontal dashed lines show the median m.l.e. value of … from (5.6). The other dashed line traces the mean of ….
Figure 5.9 Bias of the m.l.e. (left half of both panels) and q.B.e. (right half of both panels) based on …, for two sample sizes … (a) and … (b), all based on 1000 replications.
Figure 5.10 (a) For the contaminated normal model (5.7) with …, measure … from (5.6) for the m.l.e. computed via the direct method (denoted MLE), the m.l.e. computed via the EM algorithm (denoted EM), the m.m.e. restricted to have equal means, and the unrestricted m.m.e., for the 824 out of 1000 data sets for which the unrestricted (and restricted) m.m.e. existed. The horizontal dashed line shows the median m.l.e. value of …. (b) Same, but using sample size ….
Figure 5.11 (a) For showcase model (5.4), measure … from (5.6) for the m.l.e. and the m.m.e., using …, and based on 1000 replications. (b) Same, but using the four goodness-of-fit measures in Section 5.2.2.
Figure 5.12 Same as Figure 5.11, but using the contaminated normal model (5.7) with ….
Figure 5.13 (a) Same as Figure 5.11 but using the m.l.e. and q.l.s. estimators for several values of …, applied to the showcase model (5.4) for …. (b) Same, but for the … estimator for several values of ….
Figure 5.14 Left: Similar to Figure 5.13a, using the … estimator applied to our showcase model (5.4), … replications and sample size …, for fixed number of bins …, and penalized according to (5.36) for … and two sets of …-values (top and bottom). Right: Same, but for the contaminated normal model (5.7), with …, …, ….
Figure 5.15 (a) Same as Figures 5.11 and 5.13, but using the m.l.e. and the empirical m.g.f. estimator (5.37) for several values of …. Q&R denotes the use of …, with the … being those suggested by Quandt and Ramsey (1978). (b) Same, but using the model for experiment 2 in (5.21).
Figure 5.16 (a) Similar to Figure 5.15(b), but using the empirical m.g.f. estimator, with …, with shrinkage, for … and a set of shrinkage values …, as in (5.36), and based on 1000 replications. The …-axis gives the value of … times …, that is, the values of … are very close to zero. (b) Same, but for the contaminated normal model (5.7), with …, ….
Figure 5.17 Horse race between the various methods of estimation for the models considered throughout the chapter. All are based on … replications and sample size …, except experiment 4, which uses ….
Figure 5.18 Mean squared error, as a function of …, based on simulation with 1000 replications, of … (a) and … (b) for the m.m.e. using two moment equations, with …. It is based on data …, …, where …, with …, and … and … are to be estimated. True values are …, ….
Figure 5.19 Boxplot of 1000 values of … using the Tailx estimator (and the last boxplot being the McCulloch quantile estimator), using the true values of …, … and sample size … as indicated in the titles of the plots.
Figure 5.20 Measure … from (5.6) for the m.l.e. and empirical m.g.f. estimator (with …) using the contaminated normal model (5.7) but for different values of ….
Chapter 6
Figure 6.1 Q-Q plots for the same Cauchy data set, just differing by the range on the x- and y-axes.
Figure 6.2 Q-Q plot for a random … sample of size … with 10% and 5% pointwise null bands obtained via simulation (top panels), using the estimated parameters (left) and the true parameters (right) of the data. The bottom panels are similar, but based on the asymptotic distribution in (A.189).
Figure 6.3 Q-Q plots with pointwise null bands, using a size of 0.05, for the same Cauchy data as shown in Figure 6.1.
Figure 6.4 The mapping between pointwise and simultaneous significance levels, for normal data (a) and Weibull data (b) using sample size ….
Figure 6.5 Power of the Q-Q test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on simulation with 1000 replications.
Figure 6.6 (a) Cauchy S-P plot with null bands, obtained via simulation, using a pointwise significance level of 0.01. (b) Same, but using the horizontal format.
Figure 6.7 (Top) Stabilized P-P plot using the same random … sample of size … as in Figure 6.2 with 10% and 5% pointwise null bands obtained via simulation, using the estimated parameters (left) and the true parameters (right) of the data. (Bottom) Same as top, but with constant-width null bands.
Figure 6.8 Same as Figure 6.7, but plotted in horizontal format.
Figure 6.9 (a) The solid, dashed, and dash-dotted lines are the widths for the pointwise null bands of the normal MSP plot, as a function of the pointwise significance level …, computed using simulation with 50,000 replications. The overlaid dotted curves are the same, but having used the instantaneously computed approximation from (6.4) and (6.3). There is no optical difference between the simulation and the approximation. (b) For the normal MSP plot, the mapping between pointwise and simultaneous significance levels using sample size ….
Figure 6.10 Power of the MSP test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on 1 million replications.
Figure 6.11 Kernel density and fitted skew normal distribution of sample size … times the MSP test statistic (6.6), computed under the null, and based on 1 million replications.
Figure 6.12 One million p-values from the MSP test with …, under the null (a) and for a Student's t with … degrees of freedom alternative (b).
Figure 6.13 Normal Fowlkes-MP (left) and normal MSP (right) plots, with simultaneous null bands, for normal data (top) and mixed normal data (bottom).
Figure 6.14 Power of the Fowlkes-MP test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on 1 million replications.
Figure 6.15 The average and smallest p-values of the MSP univariate test of normality from Section 6.4.3, for the 30 stocks comprising the DJIA, in each of the two separated mixed normal components, and based on moving windows of sample size ….
Figure 6.16 Kernel density estimate (solid) of the log of the JB test statistic, under the null of normality and using a sample size of …, based on 10 million replications (and having used Matlab's ksdensity function with 300 equally spaced points). (a) Fitted GAt density (dashed). (b) Fitted noncentral … (dashed) and asymmetric stable (dash-dotted).
Figure 6.17 Simulated p-values of the JB test statistic, based on 1 million replications, using the GAt approximation (a) and the two-component GAt mixture (b).
Figure 6.18 Power of the JB test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on 100,000 replications.
Figure 6.19 Power of the Ghosh (1996) test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on 100,000 replications.
Figure 6.20 Power of the KL1 test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on 100,000 replications.
Figure 6.21 Power of the Torabi et al. (2016) (TMG) test for normality, for three different sample sizes, and Student's t alternative (a) and skew normal alternative (b), based on 100,000 replications.
Figure 6.22 Histograms of 1000 … values from the p.i.t., using 30 bins and … the normal c.d.f. with mean and variance parameters estimated from the data.
Figure 6.23 Actual size of the χ² test as a function of the number of bins …, used as a composite test of normality (two unknown parameters), based on 100,000 replications, using the built-in Matlab function chi2gof (solid), and the custom implementation in Listing 6.9, using the asymptotically valid cutoff values from the χ² distribution (dashed), and cutoff values obtained via simulation (dash-dotted).
Figure 6.24 The power of the χ² test for normality, against Student's t alternatives with various degrees of freedom and for four sample sizes …, with nominal size …, using the method from Listing 6.9 with simulated cutoff values, and based on 1 million replications.
Figure 6.25 Same as Figure 6.24 but using the skew normal as the alternative, with various asymmetry parameters … and sample sizes ….
Figure 6.26 (a) Power of the size 0.05 Pearson χ² test for normality, based on 100,000 replications, using three bins (the optimal number, as indicated in Figure 6.24) and simulated cutoff values, for three different sample sizes, and Student's t alternative. (b) Same but using 11 bins (the compromise value from Figure 6.25) and the skew normal alternative.
Figure 6.27 Power of the MSP, JB, and joint tests using … and 100,000 replications.
Figure 6.28 (a) The power of the JB test (6.7) against the alternative of a Student's t (lines with circles; same power curves as given in Figure 6.18(a)), along with the power of the likelihood ratio test (using the Student's t as the specific alternative), based on 10,000 replications. (b) The power of the MSP test for normality against the alternative of skew normal (lines with circles; same power curves as in Figure 6.10(b)), along with the power of the likelihood ratio test (using the skew normal as the specific alternative), based on 10,000 replications.
Figure 6.29 (a) (Left) Histogram of an i.i.d. normal sample of size …; (right) its MSP plot. (b) Similar, but using a resample from the original data set.
Figure 6.30 (a) The nonparametric bootstrap distribution of the p-value based on a random normal sample of size …, using the MSP test and … resamples. Its p-value is marked with the vertical line. (b) Same, but for a different random normal sample.
Figure 6.31 (a) Scatterplot based on 10,000 replications, with the x-axis showing the p-value … from the MSP test, using a data set from the null, for …, and the y-axis showing the fraction of bootstrap p-values (…), based on that data set, that were less than 0.05. The lines were obtained from quantile regression using regressors a constant, …, … and …. (b) Similar to the top panel, but the scatterplot corresponds to points obtained using a skew normal alternative with …, …, but the lines are the same as those in the top panel, that is, correspond to the quantiles under the null.
Figure 6.32 Similar to Figure 6.31 but using the JB test, and only showing the median and 95% quantile fitted lines.
Figure 6.33 (a) Power of the Laplace … test against the normal distribution as a function of …, for …, for three different test sizes (see legend in the bottom panel). (b) Power of the … test for Laplace with … against Student's t alternative with … degrees of freedom, for three different test sizes.
Figure 6.34 The mapping between pointwise and simultaneous significance levels, for the Laplace Q-Q test using sample size …, with the actual points obtained from the simulation (circles) and the regression line with intercept, linear, and quadratic term (dashed).
Figure 6.35 (a) The exact and second-order s.p.a. p.d.f. (on the half-line) of …, where …, …. (b) The p.d.f. (on the half-line) of the standardized sum of 30 independent Laplace r.v.s, using (positive) random values as scale parameters, and the central limit theorem approximation.
Figure 6.36 For the same data set used in Figure 6.2, the top panel shows the normal Q-Q plot using the correct pointwise significance level … to obtain a simultaneous one of 0.05. The bottom uses the value of … determined using the “fast but wrong” method.
Figure 6.37 Normal Q-Q plots with size 0.05 as in Figure 6.36, but using a random sample of 50 observations from a Student's t distribution with 3 degrees of freedom.
Chapter 7
Figure 7.1 Bias (a) and m.s.e. (b) for estimators … (solid), … (dashed) and … (dash-dotted) for sample size … for the model in Example 7.15.
Figure 7.2 Bias (a) and m.s.e. (b) as a function of sample size … for estimators of parameter … in the discrete uniform example, for the m.l.e. (solid), u.m.v.u.e. (dashed), m.m.e. (dash-dotted) and bias-adjusted estimator (dotted). The m.s.e. of the u.m.v.u.e. and bias-adjusted estimator are graphically indistinguishable.
Figure 7.3 Variance of … (the u.m.v.u.e. for …) (solid) and the CRlb (dashed), as a function of ….
Figure 7.4 Bias for the m.l.e. of the geometric distribution parameter for sample size ….
Figure 7.5 Illustration of how the mean-adjusted estimator is determined. The graph shows the function … for …. If the observed value of …, …, is …, then, as indicated with arrows in the figure, ….
Figure 7.6 Based on output from the program in Listing 7.1, this shows the mean and median bias, and the m.s.e., of the five estimators: the m.l.e. … given in (3.25) (denoted MLE in the legend); the mean-bias-adjusted estimator … given in (7.24) (ADJ); the median-unbiased estimator … given in (7.26) (MED); the u.m.v.u.e. … given in (7.25) (UNB); and the mode-adjusted estimator … given in (7.28) (MOD), as a function of …, based on 10,000 replications, for … (left) and … (right).
Figure 7.7 Based on output from the program in Listing 7.1, this shows kernel density estimates of the five estimators: the m.l.e. … given in (3.25) (denoted MLE in the legend); the mean-bias-adjusted estimator … given in (7.24) (ADJ); the median-unbiased estimator … given in (7.26) (MED); the u.m.v.u.e. … given in (7.25) (UNB); and the mode-adjusted estimator … given in (7.28) (MOD), for … (left) and … (right), based on 10,000 replications, for … (top) and … (bottom). The vertical dashed line indicates the true value of ….
Figure 7.8 Same as Figure 7.6 but with overlaid results, as the new, thicker line, corresponding to the properties of the estimator resulting from taking the value … if … is less than 0.5, and … otherwise.
Figure 7.9 Same as Figure 7.6 but with overlaid results, as the new, thicker line, corresponding to the estimator (7.31).
Figure 7.10 The bias (a) and m.s.e. (b) of the m.l.e. … (solid), the jackknife estimator … (dashed), and the unbiased estimator … given in (7.25) (dash-dotted), based on sample size … and 50,000 replications. The smoothness of the curves is obtained by using the same seed value when generating the data for each value of …, but this is almost irrelevant given the large number of replications.
Figure 7.11 For Problem 19, this shows the bias (left) and m.s.e. (right) of the m.l.e. … (solid), the jackknife … (dashed) and (7.12) (dash-dotted), as a function of …, based on 2000 replications, for … (top) and … (bottom).
Chapter 8
Figure 8.1 Simulated lengths of 95% c.i.s for … in the … model assuming … unknown.
Figure 8.2 Length of the c.i. (8.4) for parameter … of the location exponential model.
Figure 8.3 Left: The normal c.d.f. … versus … for … (solid), … (dashed) and … (dash-dotted). Middle: … for … (solid), … (dashed) and … (dash-dotted). Right: … versus ….
Figure 8.4 Left: The c.d.f. of … in Example 8.9 for … (solid), … (dashed) and … (dash-dotted). Right: Ratio of pivotal method c.i. length to c.d.f. inversion method c.i. length versus ….
Figure 8.5 The length of the 90% c.i. of … in Example 8.14, with … and …, as a function of ….
Chapter 9
Figure 9.1 Moment plots for … (top, from left to right) and … (bottom, from left to right) for 2000 simulated i.i.d. Student's t realizations.
Figure 9.2 Moment plots for … (from left to right) for the NASDAQ return series.
Figure 9.3 (a) Estimated values of the degrees-of-freedom parameter for the Student's t distribution, but for data sets that are symmetric stable with tail index 1.6. (b) Estimated values of tail index … for the symmetric stable distribution, but the data are Student's t with … degrees of freedom.
Figure 9.4 Thin, solid lines correspond to the 30 estimated tail index values … for the location–scale asymmetric stable Paretian model, while the thick, empty boxes correspond to the 30 estimated degrees of freedom values … for the (symmetric) location–scale Student's t model.
Figure 9.5 Hill estimates as a function of …, known as Hill plots, for simulated Pareto, symmetric stable Paretian, and Student's t data, with tail index for Pareto and stable …, and Student's t degrees of freedom … (a) and …, … (b), based on sample size ….
Figure 9.6 Comparison of four estimators of the NCT distribution based on 10,000 replications, two sample sizes, and two parameter constellations. (a) Corresponds to …, …, …; (b) to …, …, …. True parameter values are indicated by vertical dashed lines. The m.l.e.-based distributions are optically almost indistinguishable.
Figure 9.7 Performance comparison via boxplots of the Hint, McCulloch, and ML estimators of tail index … for i.i.d. symmetric stable Paretian data based on sample size ….
Figure 9.8 Comparison of the small-sample distribution of the McCulloch and maximum likelihood estimators of the parameters of the … model for an i.i.d. data set with … and …, based on values …, …, …, and ….
Figure 9.9 Mean squared error of … for the McCulloch estimator (solid) and m.l.e. (dashed) for … (a) and … (b), for the i.i.d. model with … observations and … distribution. For both McCulloch and the m.l.e., all four parameters are assumed unknown and are estimated.
Figure 9.10 Mean squared error of … for the McCulloch estimator (solid), the Hint estimator (9.7) (dashed), the m.l.e. (dash-dotted), and the method of moments estimator … from Example 5.6 (circles), for … (a) and … (b), for the i.i.d. model with … observations and … distribution. For the m.l.e., maximization was done only with respect to …; parameters …, … and … were fixed at their known values.
Figure 9.11 First row: Kernel density, based on 10,000 replications, of the McCulloch estimator (left) and the Kogon and Williams (1998) empirical c.f. estimator (right) of …, for sample size …. Second row: Same, for the m.l.e. of …, but based on only 1000 replications, using the FFT method to calculate the stable density (and, thus, the log-likelihood) (left) and the fast spline-approximation routine for the stable density provided in Nolan's toolbox (function stableqkpdf) (right). Third and fourth rows: The bottom four panels are the same as the top four, but using … observations.
Figure 9.12 (a) The 90%, 95%, and 99% Wald confidence intervals for … for each of the 30 DJIA stock return series, obtained from having estimated the four-parameter location–scale asymmetric stable distribution. (b) Likelihood ratio test statistics and associated 90%, 95%, and 99% cutoff values.
Figure 9.13 The first boxplot represents the 30 estimated stable Paretian asymmetry parameters, …, for the 30 daily return series on the Dow Jones Industrial Average index, using the McCulloch estimator. The dashed line illustrates their median. Each of the other 19 boxplots is based on the 30 values, …, the …th of which was estimated from a simulated data set of 2020 i.i.d. … values.
Figure 9.14 Values of …, as a function of stable tail index …, based on 10,000 replications, for the … test (9.11) for the two sample sizes … (a) and … (b).
Figure 9.15 Plots associated with the … summability test, based on …, using (a) symmetric stable data with …, (b) Student's t data with three degrees of freedom.
Figure 9.16 Boxplots of …, …, and … based on 1000 simulated symmetric stable data sets, each of length … and for tail index ….
Figure 9.17 Similar to Figure 9.16 but based on simulated Student's t data with … degrees of freedom (here denoted by df), and using …. MLE refers to the maximum likelihood estimator of stable tail index ….
Figure 9.18 Boxplots of …, …, and … under four non-stable-Paretian distributional assumptions, based on 1000 replications, each of length ….
Figure 9.19 The Hint (thick circle); the m.l.e. estimating all four parameters as unknown (star); the m.l.e. estimating just …, …, and …, taking … (thin circle); and McCulloch (square) estimates of stable tail index … for each of the 30 DJIA daily stock return series. The lines indicate the interval of … using the four-parameter m.l.e.
Figure 9.20 Top left: Simulated distribution of the ALHADI test statistic (9.16), …, using 2000 series of i.i.d. … data of length …, where the parameter vector … is the m.l.e. of the daily returns of the AT&T closing stock price, this being the fourth component of the DJIA index. Top right: The nonparametric bootstrap distribution of …, using … bootstrap draws from the AT&T return series. The thin vertical line shows the actual value of … for the AT&T returns. Bottom: Similar, but using … instead of the ALHADI test statistic.
Figure 9.21 Top: The ALHADI test statistic for each of the 30 DJIA return series: for each, the left boxplot corresponds to the distribution of the ALHADI test statistic based on simulation of …
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by Walter A. Shewhart and Samuel S. Wilks
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, Jozef L. Teugels
The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state-of-the-art developments in the field and classical methods.
Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches. This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.
A complete list of titles in this series can be found at http://www.wiley.com/go/wsps
Marc S. Paolella
Department of Banking and Finance, University of Zurich, Switzerland
This edition first published 2018
© 2018 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Marc S. Paolella to be identified as the author of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.