Nonparametric Tests for Complete Data

Vilijandas Bagdonavičius, Julius Kruopis, Mikhail Nikulin

Description

Statistical analysis of a data set usually involves constructing a statistical model of the distribution of the data in the available sample, and by extension of the distribution of all data of the same kind. Statistical models are either parametric or non-parametric, depending on whether or not the model can be described in terms of a finite-dimensional parameter, and any model must be tested to ascertain whether it conforms to the data.

This book addresses the testing of hypotheses in non-parametric models in the general case of complete data samples. The classical non-parametric tests for complete data (goodness-of-fit, homogeneity, randomness and independence tests) are presented and explained, including the chi-squared and modified chi-squared tests and the rank and homogeneity tests. Most of the test results are proved, and real applications are illustrated with examples. The frequent incorrect use of many of these tests, including in commonly deployed statistical software, is highlighted and discussed.

Table of Contents

Preface

Terms and Notation

Chapter 1. Introduction

1.1. Statistical hypotheses

1.2. Examples of hypotheses in non-parametric models

1.3. Statistical tests

1.4. P-value

1.5. Continuity correction

1.6. Asymptotic relative efficiency

Chapter 2. Chi-squared Tests

2.1. Introduction

2.2. Pearson’s goodness-of-fit test: simple hypothesis

2.3. Pearson’s goodness-of-fit test: composite hypothesis

2.4. Modified chi-squared test for composite hypotheses

2.5. Chi-squared test for independence

2.6. Chi-squared test for homogeneity

2.7. Bibliographic notes

2.8. Exercises

2.9. Answers

Chapter 3. Goodness-of-fit Tests Based on Empirical Processes

3.1. Test statistics based on the empirical process

3.2. Kolmogorov–Smirnov test

3.3. ω², Cramér–von Mises and Anderson–Darling tests

3.4. Modifications of Kolmogorov–Smirnov, Cramér–von Mises and Anderson–Darling tests: composite hypotheses

3.5. Two-sample tests

3.6. Bibliographic notes

3.7. Exercises

3.8. Answers

Chapter 4. Rank Tests

4.1. Introduction

4.2. Ranks and their properties

4.3. Rank tests for independence

4.4. Randomness tests

4.5. Rank homogeneity tests for two independent samples

4.6. Hypothesis on median value: the Wilcoxon signed ranks test

4.7. Wilcoxon’s signed ranks test for homogeneity of two related samples

4.8. Test for homogeneity of several independent samples: Kruskal–Wallis test

4.9. Homogeneity hypotheses for k related samples: Friedman test

4.10. Independence test based on Kendall’s concordance coefficient

4.11. Bibliographic notes

4.12. Exercises

4.13. Answers

Chapter 5. Other Non-parametric Tests

5.1. Sign test

5.2. Runs test

5.3. McNemar’s test

5.4. Cochran test

5.5. Special goodness-of-fit tests

5.6. Bibliographic notes

5.7. Exercises

5.8. Answers

APPENDICES

Appendix A. Parametric Maximum Likelihood Estimators: Complete Samples

Appendix B. Notions from the Theory of Stochastic Processes

B.1. Stochastic process

B.2. Examples of stochastic processes

B.3. Weak convergence of stochastic processes

B.4. Weak invariance of empirical processes

B.5. Properties of Brownian motion and Brownian bridge

Bibliography

Index

First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.iste.co.uk

www.wiley.com

© ISTE Ltd 2011

The rights of Vilijandas Bagdonavičius, Julius Kruopis and Mikhail S. Nikulin to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Cataloging-in-Publication Data

Bagdonavicius, V. (Vilijandas)

  Nonparametric tests for complete data / Vilijandas Bagdonavicius, Julius Kruopis, Mikhail Nikulin.

     p. cm.

  Includes bibliographical references and index.

  ISBN 978-1-84821-269-5 (hardback)

  1. Nonparametric statistics. 2. Statistical hypothesis testing. I. Kruopis, Julius. II. Nikulin, Mikhail (Mikhail S.) III. Title.

  QA278.8.B34 2010

  519.5--dc22

2010038271

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-84821-269-5

Preface

The testing of hypotheses in non-parametric models is discussed in this book. A statistical model is non-parametric if it cannot be written in terms of a finite-dimensional parameter. The main hypotheses tested in such models concern the probability distribution of the data: goodness-of-fit, homogeneity, randomness and independence hypotheses. Tests of such hypotheses from complete samples are considered in many books on non-parametric statistics, including recent monographs by Maritz [MAR 95], Hollander and Wolfe [HOL 99], Sprent and Smeeton [SPR 01], Govindarajulu [GOV 07], Gibbons and Chakraborti [GIB 09] and Corder and Foreman [COR 09].

This book presents tests for complete samples. Tests for censored samples can be found in our book Tests for Censored Samples [BAG 11].

In Chapter 1, the basic ideas of hypothesis testing and general hypotheses on non-parametric models are briefly described.

In the initial phase of the solution of any statistical problem the analyst must choose a model for data analysis. The correctness of the data analysis strongly depends on the choice of an appropriate model. Goodness-of-fit tests are used to check the adequacy of a model for real data.

Among the most widely applied goodness-of-fit tests are chi-squared type tests, which use grouped data. In many books on statistical data analysis, chi-squared tests are applied incorrectly. Classical chi-squared tests are based on theoretical results obtained under the assumptions that the ends of the grouping intervals do not depend on the sample and that the parameters are estimated from the grouped data. In real applications these assumptions are often forgotten. The modified chi-squared tests considered in Chapter 2 do not suffer from such drawbacks: they allow the ends of the grouping intervals to depend on the data, and the parameters to be estimated from the initial non-grouped data.
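
To fix ideas, here is a minimal Python sketch (ours, not from the book) of Pearson's classical goodness-of-fit test for a simple hypothesis, in the setting where its chi-squared limit distribution with k − 1 degrees of freedom is valid: the grouping intervals are fixed before seeing the data and no parameters are estimated. The sample, the interval ends and the hypothesized distribution are all hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; the simple hypothesis is that it comes from
# the standard normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Grouping intervals fixed BEFORE seeing the data (classical assumption).
edges = np.array([-np.inf, -1.0, -0.3, 0.3, 1.0, np.inf])
observed = np.histogram(x, bins=edges)[0]

# Expected cell counts under the hypothesized cdf.
cell_probs = np.diff(stats.norm.cdf(edges))
expected = len(x) * cell_probs

# Pearson's statistic; under the simple hypothesis it is asymptotically
# chi-squared with k - 1 degrees of freedom (k cells, nothing estimated).
chi2 = np.sum((observed - expected) ** 2 / expected)
pvalue = stats.chi2.sf(chi2, df=len(observed) - 1)
print(chi2, pvalue)
```

If the parameters were instead estimated from the non-grouped data, or the interval ends were chosen after looking at the sample, this limit distribution would no longer hold; that is exactly the situation the modified chi-squared tests of Chapter 2 address.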

Another class of goodness-of-fit tests, based on functionals of the difference between the empirical and theoretical cumulative distribution functions, is described in Chapter 3. For composite hypotheses, the classical statistics are modified by replacing the unknown parameters with their estimators. These tests are often applied incorrectly: the critical values of the classical tests are used even though the hypothesis is composite and the statistics are modified.
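
To make the warning concrete, the following Python sketch (ours, under the stated assumptions, not the book's own method) tests the composite hypothesis of normality with the Kolmogorov–Smirnov statistic after replacing the unknown parameters by their estimates, and calibrates the critical value by parametric bootstrap instead of using the classical KS tables. For a location-scale family such as the normal, the null distribution of this modified statistic does not depend on the true parameter values, so simulating standard normal samples suffices.

```python
import numpy as np
from scipy import stats

def ks_stat_normal(x):
    """KS statistic with mean and standard deviation estimated from x."""
    mu, sigma = x.mean(), x.std(ddof=1)
    return stats.kstest(x, stats.norm(mu, sigma).cdf).statistic

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100)  # hypothetical sample
d_obs = ks_stat_normal(x)

# The classical KS critical values are invalid here because the
# parameters were estimated; simulate the null distribution of the
# modified statistic instead (a Lilliefors-type calibration).
boot = np.array([ks_stat_normal(rng.normal(size=len(x)))
                 for _ in range(2000)])
print(d_obs, np.mean(boot >= d_obs))  # statistic and bootstrap p-value
```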

Section 5.5 gives special goodness-of-fit tests which do not belong to the two above-mentioned classes and are designed for specific probability distributions.

Tests for the equality of the probability distributions (homogeneity tests) of two or more independent or dependent random variables are considered in several chapters. Chi-squared type tests are given in section 2.6, and tests based on functionals of the difference of empirical distribution functions in section 3.5. For many alternatives, the most efficient tests are the rank homogeneity tests given in sections 4.5 and 4.7–4.9, illustrated by the sketch below.
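
As a small hedged illustration (hypothetical data, SciPy implementations rather than the book's own derivations), the snippet below applies two such two-sample homogeneity tests: the Wilcoxon–Mann–Whitney rank test and the two-sample Kolmogorov–Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, size=50)  # hypothetical sample 1
y = rng.normal(loc=0.5, size=60)  # hypothetical sample 2, shifted

# Rank test: Wilcoxon-Mann-Whitney, efficient against location-shift
# alternatives (cf. the rank homogeneity tests of Chapter 4).
u = stats.mannwhitneyu(x, y, alternative="two-sided")

# Empirical-cdf test: two-sample Kolmogorov-Smirnov (cf. section 3.5).
ks = stats.ks_2samp(x, y)

print(u.pvalue, ks.pvalue)
```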

Classical tests for the independence of random variables are given in sections 2.5 (chi-squared type tests) and 4.3 and 4.10 (rank tests).

Tests for data randomness are given in sections 4.4 and 5.2.

All tests are described in the following way: 1) the hypothesis is formulated; 2) the idea behind the construction of the test is given; 3) the statistic on which the test is based is given; 4) the finite-sample and/or asymptotic distribution of the test statistic is found; 5) the test, and often its modifications (continuity correction, treatment of ties (ex aequo), various approximations of the asymptotic law), are given; 6) practical examples of the application of the test are given; and 7) at the end of each chapter, exercises with answers are given.

Anyone who uses non-parametric methods of mathematical statistics, or wants to know the ideas behind the tests and their mathematical substantiation, can use this book. It can be used as a textbook for a one-semester course on non-parametric hypothesis testing.

Knowledge of probability and parametric statistics is needed to follow the mathematical developments. The basic facts on probability and parametric statistics used in the book are given in the appendices.

The book consists of five chapters and appendices. In each chapter, theorems, formulas and comments are numbered using the chapter number.

The book was written using lecture notes for graduate students at Vilnius and Bordeaux universities.

We thank our colleagues and students at Vilnius and Bordeaux universities for their comments on the content of this book, especially Rūta Levulienė for writing the computer programs needed for the application of the tests and the solutions of all the exercises.

Vilijandas BAGDONAVIČIUS

Julius KRUOPIS

Mikhail NIKULIN

Terms and Notation

A > B (A ≥ B) – the matrix A − B is positive (non-negative) definite;

a ∨ b (a ∧ b) – the maximum (the minimum) of the numbers a and b;

ARE – the asymptotic relative efficiency;

B(n, p) – binomial distribution with parameters n and p;

B−(n, p) – negative binomial distribution with parameters n and p;

Be(γ, η) – beta distribution with parameters γ and η;

cdf – the cumulative distribution function;

CLT – the central limit theorem;

Cov(X, Y) – the covariance of random variables X and Y;

Cov(X, Y) – the covariance matrix of random vectors X and Y;

EX – the mean of a random variable X;

E(X) – the mean of a random vector X;

Eθ(X), E(X|θ), Varθ(X), Var(X|θ) – the mean or the variance of a random variable X depending on the parameter θ;

ε(λ) – exponential distribution with parameter λ;

F(m, n) – Fisher distribution with m and n degrees of freedom;

F(m, n; δ) – non-central Fisher distribution with m and n degrees of freedom and non-centrality parameter δ;

Fα(m, n) – α critical value of Fisher distribution with m and n degrees of freedom;

FT(x) (fT(x)) – the cdf (the pdf) of the random variable T;

f(x; θ), f(x|θ) – the pdf depending on a parameter θ;

F(x; θ), F(x|θ) – the cdf depending on a parameter θ;

G(λ, η) – gamma distribution with parameters λ and η;

iid – independent identically distributed;

LN(μ, σ) – lognormal distribution with parameters μ and σ;

LS – least-squares (method, estimator);

ML – maximum likelihood (function, method, estimator);

N(0, 1) – standard normal distribution;

N(μ, σ²) – normal distribution with parameters μ and σ²;

Nk(μ, Σ) – k-dimensional normal distribution with the mean vector μ and the covariance matrix Σ;

𝒫(λ) – Poisson distribution with parameter λ;

pdf – the probability density function;

P{A} – the probability of an event A;

P{A|B} – the conditional probability of event A given event B;

Pθ{A}, P{A|θ} – the probability depending on a parameter θ;

rv – random variable;

S(n) – Student’s distribution with n degrees of freedom;

S(n; δ) – non-central Student’s distribution with n degrees of freedom and non-centrality parameter δ;

tα(n) – α critical value of Student’s distribution with n degrees of freedom;

U(α, β) – uniform distribution in the interval (α, β);

UMP – uniformly most powerful (test);

UUMP – unbiased uniformly most powerful (test);

VarX – the variance of a random variable X;

Var(X) – the covariance matrix of a random vector X;

W(θ, ν) – Weibull distribution with parameters θ and ν;

X, Y, Z,… – random variables;

X, Y, Z,… – random vectors;

Xᵀ – the transposed vector X, i.e. a row vector;

X ~ N(μ, σ²) – random variable X normally distributed with parameters μ and σ² (analogously in the case of other distributions);

Xn →P X – convergence in probability (n → ∞);

Xn →a.s. X – almost sure convergence, or convergence with probability 1 (n → ∞);

Xn →d X – weak convergence, or convergence in distribution (n → ∞);

Xn ~ AN(μ, σ²) – random variables Xn asymptotically (n → ∞) normally distributed with parameters μ and σ²;

Xn ~ Yn – random variables Xn and Yn asymptotically (n → ∞) equivalent;

x(P) – P-th quantile;

xP – P-th critical value;

zα – α critical value of the standard normal distribution;

Σ – covariance matrix;

χ²(n) – chi-squared distribution with n degrees of freedom;

χ²(n; δ) – non-central chi-squared distribution with n degrees of freedom and non-centrality parameter δ;

χ²α(n) – α critical value of the chi-squared distribution with n degrees of freedom.

Chapter 1

Introduction

1.1. Statistical hypotheses

Suppose the data form a random vector X = (X1, …, Xn) whose elements X1, …, Xn are independent identically distributed random variables; such a vector is called a simple sample. In more complicated experiments the elements Xi are dependent, or not identically distributed, or are themselves random vectors. The random vector X is then called a sample, not a simple sample.

Suppose that the cumulative distribution function (cdf) F of a sample X (or of any element Xi of a simple sample) belongs to a set ℱ of cumulative distribution functions. For example, if the sample is simple then ℱ may be the set of absolutely continuous, discrete, symmetric, normal or Poisson cumulative distribution functions. The set ℱ defines a statistical model.

Suppose that ℱ0 is a subset of ℱ.

The statistical hypothesis H0 is the following assertion: the cumulative distribution function F belongs to the set ℱ0. We write H0 : F ∈ ℱ0.

The hypothesis H1 : F ∈ ℱ1, where ℱ1 = ℱ \ ℱ0 is the complement of ℱ0 with respect to ℱ, is called the alternative to the hypothesis H0.

If ℱ0 is defined by a finite-dimensional parameter θ then the model is parametric. In this case the statistical hypothesis is a statement on the values of the finite-dimensional parameter θ.

In this book non-parametric models are considered. A statistical model is called non-parametric if ℱ is not defined by a finite-dimensional parameter.

If the set ℱ0 contains only one element of the set ℱ then the hypothesis is simple; otherwise the hypothesis is composite. For example, the hypothesis that F is the standard normal cdf is simple, whereas the hypothesis that F is normal with unknown mean and variance is composite.

1.2. Examples of hypotheses in non-parametric models

Let us look briefly and informally at examples of the hypotheses which will be considered in this book. We do not formulate concrete alternatives here; we only suppose that the models are non-parametric. Concrete alternatives will be formulated in the chapters on the specific hypotheses.
