The Dirichlet distribution appears in many areas of application, which include modelling of compositional data, Bayesian analysis, statistical genetics, and nonparametric inference. This book provides a comprehensive review of the Dirichlet distribution and two extended versions, the Grouped Dirichlet Distribution (GDD) and the Nested Dirichlet Distribution (NDD), arising from likelihood and Bayesian analysis of incomplete categorical data and survey data with non-response.
The theoretical properties and applications are also reviewed in detail for other related distributions, such as the inverted Dirichlet distribution, Dirichlet-multinomial distribution, the truncated Dirichlet distribution, the generalized Dirichlet distribution, Hyper-Dirichlet distribution, scaled Dirichlet distribution, mixed Dirichlet distribution, Liouville distribution, and the generalized Liouville distribution.
Key Features:
Practitioners and researchers working in areas such as medical science, biological science and social science will benefit from this book.
Page count: 294
Year of publication: 2011
Contents
Cover
Wiley Series in Probability and Statistics
Title Page
Copyright
Dedication
Preface
Acknowledgments
List of abbreviations
List of symbols
List of figures
List of tables
Chapter 1: Introduction
1.1 Motivating Examples
1.2 Stochastic Representation and the Operator
1.3 Beta and Inverted Beta Distributions
1.4 Some Useful Identities and Integral Formulae
1.5 The Newton–Raphson Algorithm
1.6 Likelihood in Missing-Data Problems
1.7 Bayesian MDPs and Inversion of Bayes' Formula
1.8 Basic statistical distributions
Chapter 2: Dirichlet distribution
2.1 Definition and Basic Properties
2.2 Marginal and Conditional Distributions
2.3 Survival Function and Cumulative Distribution Function
2.4 Characteristic Functions
2.5 Distribution for Linear Function of a Dirichlet Random Vector
2.6 Characterizations
2.7 MLEs of the Dirichlet Parameters
2.8 Generalized Method of Moments Estimation
2.9 Estimation Based on Linear Models
2.10 Application in Estimating ROC Area
Chapter 3: Grouped Dirichlet distribution
3.1 Three Motivating Examples
3.2 Density Function
3.3 Basic Properties
3.4 Marginal Distributions
3.5 Conditional Distributions
3.6 Extension to Multiple Partitions
3.7 Statistical Inferences: Likelihood Function with GDD Form
3.8 Statistical Inferences: Likelihood Function Beyond GDD Form
3.9 Applications Under Nonignorable Missing Data Mechanism
Chapter 4: Nested Dirichlet distribution
4.1 Density function
4.2 Two motivating examples
4.3 Stochastic representation, mixed moments, and mode
4.4 Marginal distributions
4.5 Conditional distributions
4.6 Connection with exact null distribution for sphericity test
4.7 Large-Sample Likelihood Inference
4.8 Small-Sample Bayesian Inference
4.9 Applications
4.10 A Brief Historical Review
Chapter 5: Inverted Dirichlet distribution
5.1 Definition Through the Density Function
5.2 Definition Through Stochastic Representation
5.3 Marginal and Conditional Distributions
5.4 Cumulative Distribution Function and Survival Function
5.5 Characteristic Function
5.6 Distribution for Linear Function of Inverted Dirichlet Vector
5.7 Connection with Other Multivariate Distributions
5.8 Applications
Chapter 6: Dirichlet–multinomial distribution
6.1 Probability Mass Function
6.2 Moments of the Distribution
6.3 Marginal and Conditional Distributions
6.4 Conditional Sampling Method
6.5 The Method of Moments Estimation
6.6 The Method of Maximum Likelihood Estimation
6.7 Applications
6.8 Testing the multinomial assumption against the Dirichlet–multinomial alternative
Chapter 7: Truncated Dirichlet distribution
7.1 Density Function
7.2 Motivating Examples
7.3 Conditional Sampling Method
7.4 Gibbs Sampling Method
7.5 The Constrained Maximum Likelihood Estimates
7.6 Application to Misclassification
7.7 Application to Uniform Design of Experiment with Mixtures
Chapter 8: Other related distributions
8.1 The generalized Dirichlet distribution
8.2 The hyper-Dirichlet distribution
8.3 The scaled Dirichlet distribution
8.4 The mixed Dirichlet distribution
8.5 The Liouville distribution
8.6 The Generalized Liouville Distribution
Appendix A: Some useful S-plus Codes
A.1 Multinomial Distribution
A.2 Dirichlet Distribution
A.3 Grouped Dirichlet Distribution
A.4 Nested Dirichlet Distribution
A.5 Dirichlet–Multinomial Distribution
References
Author Index
Subject Index
Wiley Series in Probability and Statistics
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors
David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith, Ruey S. Tsay, Sanford Weisberg
Editors Emeriti
Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G. Kendall, Jozef L. Teugels
A complete list of the titles in this series can be found on http://www.wiley.com/WileyCDA/Section/id-300611.html.
This edition first published 2011
© 2011 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Ng, Kai Wang, author.
Dirichlet and Related Distributions : Theory, Methods and Applications / Kai Wang Ng, Guo-Liang Tian, Man-Lai Tang.
p. cm. – (Wiley Series in Probability and Statistics)
Includes bibliographical references and index.
ISBN 978-0-470-68819-9 (hardback)
1. Distribution (Probability theory) 2. Dirichlet problem. I. Tian, Guo-Liang, author. II. Tang, Man-Lai, author. III. Title.
QA276.7.N53 2011
519.2'4–dc22
2010053428
A catalogue record for this book is available from the British Library.
Print ISBN: 978-0-470-68819-9
ePDF ISBN: 978-1-119-99586-9
oBook ISBN: 978-1-119-99578-4
ePub ISBN: 978-1-119-99841-9
Mobi ISBN: 978-1-119-99842-6
To May, Jeanne, Jason and his family
To Yanli, Margaret and Adam
To Daisy and Beatrice
Preface
The Dirichlet distribution is one of the primary multivariate distributions, and its support is the simplex of a multidimensional space. It appears in many applications, including modeling of compositional data, Bayesian analysis, statistical genetics, nonparametric inference, distribution-free tolerance intervals, multivariate analysis, order statistics, reliability, probability inequalities, probabilistic constrained programming models, limit laws, delivery problems, stochastic processes, and other areas. Development in these areas requires extensions of the Dirichlet distribution in different directions to suit different purposes. In particular, the grouped Dirichlet distribution (GDD) and the nested Dirichlet distribution (NDD) have been introduced as new tools for statistical analysis of incomplete categorical data. Articles regarding these generalizations, their properties, and applications are presently scattered in the literature. The book is an exposition (i) to systematically review these results and their underlying relationships, (ii) to delineate methods for generating random vectors following these new distributions, and (iii) to show some of their important applications in practice, such as incomplete categorical data analyses.
Chapter 1 introduces the topic with motivating examples and summarizes the concepts and necessary tools for the book. As missing data is an important impetus for generalizing the Dirichlet distribution, we discuss at length the handling of the likelihood function in such problems and the Bayesian approach. It is pointed out that the aim of the Bayesian missing-data problems amounts to the inversion of Bayes' formula (IBF), the density form of the converse of Bayes' theorem. Chapter 2 provides a comprehensive review of the Dirichlet distribution, including its basic properties, marginal and conditional distributions, survival function and cumulative distribution function, characteristic function, distribution for a linear function of a Dirichlet random vector, characterizations, maximum likelihood estimates of the Dirichlet parameters, moments estimation, generalized moments estimation, estimation based on linear models, and application in estimating receiver operating characteristic area. The most recent material includes: the survival function (Section 2.3.1); characteristic functions for two uniform distributions over the hyperplane (Section 2.4.1) and simplex (Section 2.4.2); the distribution for the linear function of a Dirichlet random vector (Section 2.5); estimation via the expectation–maximization gradient algorithm (Sections 2.7.2 and 2.7.3); and application (Section 2.10). Chapters 3 and 4 review two new families of distributions (GDD and NDD), which are two generalizations of the traditional Dirichlet distribution with extra parameters. More importantly, we emphasize their applications in incomplete categorical data and survey data with nonresponse. Chapter 5 collects theoretical results on the inverted Dirichlet distribution and its applications previously scattered throughout the literature. Chapters 6 and 7 gather the results about the Dirichlet-multinomial distribution and the truncated Dirichlet distribution that are available only in the literature.
Chapter 8 reviews the generalized Dirichlet distribution, hyper-Dirichlet distribution, scaled Dirichlet distribution, mixed Dirichlet distribution, Liouville distribution, and the generalized Liouville distribution.
The book is intended as a graduate-level textbook for theoretical statistics, applied statistics, and biostatistics. It can also be used as a reference and handbook for researchers and teachers in institutes and universities. The book is also useful, in part at least, to undergraduates in statistics and to practitioners and investigators in survey companies. Knowledge of standard probability and statistics, and of some topics taught in multivariate analysis, is necessary for understanding this book. Some results in the book are new and have not been published previously.
This book includes an accompanying website. Please visit www.wiley.com/go/dirichlet for more information.
Kai Wang Ng and Guo-Liang Tian
Department of Statistics and Actuarial Science
The University of Hong Kong
Pokfulam Road, Hong Kong, P. R. China

Man-Lai Tang
Department of Mathematics
Hong Kong Baptist University
Kowloon Tong, Hong Kong, P. R. China
Acknowledgments
From Kai Wang Ng: I have very much enjoyed working with Guo-Liang and Man-Lai, particularly Guo-Liang, with whom this is my second co-authored book. I could not have fulfilled my commitment to finish the project without the patience and support of my family. The staff in the Department of Statistics and Actuarial Science have been very helpful in providing technical support.
From Guo-Liang Tian: I am very grateful for my family's understanding that I work so far away from them, only seeing them one or two months each year. I hope the fruits of this second co-authored book of mine are worthy of their understanding. I am grateful to Professor Ming T. Tan and Dr Hong-Bin Fang of the University of Maryland Greenebaum Cancer Center, who have invited me for several visits. I enjoy working with my co-authors. I appreciate very much my long collaboration with Kai since 1998, especially on missing-data problems and the related applications of inversion of Bayes' formula. Working with Kai always gives me the opportunity to learn a lot.
From Man-Lai Tang: Albeit too technical for my wife and my baby daughter, this book is a very special gift to the three of us. I would also like to thank my parents for their blessing to follow my ambitions freely throughout my childhood, and my brother and sister for taking care of my parents when I pursued my PhD degree at UCLA and my career in the USA. It is my honor to co-author this interesting book with Kai and Guo-Liang, who opened the door to me on the subject.
List of Abbreviations
a.s.: almost surely
AUC: area under ROC curve
cdf: cumulative distribution function
cf: characteristic function
cf.: confer
CI: confidence/credible interval
CPU: central processing unit
DA: data augmentation
DOI: degree of infiltration
EM: expectation–maximization
ECM: expectation/conditional maximization
E-step: expectation step
FPF: false positive fraction
GDD: grouped Dirichlet distribution
GMM: generalized method of moments
HPD: highest posterior density
IBF: inversion of Bayes' formula
iff: if and only if
i.i.d.: independently and identically distributed
I-step: imputation step
LOR: local odds ratio
MAR: missing at random
MCAR: missing completely at random
MCMC: Markov chain Monte Carlo
MDP: missing data problems
mgf: moment-generating function
MLE: maximum likelihood estimate
M-step: maximization step
NDD: nested Dirichlet distribution
pdf: probability density function
pmf: probability mass function
PIEM: partial imputation expectation-maximization
P-step: posterior step
ROC: receiver operating characteristic
SE: standard error
std: standard deviation
TPF: true positive fraction
List of Symbols
Mathematics
Probability
Statistics
List of tables
1.1 Relative frequencies of serum proteins in white Pekin ducklings.
1.2 Mammogram data using a five-category scale.
1.3 Cervical cancer data.
1.4 Leprosy survey data.
1.5 Neurological complication data.
1.6 Victimization results from the national crime survey.
1.7 Observed cell counts of the failure times for radio transmitter receivers.
1.8 Ultrasound rating data for detection of breast cancer metastasis.
1.9 Observed forest pollen data from the Bellas Artes core.
1.10 Observed teratogenesis data from exposure to hydroxyurea.
1.11 The joint and marginal distributions.
3.1 Frequentist and Bayesian estimates of parameters for the leprosy survey data.
3.2 Observed counts and cell probabilities for incomplete r × c table.
3.3 Maternal smoking cross-classified by child's wheezing status.
3.4 MLEs and CIs of parameters for child's wheeze data.
3.5 Parameter structure under nonignorable missing mechanism.
3.6 Posterior means and standard deviations for the crime survey data under nonignorable missing mechanism.
4.1 r × 2 contingency table with missing data.
4.2 MLEs, SEs and Bayesian estimates of parameters for the simulated data.
4.3 The values of .
4.4 MLEs and Bayesian estimates of gi(tj) for the failure data of radio transmitter receivers.
4.5 Bayesian estimates of AUC for the ultrasound rating data.
7.1 Bayesian estimates of p and π for various prior parameters.
7.2 Frequentist estimates of p for four different cases.
7.3 Bayesian estimates of p for two different cases.
8.1 Posterior means and standard deviations for the crime survey data under the ignorable missing mechanism.
8.2 Results of 88 chess matches among three players.
8.3 First five results of a sports league comprising nine players.
8.4 Results from doubles and singles tennis matches.
Chapter 1
Introduction
As the multivariate version of the beta distribution, the Dirichlet distribution is one of the key multivariate distributions for random vectors confined to the simplex. In the early studies of compositional data, or measurements of proportions, it is the most natural distribution for modeling. In Bayesian statistics, it is the conjugate prior distribution for counts following a multinomial distribution. The Dirichlet distribution also has a long list of applications in other areas: statistical genetics, nonparametric inference, distribution-free tolerance intervals, order statistics, reliability theory, probability inequalities, probabilistic constrained programming models, limit laws, delivery problems, stochastic processes, and so on.
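The conjugacy property mentioned above can be made concrete with a small sketch. It is not taken from the book: the prior parameters and cell counts are hypothetical, and the helper names (`sample_dirichlet`, `posterior_parameters`, `dirichlet_mean`) are introduced only for illustration. It relies on two standard facts: a Dirichlet prior with parameters a combined with multinomial counts n gives a Dirichlet posterior with parameters a + n, and a Dirichlet vector can be generated by normalizing independent gamma variates.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def posterior_parameters(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus multinomial counts gives Dirichlet(alpha + counts)."""
    return [a + n for a, n in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Mean vector of a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


rng = random.Random(2011)
prior_alpha = [1.0, 1.0, 1.0]  # hypothetical uniform prior on the simplex
counts = [12, 5, 3]            # hypothetical multinomial cell counts

post_alpha = posterior_parameters(prior_alpha, counts)
post_mean = dirichlet_mean(post_alpha)
draw = sample_dirichlet(post_alpha, rng)  # one posterior draw of the cell probabilities

print("posterior parameters:", post_alpha)
print("posterior mean:", post_mean)
print("one posterior draw:", draw)
```

The posterior draw always lies on the simplex: its components are positive and sum to one up to floating-point rounding.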
The Dirichlet family, however, is not rich enough to represent many important applications. Extensions of the Dirichlet distribution in different directions are necessary for different purposes. Previous generalizations and their properties and applications are scattered in the literature. For instance, motivated by the likelihood functions of several incomplete categorical data sets, the authors of this book and their collaborators have recently developed flexible parametric classes of distributions beyond the Dirichlet distribution. Amongst them, the grouped Dirichlet distribution (GDD) and the nested Dirichlet distribution (NDD) are perhaps the most important. In fact, the GDD and NDD can be used as new tools for statistical analysis of incomplete categorical data. In addition, the inverted Dirichlet, Dirichlet-multinomial, truncated Dirichlet, generalized Dirichlet, hyper-Dirichlet, scaled Dirichlet, mixed Dirichlet, Liouville, and generalized Liouville distributions are closely related to the Dirichlet distribution. In this book, we shall systematically review these distributions and their underlying properties, including methods for generating random vectors from these distributions. We also present some of their important applications in practice, such as incomplete categorical data analyses.
1.1 Motivating Examples
The following real data sets help to motivate the Dirichlet distribution and related distributions which will be studied in this book.
Example 1.1 (Serum-Protein Data of White Pekin Ducklings). Mosimann (1962) reported the blood serum proportions (pre-albumin, albumin, and globulin) in 3-week-old white Pekin ducklings; see Table 1.1. In each of the 23 sets of data, the three proportions are based on 7 to 12 white Pekin ducklings, while the individual measurements are not available. Ducklings in each set had the same diet, but the diet was different from set to set.
Table 1.1 Relative frequencies of serum proteins in white Pekin ducklings.
In Chapter 2 we will use the data to illustrate the Newton–Raphson algorithm and the expectation–maximization (EM) gradient algorithm for calculating the maximum likelihood estimates (MLEs) of Dirichlet parameters and the associated standard errors (SEs).
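As a rough illustration of what a Newton–Raphson fit of Dirichlet parameters involves, the sketch below maximizes the Dirichlet log-likelihood on synthetic data. It is not the book's implementation, and the values of Table 1.1 are not reproduced here, so the data are simulated from hypothetical parameters (2, 3, 5). The function names, the moment-based starting value, the step-halving guard, and the series approximations to the digamma and trigamma functions are all assumptions made to keep the example self-contained. The Newton step exploits the fact that the Hessian is a diagonal matrix plus a constant matrix, so it can be inverted in closed form.

```python
import math
import random


def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0, using the same recurrence-plus-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def fit_dirichlet_newton(data, tol=1e-8, max_iter=100):
    """Newton-Raphson for the Dirichlet MLE; data is a list of rows on the simplex."""
    n, k = len(data), len(data[0])
    # Sufficient statistics: the mean of log x_i for each component.
    log_bar = [sum(math.log(row[i]) for row in data) / n for i in range(k)]
    # Moment-based starting value (an assumption, not the book's choice).
    m1 = [sum(row[i] for row in data) / n for i in range(k)]
    m2 = sum(row[0] ** 2 for row in data) / n
    a0 = (m1[0] - m2) / (m2 - m1[0] ** 2)
    alpha = [m * a0 for m in m1]
    for _ in range(max_iter):
        a_sum = sum(alpha)
        # Per-observation gradient of the log-likelihood.
        grad = [digamma(a_sum) - digamma(a) + lb for a, lb in zip(alpha, log_bar)]
        # Hessian = diag(q) + z * (matrix of ones); invert via Sherman-Morrison.
        q = [-trigamma(a) for a in alpha]
        z = trigamma(a_sum)
        b = sum(g / qi for g, qi in zip(grad, q)) / (1.0 / z + sum(1.0 / qi for qi in q))
        step = [(g - b) / qi for g, qi in zip(grad, q)]
        # Halve the step if it would leave the positive orthant.
        scale = 1.0
        while any(a - scale * s <= 0.0 for a, s in zip(alpha, step)):
            scale *= 0.5
        alpha = [a - scale * s for a, s in zip(alpha, step)]
        if max(abs(scale * s) for s in step) < tol:
            break
    return alpha


# Synthetic data from hypothetical parameters, generated by normalizing gamma variates.
rng = random.Random(0)
true_alpha = [2.0, 3.0, 5.0]
data = []
for _ in range(2000):
    g = [rng.gammavariate(a, 1.0) for a in true_alpha]
    s = sum(g)
    data.append([v / s for v in g])

alpha_hat = fit_dirichlet_newton(data)
print("estimated parameters:", alpha_hat)
```

With a few thousand simulated observations the estimates should land near the generating values; standard errors could be obtained from the inverse of the observed information, which this sketch does not compute.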
Example 1.2 (Mammogram Data for Patients With/Without Breast Cancer). Mammography is used for the early detection of breast cancer, typically by detecting the characteristic masses and/or microcalcifications by means of low-dose amplitude X-rays in examining women's breasts. As a diagnostic and screening tool, it reports its result on a five-point rating scale: normal, benign, probably benign, suspicious, and malignant. Table 1.2 summarizes the mammographer's results (Zhou et al., 2002: 20–21) for 60 patients presenting for breast cancer screening. The sample consists of 30 patients with pathology-proven cancer and 30 patients with normal mammograms for two consecutive years.
