This book uses the EM (expectation maximization) algorithm to simultaneously estimate the missing data and unknown parameter(s) associated with a data set. The parameters describe the component distributions of the mixture; the distributions may be continuous or discrete.
The editors provide a complete account of the applications, mathematical structure and statistical analysis of finite mixture distributions, together with MCMC computational methods and a range of detailed discussions of how the methods are applied, in chapters written by leading experts on the subject. The applications are drawn from scientific disciplines including biostatistics, computer science, ecology and finance. This area of statistics is important to a range of disciplines, and its methodology attracts interest from researchers in the fields in which it can be applied.
Page count: 522
Year of publication: 2011
Contents
Cover
Wiley Series in Probability and Statistics
Title Page
Copyright
Preface
Acknowledgements
List of contributors
Chapter 1: The EM algorithm, variational approximations and expectation propagation for mixtures
1.1 Preamble
1.2 The EM algorithm
1.3 Variational approximations
1.4 Expectation propagation
Acknowledgements
References
Chapter 2: Online expectation maximisation
2.1 Introduction
2.2 Model and Assumptions
2.3 The EM Algorithm and the Limiting EM Recursion
2.4 Online expectation maximisation
2.5 Discussion
References
Chapter 3: The limiting distribution of the EM test of the order of a finite mixture
3.1 Introduction
3.2 The method and theory of the EM test
3.3 Proofs
3.4 Discussion
References
Chapter 4: Comparing Wald and likelihood regions applied to locally identifiable mixture models
4.1 Introduction
4.2 Background on Likelihood Confidence Regions
4.3 Background on Simulation and Visualisation of the Likelihood Regions
4.4 Comparison Between the Likelihood Regions and the Wald Regions
4.5 Application to a Finite Mixture Model
4.6 Data Analysis
4.7 Discussion
References
Chapter 5: Mixture of experts modelling with social science applications
5.1 Introduction
5.2 Motivating Examples
5.3 Mixture Models
5.4 Mixture of Experts Models
5.5 A Mixture of Experts Model for Ranked Preference Data
5.6 A Mixture of Experts Latent Position Cluster Model
5.7 Discussion
Acknowledgements
References
Chapter 6: Modelling conditional densities using finite smooth mixtures
6.1 Introduction
6.2 The model and prior
6.3 Inference methodology
6.4 Applications
6.5 Conclusions
Acknowledgements
Appendix: Implementation details for the gamma and log-normal models
References
Chapter 7: Nonparametric mixed membership modelling using the IBP compound Dirichlet process
7.1 Introduction
7.2 Mixed Membership Models
7.3 Motivation
7.4 Decorrelating Prevalence and Proportion
7.5 Related Models
7.6 Empirical Studies
7.7 Discussion
References
Chapter 8: Discovering nonbinary hierarchical structures with Bayesian rose trees
8.1 Introduction
8.2 Prior work
8.3 Rose trees, partitions and mixtures
8.4 Avoiding needless cascades
8.5 Greedy construction of Bayesian rose tree mixtures
8.6 Bayesian hierarchical clustering, Dirichlet process models and product partition models
8.7 Results
8.8 Discussion
References
Chapter 9: Mixtures of factor analysers for the analysis of high-dimensional data
9.1 Introduction
9.2 Single-factor analysis model
9.3 Mixtures of factor analysers
9.4 Mixtures of common factor analysers (MCFA)
9.5 Some related approaches
9.6 Fitting of factor-analytic models
9.7 Choice of the number of factors q
9.8 Example
9.9 Low-dimensional plots via MCFA approach
9.10 Multivariate t-factor analysers
9.11 Discussion
9.12 Appendix
References
Chapter 10: Dealing with label switching under model uncertainty
10.1 Introduction
10.2 Labelling through clustering in the point-process representation
10.3 Identifying mixtures when the number of components is unknown
10.4 Overfitting heterogeneity of component-specific parameters
10.5 Concluding remarks
References
Chapter 11: Exact Bayesian analysis of mixtures
11.1 Introduction
11.2 Formal Derivation of the Posterior Distribution
References
Chapter 12: Manifold MCMC for mixtures
12.1 Introduction
12.2 Markov chain Monte Carlo Methods
12.3 Finite Gaussian mixture models
12.4 Experiments
12.5 Discussion
Acknowledgements
Appendix
References
Chapter 13: How many components in a finite mixture?
13.1 Introduction
13.2 The galaxy data
13.3 The normal mixture model
13.4 Bayesian analyses
13.5 Posterior distributions for K (for flat prior)
13.6 Conclusions from the Bayesian analyses
13.7 Posterior distributions of the model deviances
13.8 Asymptotic distributions
13.9 Posterior deviances for the galaxy data
13.10 Conclusions
References
Chapter 14: Bayesian mixture models: a blood-free dissection of a sheep
14.1 Introduction
14.2 Mixture models
14.3 Altering dimensions of the mixture model
14.4 Bayesian mixture model incorporating spatial information
14.5 Volume calculation
14.6 Discussion
References
Index
Wiley Series in Probability and Statistics
Established by Walter A. Shewhart and Samuel S. Wilks
Editors
David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith, Ruey S. Tsay, Sanford Weisberg
Editors Emeriti
Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G. Kendall, Jozef L. Teugels
A complete list of the titles in this series can be found on http://www.wiley.com/WileyCDA/Section/id-300611.html.
This edition first published 2011
© 2011 John Wiley & Sons, Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Mengersen, Kerrie L.
Mixtures: estimation and applications / Kerrie L. Mengersen, Christian P. Robert, D. Michael Titterington.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-119-99389-6 (cloth)
1. Mixture distributions (Probability theory) I. Robert, Christian P., 1961- II. Titterington, D. M.
III. Title.
QA273.6.M46 2011
519.2'4–dc22
2010053469
A catalogue record for this book is available from the British Library.
Print ISBN: 978-1-119-99389-6
ePDF ISBN: 978-1-119-99568-5
oBook ISBN: 978-1-119-99567-8
ePub ISBN: 978-1-119-99844-0
Mobi ISBN: 978-1-119-99845-7
Preface
This edited volume was stimulated by a workshop entitled ‘Mixture Estimation and Applications’ held at the International Centre for Mathematical Sciences (ICMS) in Edinburgh on 3–5 March 2010. With the exception of the chapters written by the editors, all chapters were presented during this workshop.
Statistical mixture distributions are used to model scenarios in which certain variables are measured but a categorical variable is missing. For example, although clinical data on a patient may be available, their disease category may not be, and this adds significant degrees of complication to the statistical analysis. The above situation characterises the simplest mixture-type scenario; variations include, among others, hidden Markov models, in which the missing variable follows a Markov chain model, and latent structure models, in which the missing variable or variables represent model-enriching devices rather than real physical entities. Finally, mixture models can simply be employed as a more flexible parametric or nonparametric description of data, processes and systems. In the title of the workshop the term ‘mixture’ was taken to include these and other variations along with the simple mixture. The motivating factors for this three-day workshop were that research on inference and computational techniques for mixture-type models is currently experiencing major advances and that, simultaneously, the application of mixture modelling to many fields in science and elsewhere has never been so rich. The workshop thus assembled leading statisticians and computer scientists, in both methodological research and applied inference, at this fertile interface. The methodological component encompassed both Bayesian and non-Bayesian contributions, with biology and economics featuring strongly among the application areas addressed.
In addition to the lectures per se, there were two special lectures, given by Michael Jordan and Kerrie Mengersen. Michael Jordan gave a wide-reaching lecture on ‘Applied Bayesian nonparametrics’ as part of the Edinburgh Informatics Distinguished Lecture series. Kerrie Mengersen presented an evening public lecture at the ICMS on ‘Where are they and what do they look like? Discovering patterns in data using statistical mixture models’. Both lectures attracted large audiences and were very well received, but they do not appear in this volume.
The workshop itself was attended by 70 participants, all of whom contributed to the high quality of both the presentations and the corresponding exchanges. The meeting started with a session dedicated to label switching, with John Geweke and Sylvia Frühwirth-Schnatter (Chapter 10) presenting their views on this difficult issue and Gilles Celeux, Agostino Nobile and Christian Robert discussing the presentations. Other Bayesian talks included Murray Aitkin's (Chapter 13) views on the estimation of the number of components, Clair Alston's (Chapter 14) application to sheep dissection, Kim-Anh Do's application to translational cancer research, Richard Gerlach's smooth transition mixture GARCH models, Chris Holmes's investigations in variable selection, Robert Kohn's (Chapter 6) modelling of conditional densities using finite smooth mixtures, Peter Müller's semiparametric mixture models with covariate-dependent weights and Michael Newton's gamma-based clustering with application to gene-expression analysis. Work on the asymptotics of mixtures was represented by Jiahua Chen's (Chapter 3) lecture on testing the order of finite mixture models by the EM test and Bruce Lindsay's (Chapter 4) talk on mixture analysis in many dimensions, a topic related to Geoff McLachlan's (Chapter 9) modelling of high-dimensional data. Similarly, there was a wide range of talks at the interface between nonparametric Bayes and machine learning, introduced by Michael Jordan's overview and followed by Mark Girolami's (Chapter 12) talk on spectral mixture component inference, Katherine Heller's (Chapter 7) IBP compound Dirichlet process, Iain Murray's sampling in latent variable models, Yee Whye Teh's (Chapter 8) presentation on hierarchical clustering and Chris Williams's greedy learning of binary latent trees. Brendan Murphy (Chapter 5) covered the mixture-of-experts model from a clustering perspective. Talks with a more computational emphasis were also presented during the workshop, with Christophe Andrieu's approximations of MCMC algorithms, Olivier Cappé's (Chapter 2) work on online EM and Paul Fearnhead's lecture on sequential Monte Carlo.
We believe that this collection of chapters represents the state of the art in mixture modelling, inference and computation. It is our hope that the compilation of our current understanding of this important field will be useful and profitable to active researchers and practitioners in the field as well as to newcomers.
Kerrie L. Mengersen
Christian P. Robert
D. Michael Titterington
Brisbane, Sceaux and Glasgow, 18 November 2010
Acknowledgements
This book consists of chapters contributed by invited speakers at the meeting ‘Mixture Estimation and Applications’ organised by the International Centre for Mathematical Sciences (ICMS) in Edinburgh on 3–5 March 2010. The editors are very grateful for the exemplary organisational efficiency of the ICMS staff and for funding provided, through the ICMS, by the UK Engineering and Physical Sciences Research Council, the London Mathematical Society and the Glasgow Mathematical Journal Trust, as well as by the Royal Statistical Society, the Australian Research Council and l'Agence Nationale de la Recherche.
The editors are also most grateful to the contributors for their chapters and for their help during the preparation of the book. The support of John Wiley through the encouragement of Kathryn Sharples was most welcome. Part of the editing of the book was done during a visit by the second editor to the Wharton School of Business, University of Pennsylvania, whose support and welcome he gratefully acknowledges.
List of contributors
Murray Aitkin
Department of Mathematics and Statistics
University of Melbourne, Australia
Clair L. Alston
School of Mathematical Sciences
Queensland University of Technology, Australia
Jangsun Baek
Department of Statistics
Chonnam National University, Gwangju, Korea
David M. Blei
Computer Science Department
Princeton University, New Jersey, USA
Charles Blundell
Gatsby Computational Neuroscience Unit
University College London, UK
Olivier Cappé
LTCI, Telecom ParisTech, Paris, France
Jiahua Chen
Department of Statistics
University of British Columbia, Vancouver, Canada
Sylvia Frühwirth-Schnatter
Department of Applied Statistics and Econometrics
Johannes Kepler Universität Linz, Austria
Graham E. Gardner
School of Veterinary and Biomedical Sciences
Murdoch University, Australia
Mark Girolami
Department of Statistical Science
University College London, UK
Isobel Claire Gormley
School of Mathematical Sciences
University College Dublin, Ireland
Katherine A. Heller
Department of Engineering
University of Cambridge, UK
Robert Kohn
Australian School of Business
University of New South Wales, Sydney, Australia
Daeyoung Kim
Department of Mathematics and Statistics
University of Massachusetts, Amherst, Massachusetts, USA
Feng Li
Department of Statistics
Stockholm University, Sweden
Pengfei Li
Department of Mathematical and Statistical Sciences
University of Alberta, Edmonton, Canada
Bruce G. Lindsay
Department of Statistics
Pennsylvania State University, Pennsylvania, USA
Geoffrey J. McLachlan
Department of Mathematics and Institute for Molecular Bioscience, University of Queensland, St Lucia, Australia
Kerrie L. Mengersen
School of Mathematical Sciences
Queensland University of Technology, Australia
Thomas Brendan Murphy
School of Mathematical Sciences
University College Dublin, Ireland
Suren I. Rathnayake
Department of Mathematics and Institute for Molecular Bioscience, University of Queensland, St Lucia, Australia
Christian P. Robert
Université Paris-Dauphine, CEREMADE, Paris, France
Vassilios Stathopoulos
Department of Statistical Science
University College London, UK
Yee Whye Teh
Gatsby Computational Neuroscience Unit
University College London, UK
D. Michael Titterington
University of Glasgow
Glasgow, UK
Mattias Villani
Sveriges Riksbank
Stockholm, Sweden
Chong Wang
Computer Science Department
Princeton University, New Jersey, USA
Sinead Williamson
Department of Engineering
University of Cambridge, UK
Chapter 1
The EM algorithm, variational approximations and expectation propagation for mixtures
D. Michael Titterington
1.1 Preamble
The material in this chapter is largely tutorial in nature. The main goal is to review two types of deterministic approximation, variational approximations and the expectation propagation approach, which have been developed mainly in the computer science literature, but with some statistical antecedents, to assist approximate Bayesian inference. However, we believe that it is helpful to preface discussion of these methods with an elementary reminder of the EM algorithm as a way of computing posterior modes. All three approaches have now been applied to many model types, but we shall just mention them in the context of mixtures, and only a very small number of types of mixture at that.
1.2 The EM algorithm
1.2.1 Introduction to the algorithm
Parameter estimation in mixture models often goes hand-in-hand with a discussion of the EM algorithm. This is especially so if the objective is maximum likelihood estimation, but the algorithm is also relevant in the Bayesian approach if maximum a posteriori estimates are required. If we have a set of data $D$ from a parametric model, with parameter $\theta$, probably multidimensional, and with likelihood function $p(D \mid \theta)$ and prior density $p(\theta)$, then the posterior density for $\theta$ is
$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta),$$
and therefore the posterior mode is the maximiser of $\log p(D \mid \theta) + \log p(\theta)$. Of course, if the prior density is uniform then the posterior mode is the same as the maximum likelihood estimate. If explicit formulae for the posterior mode do not exist then recourse has to be made to numerical methods, and the EM algorithm is a popular general method in contexts that involve incomplete data, either explicitly or by construct. Mixture data fall into this category, with the component membership indicators $z$ regarded as missing values.
The EM algorithm is as follows. With data $D$ and initial guess $\theta^{(0)}$ for $\theta$, a sequence of values $\{\theta^{(m)}\}$ is generated from the following double-step that creates $\theta^{(m+1)}$ from $\theta^{(m)}$.
E-step: Evaluate
$$Q(\theta, \theta^{(m)}) = \sum_{z} p(z \mid D, \theta^{(m)}) \log \{ p(D, z \mid \theta)\, p(\theta) \}.$$
M-step: Find $\theta = \theta^{(m+1)}$ to maximise $Q(\theta, \theta^{(m)})$ with respect to $\theta$; a schematic implementation of this double-step is sketched after the remarks below.
Remarks
1. In many other incomplete data problems the missing values z are continuous, in which case the summation is replaced by an integration in the above.
2. Not surprisingly, $Q(\theta, \theta^{(m)})$ is usually very like a complete-data log-posterior, apart from a constant that is independent of $\theta$, so that the M-step is easy or difficult according as calculation of the complete-data posterior mode is easy or difficult.
3. The usual monotonicity proof of the EM algorithm in the maximum-likelihood context can be used, with minimal adaptation, to show that
$$\log p(\theta^{(m+1)} \mid D) \geq \log p(\theta^{(m)} \mid D).$$
Thus, the EM algorithm ‘improves’ $\theta^{(m)}$ at each stage and, provided the posterior density for $\theta$ is locally bounded, the values of $\log p(\theta^{(m)} \mid D)$ should converge to a local maximum of $\log p(\theta \mid D)$. The corresponding sequence $\{\theta^{(m)}\}$ will also often converge, one hopes to the posterior mode, but convergence may not occur if, for instance, $\log p(\theta \mid D)$ contains a ridge. The niceties of convergence properties are discussed in detail, for maximum likelihood, in Chapter 3 of McLachlan and Krishnan (1997).
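The double-step above maps directly onto a simple iterative loop. The following Python sketch is purely illustrative and is not taken from the chapter; `e_step`, `m_step` and `log_posterior` are hypothetical callables that a user would supply for a particular incomplete-data model, and the stopping rule (an iteration cap plus a tolerance on the improvement in the log posterior) is one common choice among several.

```python
# A minimal sketch of the generic EM double-step for posterior-mode (MAP)
# estimation. The model-specific pieces are supplied by the caller; the
# argument names below are illustrative placeholders:
#   e_step(theta, data)        -> expected missing-data statistics given theta
#   m_step(stats, data)        -> the theta maximising Q(theta, theta_m)
#   log_posterior(theta, data) -> log p(theta | D) up to an additive constant

def em(data, theta0, e_step, m_step, log_posterior, max_iter=200, tol=1e-8):
    theta = theta0
    lp = log_posterior(theta, data)
    for _ in range(max_iter):
        stats = e_step(theta, data)          # E-step: expected missing data
        theta = m_step(stats, data)          # M-step: maximise Q
        lp_new = log_posterior(theta, data)
        if lp_new - lp < tol:                # negligible improvement: stop
            lp = lp_new
            break
        lp = lp_new
    return theta, lp
```

The monotonicity property in Remark 3 provides a useful check on such an implementation: the sequence of log-posterior values at successive iterations should never decrease.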
1.2.2 The E-step and the M-step for the mixing weights
Suppose now that the data $D = \{y_1, \dots, y_n\}$ are a random sample from a distribution with probability density function
$$p(y \mid \theta) = \sum_{j=1}^{k} \lambda_j f_j(y \mid \phi_j),$$
where $\lambda_1, \dots, \lambda_k$ are the mixing weights, the $f_j$ are the component densities, each corresponding to a subpopulation, and $k$ is finite. The density $f_j$ is parameterised by $\phi_j$ and the set of all these is to be called $\phi$. Often we shall assume that the component densities are of the same type, in which case we shall omit the subscript $j$ from $f_j$. The complete set of parameters is $\theta = (\lambda, \phi)$.
The complete-data joint distribution can be conveniently written as
$$p(y, z \mid \theta) = \prod_{i=1}^{n} \prod_{j=1}^{k} \{\lambda_j f_j(y_i \mid \phi_j)\}^{z_{ij}},$$
with the help of the indicator notation, where $z_{ij} = 1$ if the $i$th observation comes from component $j$ and is zero otherwise. Thus
$$\log p(y, z \mid \theta) = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij} \{\log \lambda_j + \log f_j(y_i \mid \phi_j)\}.$$
For the E-step of the EM algorithm all that we have to compute are the expectations of the indicator variables. Given $\theta^{(m)}$, we obtain
$$z_{ij}^{(m)} = \mathrm{E}(z_{ij} \mid y_i, \theta^{(m)}) = \frac{\lambda_j^{(m)} f_j(y_i \mid \phi_j^{(m)})}{\sum_{l=1}^{k} \lambda_l^{(m)} f_l(y_i \mid \phi_l^{(m)})}$$
for each $i, j$. We now have
$$Q(\theta, \theta^{(m)}) = \sum_{j=1}^{k} n_j^{(m)} \log \lambda_j + \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij}^{(m)} \log f_j(y_i \mid \phi_j) + \log p(\theta),$$
say, where $n_j^{(m)} = \sum_{i=1}^{n} z_{ij}^{(m)}$, a ‘pseudo’ sample size associated with subpopulation $j$. (In the case of complete data, $n_j$ is precisely the sample size for subpopulation $j$.)
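For concreteness, here is how the E-step quantities $z_{ij}^{(m)}$ and $n_j^{(m)}$ might be computed for a univariate Gaussian mixture. This is an illustrative sketch rather than the chapter's code; the Gaussian choice and the argument names `weights`, `means` and `sds` (the current values of $\lambda^{(m)}$ and $\phi^{(m)}$) are assumptions of the example.

```python
import numpy as np
from scipy.stats import norm

def e_step_gaussian(y, weights, means, sds):
    """Compute the responsibilities z_ij^(m) and pseudo sample sizes n_j^(m)
    for a univariate Gaussian mixture at the current parameter estimates."""
    y = np.asarray(y, dtype=float)[:, None]          # shape (n, 1)
    dens = norm.pdf(y, loc=means, scale=sds)         # f_j(y_i | phi_j), shape (n, k)
    numer = weights * dens                           # lambda_j f_j(y_i | phi_j)
    resp = numer / numer.sum(axis=1, keepdims=True)  # z_ij^(m)
    n_j = resp.sum(axis=0)                           # n_j^(m) = sum_i z_ij^(m)
    return resp, n_j
```

The same pattern applies to any family of component densities $f_j$; only the line computing `dens` changes.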
We now consider the M-step for the mixing weights $\lambda$. Before this we make some assumptions about the prior distributions. In particular, we assume that $\lambda$ and $\phi$ are a priori independent and that the prior for $\lambda$ takes a convenient form, namely that which would be conjugate were the data complete. Thus, we assume that, a priori, $\lambda \sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_k)$, that is,
$$p(\lambda) \propto \prod_{j=1}^{k} \lambda_j^{\alpha_j - 1},$$
for prescribed hyperparameters $\alpha_1, \dots, \alpha_k$. Clearly, $\lambda^{(m+1)}$ must maximise
$$\sum_{j=1}^{k} (n_j^{(m)} + \alpha_j - 1) \log \lambda_j,$$
subject to $\sum_{j=1}^{k} \lambda_j = 1$.
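Maximising this expression over the simplex by a standard Lagrange-multiplier calculation gives the update $\lambda_j^{(m+1)} = (n_j^{(m)} + \alpha_j - 1)/(n + \sum_l \alpha_l - k)$, provided each numerator is positive. The short sketch below is illustrative rather than the chapter's code; `n_j` and `alpha` are assumed to hold the pseudo sample sizes and the Dirichlet hyperparameters.

```python
import numpy as np

def m_step_weights(n_j, alpha):
    """MAP update for the mixing weights under a Dirichlet(alpha) prior:
    lambda_j = (n_j + alpha_j - 1) / (n + sum(alpha) - k),
    assuming every numerator n_j + alpha_j - 1 is positive."""
    n_j = np.asarray(n_j, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    numer = n_j + alpha - 1.0          # n_j^(m) + alpha_j - 1 for each component
    return numer / numer.sum()         # denominator equals n + sum(alpha) - k
```

With $k = 2$, pseudo sample sizes $(30, 70)$ and $\alpha = (2, 2)$, for example, the update is $(31/102, 71/102) \approx (0.304, 0.696)$; taking every $\alpha_j = 1$ recovers the maximum-likelihood update $n_j^{(m)}/n$.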
