Cluster and co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book surveys the state of the art in co-clustering, covering both well-established and more recent methods. The authors mainly deal with two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach.
Chapter 1 concerns clustering in general and model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixtures adapted to different types of data. The algorithms used are described, and their connections with different classical methods are presented and commented upon. This chapter is useful in tackling the problem of co-clustering under the mixture approach.
Chapter 2 is devoted to the latent block model proposed in the mixture approach context. The authors discuss this model in detail and explain its relevance to co-clustering. Various algorithms are presented in a general context.
Chapter 3 focuses on binary and categorical data. It presents, in detail, the appropriate latent block mixture models. Variants of these models and algorithms are presented and illustrated using examples.
Chapter 4 focuses on contingency data. Mutual information, phi-squared and model-based co-clustering are studied. Models, algorithms and connections among different approaches are described and illustrated.
Chapter 5 presents the case of continuous data. In the same way, the different approaches used in the previous chapters are extended to this situation.
Number of pages: 237
Year of publication: 2013
Table of Contents
Acknowledgment
Introduction
I.1. Types and representation of data
I.2. Simultaneous analysis
I.3. Notation
I.4. Different approaches
I.5. Model-based co-clustering
I.6. Outline
Chapter 1: Cluster Analysis
1.1. Introduction
1.2. Miscellaneous clustering methods
1.3. Model-based clustering and the mixture model
1.4. EM algorithm
1.5. Clustering and the mixture model
1.6. Gaussian mixture model
1.7. Binary data
1.8. Categorical variables
1.9. Contingency tables
1.10. Implementation
1.11. Conclusion
Chapter 2: Model-Based Co-Clustering
2.1. Metric approach
2.2. Probabilistic models
2.3. Latent block model
2.4. Maximum likelihood estimation and algorithms
2.5. Bayesian approach
2.6. Conclusion and miscellaneous developments
Chapter 3: Co-Clustering of Binary and Categorical Data
3.1. Example and notation
3.2. Metric approach
3.3. Bernoulli latent block model and algorithms
3.4. Parsimonious Bernoulli LBMs
3.5. Categorical data
3.6. Bayesian inference
3.7. Model selection
3.8. Illustrative experiments
3.9. Conclusion
Chapter 4: Co-Clustering of Contingency Tables
4.1. Measures of association
4.2. Contingency table associated with a couple of partitions
4.3. Co-clustering of contingency table
4.4. Model-based co-clustering
4.5. Comparison of all algorithms
4.6. Conclusion
Chapter 5: Co-Clustering of Continuous Data
5.1. Metric approach
5.2. Gaussian latent block model
5.3. Illustrative example
5.4. Gaussian block mixture model
5.5. Numerical experiments
5.6. Conclusion
Bibliography
Index
First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK
www.iste.co.uk
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
www.wiley.com
© ISTE Ltd 2014
The rights of Gérard Govaert and Mohamed Nadif to be identified as the author of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2013950131
British Library Cataloguing-in-Publication Data: a CIP record for this book is available from the British Library.
ISBN: 978-1-84821-473-6
Acknowledgment
This research was supported by the CLasSel ANR project ANR-08-EMER-002.
Introduction
The type of a variable is determined by the set of possible values that the variable can take. In the following, we briefly review each type.
Figure I.1. Example of binary data
Binary data have been treated in clustering with a large number of distances, most of which are defined using the values n11, n10, n01 and n00 of the table crossing the two variables. For example, the distance between two binary vectors i and i' can be measured using "Jaccard's index" or the "agreement coefficient".
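As an illustration of the counts n11, n10, n01 and n00, here is a minimal Python sketch. It assumes each distance is one minus the corresponding similarity index, which is a common convention but not necessarily the exact form used in the book. The function names are hypothetical.

```python
import numpy as np


def binary_counts(x, y):
    """Return (n11, n10, n01, n00) for two binary vectors of equal length."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    if x.shape != y.shape or x.size == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n11 = int(np.sum(x & y))    # both equal to 1
    n10 = int(np.sum(x & ~y))   # 1 in x, 0 in y
    n01 = int(np.sum(~x & y))   # 0 in x, 1 in y
    n00 = int(np.sum(~x & ~y))  # both equal to 0
    return n11, n10, n01, n00


def jaccard_distance(x, y):
    """Assumed form: 1 - n11 / (n11 + n10 + n01); joint absences are ignored."""
    n11, n10, n01, _ = binary_counts(x, y)
    denom = n11 + n10 + n01
    # Convention chosen here: two all-zero vectors are at distance 0.
    return 0.0 if denom == 0 else (n10 + n01) / denom


def agreement_distance(x, y):
    """Assumed form: 1 - (n11 + n00) / n; joint absences count as agreement."""
    n11, n10, n01, n00 = binary_counts(x, y)
    return (n10 + n01) / (n11 + n10 + n01 + n00)


x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
print(binary_counts(x, y))       # (2, 1, 1, 1)
print(jaccard_distance(x, y))    # 0.5
print(agreement_distance(x, y))  # 0.4
```

The two measures differ only in how they treat n00: Jaccard's index ignores positions where both vectors are 0, whereas the agreement coefficient counts them as matches.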
Categorical variables, sometimes known as qualitative variables or factors, are a generalization of binary data to situations where there are more than two possible values. Here, each variable may take an arbitrary finite set of values, usually referred to as categories, modalities or levels. Like binary data, categorical data may be represented in different ways: as a table of individuals–variables of dimension (n,d), as a frequency vector for the different possible states, as a contingency table with d dimensions linking the categories or as a complete disjunctive table where categories are represented by their indicators. In this last form of representation, which we will use here, the data are composed of a sample $(x_1, \ldots, x_n)$ where $x_i = (x_i^{jh};\ j = 1, \ldots, d;\ h = 1, \ldots, m_j)$, with $x_i^{jh} = 1$ if individual $i$ takes category $h$ for variable $j$ and $0$ otherwise, $m_j$ denoting the number of categories of variable $j$.
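The indicator representation described above can be sketched in Python as follows. This is only an illustrative encoding under stated assumptions: the function name `indicator_table`, the toy data and the alphabetical ordering of categories are hypothetical choices, not taken from the book.

```python
import numpy as np


def indicator_table(data):
    """Encode an (n, d) table of categories as a 0/1 indicator table.

    Each variable j is replaced by one column per category, so every row
    contains exactly one 1 per variable. Returns the table and, for each
    column, the (variable index, category) pair it stands for.
    """
    data = np.asarray(data, dtype=object)
    n, d = data.shape
    blocks, labels = [], []
    for j in range(d):
        # Assumption: categories are ordered alphabetically within a variable.
        categories = sorted(set(data[:, j]))
        block = np.array([[int(v == c) for c in categories] for v in data[:, j]])
        blocks.append(block)
        labels.extend((j, c) for c in categories)
    return np.hstack(blocks), labels


# Toy data: n = 3 individuals, d = 2 categorical variables.
data = [["red", "S"],
        ["blue", "M"],
        ["red", "M"]]
table, labels = indicator_table(data)
print(labels)  # [(0, 'blue'), (0, 'red'), (1, 'M'), (1, 'S')]
print(table)
# [[0 1 0 1]
#  [1 0 1 0]
#  [0 1 1 0]]
```

Because each individual takes exactly one category per variable, every row of the resulting table sums to d, which gives a quick sanity check on the encoding.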
