Co-Clustering - Gérard Govaert - E-Book

Co-Clustering E-Book

Gérard Govaert

Description

Cluster and co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents the state of the art of both well-established and more recent methods of co-clustering. The authors mainly deal with two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach.

Chapter 1 concerns clustering in general and model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixtures adapted to different types of data. The algorithms used are described, and their connections with different classical methods are presented and commented upon. This chapter is useful in tackling the problem of co-clustering under the mixture approach.
Chapter 2 is devoted to the latent block model proposed in the mixture approach context. The authors discuss this model in detail and present its relevance to co-clustering. Various algorithms are presented in a general context.
Chapter 3 focuses on binary and categorical data. It presents, in detail, the appropriate latent block mixture models. Variants of these models and algorithms are presented and illustrated using examples.
Chapter 4 focuses on contingency data. Mutual information, phi-squared and model-based co-clustering are studied. Models, algorithms and connections among different approaches are described and illustrated.
Chapter 5 presents the case of continuous data. In the same way, the different approaches used in the previous chapters are extended to this situation.

Number of pages: 237

Publication year: 2013




Table of Contents

Acknowledgment

Introduction

I.1. Types and representation of data

I.2. Simultaneous analysis

I.3. Notation

I.4. Different approaches

I.5. Model-based co-clustering

I.6. Outline

Chapter 1: Cluster Analysis

1.1. Introduction

1.2. Miscellaneous clustering methods

1.3. Model-based clustering and the mixture model

1.4. EM algorithm

1.5. Clustering and the mixture model

1.6. Gaussian mixture model

1.7. Binary data

1.8. Categorical variables

1.9. Contingency tables

1.10. Implementation

1.11. Conclusion

Chapter 2: Model-Based Co-Clustering

2.1. Metric approach

2.2. Probabilistic models

2.3. Latent block model

2.4. Maximum likelihood estimation and algorithms

2.5. Bayesian approach

2.6. Conclusion and miscellaneous developments

Chapter 3: Co-Clustering of Binary and Categorical Data

3.1. Example and notation

3.2. Metric approach

3.3. Bernoulli latent block model and algorithms

3.4. Parsimonious Bernoulli LBMs

3.5. Categorical data

3.6. Bayesian inference

3.7. Model selection

3.8. Illustrative experiments

3.9. Conclusion

Chapter 4: Co-Clustering of Contingency Tables

4.1. Measures of association

4.2. Contingency table associated with a couple of partitions

4.3. Co-clustering of contingency table

4.4. Model-based co-clustering

4.5. Comparison of all algorithms

4.6. Conclusion

Chapter 5: Co-Clustering of Continuous Data

5.1. Metric approach

5.2. Gaussian latent block model

5.3. Illustrative example

5.4. Gaussian block mixture model

5.5. Numerical experiments

5.6. Conclusion

Bibliography

Index

First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK

www.iste.co.uk

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

www.wiley.com

© ISTE Ltd 2014

The rights of Gérard Govaert and Mohamed Nadif to be identified as the author of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2013950131

British Library Cataloguing-in-Publication Data: a CIP record for this book is available from the British Library. ISBN: 978-1-84821-473-6

Acknowledgment

This research was supported by the CLasSel ANR project ANR-08-EMER-002.

Introduction

I.1. Types and representation of data

The type of a variable is determined by the set of possible values that the variable can take. In the following, we briefly review each type.

I.1.1. Binary data

Figure I.1. Example of binary data

Binary data have been treated in clustering with a large number of distances, most of which are defined using the values n11, n10, n01 and n00 of the table crossing the two variables. For example, the distances between two binary vectors i and i’ measured using “Jaccard’s index” and the “agreement coefficient” can be written, respectively
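To make the counts n11, n10, n01 and n00 concrete, here is a minimal Python sketch. The formulas are not reproduced in this excerpt, so the sketch assumes the usual distance forms: for Jaccard's index, (n10 + n01) / (n11 + n10 + n01), and for the agreement coefficient, (n10 + n01) / (n11 + n10 + n01 + n00). The function names are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def cross_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n11, n10, n01, n00) for two binary vectors of equal length."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(x, y):
        if a and b:
            n11 += 1      # both vectors have a 1 in this position
        elif a:
            n10 += 1      # 1 in x, 0 in y
        elif b:
            n01 += 1      # 0 in x, 1 in y
        else:
            n00 += 1      # both vectors have a 0 in this position
    return n11, n10, n01, n00


def jaccard_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Assumed form: disagreements over positions where at least one vector is 1."""
    n11, n10, n01, _ = cross_counts(x, y)
    denom = n11 + n10 + n01
    # Convention chosen here: two all-zero vectors are at distance 0.
    return 0.0 if denom == 0 else (n10 + n01) / denom


def agreement_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Assumed form: share of positions where the two vectors disagree."""
    n11, n10, n01, n00 = cross_counts(x, y)
    return (n10 + n01) / (n11 + n10 + n01 + n00)


x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
print(cross_counts(x, y))
print(jaccard_distance(x, y), agreement_distance(x, y))
```

The two distances differ only in whether joint absences (n00) count as agreement: the Jaccard form ignores them, while the agreement form treats them like joint presences.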

I.1.2. Categorical data

Categorical variables, sometimes known as qualitative variables or factors, are a generalization of binary data to situations where there are more than two possible values. Here, each variable may take an arbitrary finite set of values, usually referred to as categories, modalities or levels. Like binary data, categorical data may be represented in different ways: as an individuals–variables table of dimension (n, d), as a frequency vector for the different possible states, as a contingency table with d dimensions linking the categories, or as a table where categories are represented by their indicators. In this last form of representation, which we will use here, the data are composed of a sample where with
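The indicator representation mentioned above can be illustrated with a short Python sketch. It is an illustration under assumptions rather than the authors' notation: the function name `indicator_table`, the toy data and the alphabetical ordering of categories are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def indicator_table(
    data: Sequence[Sequence[str]],
) -> Tuple[List[List[int]], List[Tuple[int, str]]]:
    """Encode an (n, d) table of categorical values as a table of 0/1 indicators.

    Returns the indicator table and the list of (variable index, category)
    pairs that label its columns.
    """
    if not data or any(len(row) != len(data[0]) for row in data):
        raise ValueError("data must be a non-empty table with rows of equal length")
    d = len(data[0])
    # Categories observed for each variable, in sorted order (an arbitrary choice).
    levels = [sorted({row[j] for row in data}) for j in range(d)]
    columns = [(j, level) for j in range(d) for level in levels[j]]
    # One indicator per (variable, category): 1 if the individual takes that category.
    table = [[1 if row[j] == level else 0 for j, level in columns] for row in data]
    return table, columns


data = [
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
]
table, columns = indicator_table(data)
print(columns)
for row in table:
    print(row)
```

Each individual has exactly one indicator equal to 1 per variable, so every row of the indicator table sums to d.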
