Latent Variable Models and Factor Analysis - David J. Bartholomew - E-Book

David J. Bartholomew

Description

Latent Variable Models and Factor Analysis provides a comprehensive and unified approach to factor analysis and latent variable modeling from a statistical perspective. This book presents a general framework to enable the derivation of the commonly used models, along with updated numerical examples. The nature and interpretation of a latent variable are also introduced, along with related techniques for investigating dependency.

This book:

* Provides a unified approach showing how such apparently diverse methods as Latent Class Analysis and Factor Analysis are actually members of the same family.
* Presents new material on ordered manifest variables, MCMC methods and non-linear models, as well as a new chapter on related techniques for investigating dependency.
* Includes new sections on structural equation models (SEM) and Markov Chain Monte Carlo methods for parameter estimation, along with new illustrative examples.
* Looks at recent developments on goodness-of-fit test statistics and on non-linear models and models with mixed latent variables, both categorical and continuous.

No prior acquaintance with latent variable modelling is presupposed, but a broad understanding of statistical theory will make it easier to see the approach in its proper perspective. Applied statisticians, psychometricians, medical statisticians, biostatisticians, economists and social science researchers will benefit from this book.

Number of pages: 469

Year of publication: 2011

Contents

Cover

WILEY SERIES IN PROBABILITY AND STATISTICS

Title Page

Copyright

Preface

Acknowledgements

1: Basic ideas and examples

1.1 The statistical problem

1.2 The basic idea

1.3 Two examples

1.4 A broader theoretical view

1.5 Illustration of an alternative approach

1.6 An overview of special cases

1.7 Principal components

1.8 The historical context

1.9 Closely related fields in statistics

2: The general linear latent variable model

2.1 Introduction

2.2 The model

2.3 Some properties of the model

2.4 A special case

2.5 The sufficiency principle

2.6 Principal special cases

2.7 Latent variable models with non-linear terms

2.8 Fitting the models

2.9 Fitting by maximum likelihood

2.10 Fitting by Bayesian methods

2.11 Rotation

2.12 Interpretation

2.13 Sampling error of parameter estimates

2.14 The prior distribution

2.15 Posterior analysis

2.16 A further note on the prior

2.17 Psychometric inference

3: The normal linear factor model

3.1 The model

3.2 Some distributional properties

3.3 Constraints on the model

3.4 Maximum likelihood estimation

3.5 Maximum likelihood estimation by the E-M algorithm

3.6 Sampling variation of estimators

3.7 Goodness of fit and choice of q

3.8 Fitting without normality assumptions: least squares methods

3.9 Other methods of fitting

3.10 Approximate methods for estimating Ψ

3.11 Goodness of fit and choice of q for least squares methods

3.12 Further estimation issues

3.13 Rotation and related matters

3.14 Posterior analysis: the normal case

3.15 Posterior analysis: least squares

3.16 Posterior analysis: a reliability approach

3.17 Examples

4: Binary data: latent trait models

4.1 Preliminaries

4.2 The logit/normal model

4.3 The probit/normal model

4.4 The equivalence of the response function and underlying variable approaches

4.5 Fitting the logit/normal model: the E-M algorithm

4.6 Sampling properties of the maximum likelihood estimators

4.7 Approximate maximum likelihood estimators

4.8 Generalised least squares methods

4.9 Goodness of fit

4.10 Posterior analysis

4.11 Fitting the logit/normal and probit/normal models: Markov chain Monte Carlo

4.12 Divergence of the estimation algorithm

4.13 Examples

5: Polytomous data: latent trait models

5.1 Introduction

5.2 A response function model based on the sufficiency principle

5.3 Parameter interpretation

5.4 Rotation

5.5 Maximum likelihood estimation of the polytomous logit model

5.6 An approximation to the likelihood

5.7 Binary data as a special case

5.8 Ordering of categories

5.9 An alternative underlying variable model

5.10 Posterior analysis

5.11 Further observations

5.12 Examples of the analysis of polytomous data using the logit model

6: Latent class models

6.1 Introduction

6.2 The latent class model with binary manifest variables

6.3 The latent class model for binary data as a latent trait model

6.4 K latent classes within the GLLVM

6.5 Maximum likelihood estimation

6.6 Standard errors

6.7 Posterior analysis of the latent class model with binary manifest variables

6.8 Goodness of fit

6.9 Examples for binary data

6.10 Latent class models with unordered polytomous manifest variables

6.11 Latent class models with ordered polytomous manifest variables

6.12 Maximum likelihood estimation

6.13 Examples for unordered polytomous data

6.14 Identifiability

6.15 Starting values

6.16 Latent class models with metrical manifest variables

6.17 Models with ordered latent classes

6.18 Hybrid models

7: Models and methods for manifest variables of mixed type

7.1 Introduction

7.2 Principal results

7.3 Other members of the exponential family

7.4 Maximum likelihood estimation

7.5 Sampling properties and goodness of fit

7.6 Mixed latent class models

7.7 Posterior analysis

7.8 Examples

7.9 Ordered categorical variables and other generalisations

8: Relationships between latent variables

8.1 Scope

8.2 Correlated latent variables

8.3 Procrustes methods

8.4 Sources of prior knowledge

8.5 Linear structural relations models

8.6 The LISREL model

8.7 Adequacy of a structural equation model

8.8 Structural relationships in a general setting

8.9 Generalisations of the LISREL model

8.10 Examples of models which are indistinguishable

8.11 Implications for analysis

9: Related techniques for investigating dependency

9.1 Introduction

9.2 Principal components analysis

9.3 An alternative to the normal factor model

9.4 Replacing latent variables by linear functions of the manifest variables

9.5 Estimation of correlations and regressions between latent variables

9.6 Q-Methodology

9.7 Concluding reflections of the role of latent variables in statistical modelling

Software appendix

References

Author Index

Subject Index

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors

David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti

Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G. Kendall, Jozef L. Teugels

A complete list of the titles in this series can be found on http://www.wiley.com/WileyCDA/Section/id-300611.html

This edition first published 2011

© 2011 John Wiley & Sons, Ltd

Registered office

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Bartholomew, David J. Latent variable models and factor analysis : a unified approach. – 3rd ed. / David Bartholomew, Martin Knott, Irini Moustaki. p. cm. Includes bibliographical references and index. ISBN 978-0-470-97192-5 (cloth) 1. Latent variables. 2. Latent structure analysis. 3. Factor analysis. I. Knott, M. (Martin) II. Moustaki, Irini. III. Title. QA278.6.B37 2011 519.5′35–dc22 2011007711

A catalogue record for this book is available from the British Library.

Print ISBN: 978-0-470-97192-5 ePDF ISBN: 978-1-119-97059-0 oBook ISBN: 978-1-119-97058-3 ePub ISBN: 978-1-119-97370-6 Mobi ISBN: 978-1-119-97371-3

Preface

It is more than 20 years since the first edition of this book appeared in 1987, and its subject, like statistics as a whole, has changed radically in that period. By far the greatest impact has been made by advances in computing. In 1987 adequate implementation of most latent variable methods, even the well-established factor analysis, was guided more by computational feasibility than by theoretical optimality. What was true of factor analysis was even more true of the assortment of other latent variable techniques, which were then seen as unconnected and very specific to different applications. The development of new models was seriously inhibited by the insuperable computational problems which they would have posed. This new edition aims to take full account of these changes.

The Griffin series of monographs, then edited by Alan Stuart, was designed to consolidate the literature of promising new developments into short books. Knowing that one of us (DJB) was attempting to develop and unify latent variable modelling from a statistical point of view, he proposed what appeared in 1987 as Volume 40 in the Griffin series. Ten years later the series had been absorbed into the Kendall Library of Statistics monographs designed to complement the evergreen volumes of Kendall and Stuart's Advanced Theory of Statistics. Latent Variable Models and Factor Analysis took its place as Volume 7 in that series in 1999. This second edition was somewhat different in character from its predecessor, and a second author (MK) brought his particular expertise into the project. After a further decade that book was in urgent need of revision, and this could only be done adequately by recruiting a third author (IM) who is actively involved at the frontiers of contemporary research. Throughout its long history the principal aim has remained unchanged and it is worth quoting at some length from the Preface of the second edition:

the prime object of the book remains the same – that is, to provide a unified and coherent treatment of the field from a statistical perspective. This is achieved by setting up a sufficiently general framework to enable us to derive the commonly used models, and many more as special cases. The starting point is that all variables, manifest and latent, continuous or categorical, are treated as random variables. The subsequent analysis is then done wholly within the realm of the probability calculus and the theory of statistical inference.

The subtitle, added in this edition, merely serves to emphasise, rather than modify, its original purpose.

Chapter 1 covers the same ground as before, but the order of the material has been changed. The aim of the revision is to provide a more natural progression of ideas from the most elementary to the more advanced.

Chapters 2 and 3, as before, are the heart of the book. Chapter 2 provides an overall treatment of the basic model together with an account of general questions of inference relating to it. It introduces what we call the general linear latent variable model (GLLVM) from which almost all of the models considered later in the book are derived as special cases. An important new feature is an introductory account of Markov chain Monte Carlo (MCMC) methods for parameter estimation. These are a good example of the computer-intensive methods which the growth in the power of computers has made possible. In principle, these methods are now capable of handling any of the models in this book and a general introduction is given in this chapter, leaving more detailed treatment until later.

In Chapter 3 the general model is specialised to the normal linear factor model. This includes traditional factor analysis, which is probably the most thoroughly studied and widely applied latent variable model. Little directly relevant research has appeared since the second edition, but our treatment has been revised and this chapter will serve as a source for the basic theory, much of which is now embodied in computer software.

Latent trait models are widely used, especially in educational testing, but they have a far wider field of application, as the examples in Chapter 4 show. The chapter begins with two versions of the model and then discusses the statistical methods available for their implementation. Although the traditional estimation methods, based on likelihood, are efficient and are present in the standard software, we have also taken the opportunity to demonstrate the MCMC method in some detail in a situation where it can easily be compared with established methods. There is no intention here to suggest that its use is limited to such relatively simple examples. On the contrary, this example is designed to illustrate the potential of the MCMC method in a broader context.

Chapters 5 and 7 extend the ideas into newer areas, particularly where ordered categorical variables are involved. A number of the models appeared for the first time in earlier editions. This work has been consolidated here and, now that computing is no longer a barrier, these models should find wider application. Latent class models are often seen as among the simpler latent variable models, and in the first edition they appeared much earlier in the book. Here they appear in Chapter 6 where it can be seen more easily, perhaps, how they fit into the broader scheme.

Chapter 8, on relationships between latent variables, has been supplemented by an account of methods of estimation and goodness-of-fit in the LISREL model, but otherwise is unchanged, apart from the transfer to Chapter 9 of some material noted below.

Chapter 9 is entirely new except for the inclusion of a little material from the old Chapter 8 which now fits more naturally in its new setting. It draws attention to a number of methods, especially principal components analysis, which serve much the same purpose as latent variable models but come from a different statistical tradition.

The examples are an important part of the text. They are intended not only to illustrate the mechanics of putting the theory into practice but also to bring to light many subtleties which are not immediately evident from the formal derivations. This is especially important in latent variable modelling where questions of interpretation need to be explored in numerical terms for their full implications to be appreciated. Many of the original examples have been retained where we felt that they could not be bettered: although the data on which they are based are now necessarily older, it is the point that the examples make which is important. In some cases, however, we have replaced original examples and added new ones where we felt that an improvement could be made. All the original examples have been recalculated using the newer software described in the Appendix.

There was a website linked to the second edition which has been discontinued. There are two reasons for this. First, we have provided an appendix to this book which gives details of the more comprehensive software that is currently available: the new appendix has removed the need for the individual programs provided on the original website. Secondly, it is now much easier to find numerical examples on which the methods can be tried out. One convenient source is in Bartholomew et al. (2008) and its associated website, where there are extensive data sets and some of the methods are described in a form more suitable for users.

Acknowledgements

Alan Stuart died in 1998, but his encouragement and support in getting the first edition off the ground, when latent variable models were often viewed by statisticians with suspicion, if not hostility, still leave the statistical community in his debt.

Much of the earlier editions remains, as does our debt to those who contributed to them: Lilian de Menezes, Panagiota Tzamourani, Stephen Wood, Teresa Albanese and Brian Shea, all once at the London School of Economics. Fiona Steele read a draft of the new Chapter 9 and her comments have materially helped the exposition.

The anonymous advice garnered by our publisher, John Wiley, for this edition was invaluable both in encouraging us to proceed and in defining the changes and additions we have made.

We extensively used the IRTPRO software for producing output for the factor analysis model for categorical variables. The authors of the software, Li Cai, Stephen du Toit and David Thissen, have kindly provided us with a free version of the software, and Li Cai in particular helped us resolve any software-related questions. We would also like to thank Jay Magidson and Jeroen Vermunt for their help with Latent Gold and Albert Maydeu-Olivares for sharing with us the UK data on Eysenck's Personality Questionnaire–Revised.

The material relating to Sir Godfrey Thomson's work in Chapter 9 was covered in much greater detail in a research project at the University of Edinburgh in which one of us (DJB) was a principal investigator. References to relevant publications arising from the project are included here. This project was financed as part of research supported by the Economic and Social Research Council, grant no. RES-000-23-1246.

David J. Bartholomew
Martin Knott
Irini Moustaki
London School of Economics and Political Science
January 2011

1

Basic ideas and examples

1.1 The statistical problem

Latent variable models provide an important tool for the analysis of multivariate data. They offer a conceptual framework within which many disparate methods can be unified and a base from which new methods can be developed. A statistical model specifies the joint distribution of a set of random variables and it becomes a latent variable model when some of these variables – the latent variables – are unobservable. In a formal sense, therefore, there is nothing special about a latent variable model. The usual apparatus of model-based inference applies, in principle, to all models regardless of their type. The interesting questions concern why latent variables should be introduced into a model in the first place and how their presence contributes to scientific investigation.
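The definition above can be written out in symbols. The following is a minimal sketch in notation chosen here for illustration (x for the observable variables, y for the latent ones, and h, g, f for densities); none of these symbols are fixed by the passage itself. Since only x is observed, any inference must rest on its marginal distribution, obtained by integrating the latent variables out of the joint distribution, and what the data say about y is expressed by the conditional distribution of y given x.

```latex
% x: observed (manifest) variables; y: unobserved (latent) variables.
% h, g and f are illustrative symbols, assumed for this sketch.
f(\mathbf{x}) = \int h(\mathbf{y})\, g(\mathbf{x} \mid \mathbf{y})\, d\mathbf{y},
\qquad
h(\mathbf{y} \mid \mathbf{x}) = \frac{h(\mathbf{y})\, g(\mathbf{x} \mid \mathbf{y})}{f(\mathbf{x})}
```

Written this way, the formal point of the paragraph is visible: the joint distribution h(y) g(x | y) is an ordinary statistical model, and the only special feature is that y never appears in the data.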

One reason, common to many techniques of multivariate analysis, is to reduce dimensionality. If, in some sense, the information contained in the interrelationships of many variables can be conveyed, to a good approximation, in a much smaller set, our ability to ‘see’ the structure in the data will be much improved. This is the idea which lies behind much of factor analysis and the newer applications of linear structural models. Large-scale statistical enquiries, such as social surveys, generate much more information than can be easily absorbed without drastic summarisation. For example, the questionnaire used in a sample survey may have 50 or 100 questions and replies may be received from 1000 respondents. Elementary statistical methods help to summarise the data by looking at the frequency distributions of responses to individual questions or pairs of questions and by providing summary measures such as percentages and correlation coefficients. However, with so many variables it may still be difficult to see any pattern in their interrelationships. The fact that our ability to visualise relationships is limited to two or three dimensions places us under strong pressure to reduce the dimensionality of the data in a manner which preserves as much of the structure as possible. The reasonableness of such a course is often evident from the fact that many questions overlap in the sense that they seem to be getting at the same thing. For example, one’s views about the desirability of private health care and of tax levels for high earners might both be regarded as a reflection of a basic political position. Indeed, many enquiries are designed to probe such basic attitudes from a variety of angles. The question is then one of how to condense the many variables with which we start into a much smaller number of indices with as little loss of information as possible. Latent variable models provide one way of doing this.
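The survey scenario described above can be mimicked with simulated data. The sketch below is only an illustration under stated assumptions: it invents 1000 respondents answering 50 questions on a continuous scale, all driven by a single hypothetical attitude, and it uses the leading principal component of the correlation matrix as a quick summary index. That is a stand-in for, not an implementation of, a fitted latent variable model; the function names, loadings and seed are all made up for the example.

```python
import numpy as np


def simulate_survey(n_respondents=1000, n_questions=50, seed=0):
    """Simulate answers driven by one hypothetical latent attitude."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_respondents)              # latent attitude (unobserved in practice)
    loadings = rng.uniform(0.5, 0.9, size=n_questions)  # assumed strength of each question
    noise_sd = np.sqrt(1.0 - loadings**2)                # gives each question unit variance
    noise = rng.standard_normal((n_respondents, n_questions)) * noise_sd
    x = y[:, None] * loadings + noise                    # observed answers
    return x, y


def summarise(x):
    """Return the variance share of the leading component and a one-number index."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)              # eigenvalues in ascending order
    share = eigvals[-1] / eigvals.sum()
    standardized = (x - x.mean(axis=0)) / x.std(axis=0)
    index = standardized @ eigvecs[:, -1]                # single index per respondent
    return share, index


x, y = simulate_survey()
share, index = summarise(x)
# The sign of an eigenvector is arbitrary, so compare in absolute value.
agreement = abs(np.corrcoef(index, y)[0, 1])
print(f"share of variance in one index: {share:.2f}")
print(f"|correlation| between index and latent attitude: {agreement:.2f}")
```

In real data the latent attitude is of course not available for comparison; the simulation simply makes concrete the sense in which 50 overlapping questions can be condensed into one index with little loss of information.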

A second reason is that latent quantities figure prominently in many fields to which statistical methods are applied. This is especially true of the social sciences. A cursory inspection of the literature of social research or of public discussion in newspapers or on television will show that much of it centres on entities which are handled as if they were measurable quantities but for which no measuring instrument exists. Business confidence, for example, is spoken of as though it were a real variable, changes in which affect share prices or the value of the currency. Yet business confidence is an ill-defined concept which may be regarded as a convenient shorthand for a whole complex of beliefs and attitudes. The same is true of quality of life, conservatism, and general intelligence. It is virtually impossible to theorise about social phenomena without invoking such hypothetical variables. If such reasoning is to be expressed in the language of mathematics and thus made rigorous, some way must be found of representing such ‘quantities’ by numbers. The statistician’s problem is to establish a theoretical framework within which this can be done. In practice one chooses a variety of indicators which can be measured, such as answers to a set of yes/no questions, and then attempts to extract what is common to them.
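The last step, choosing measurable indicators and extracting what is common to them, can be sketched for yes/no questions. The example below is a hedged illustration rather than a method taken from the text: it assumes each answer follows a logistic response curve in a single hypothetical latent quantity, with made-up intercepts and slopes, and it uses the plain count of 'yes' answers as the crudest possible summary of what the items share.

```python
import numpy as np


def simulate_yes_no(n_respondents=1000, n_items=10, seed=1):
    """Simulate yes/no answers that all depend on one hypothetical latent quantity."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_respondents)               # latent quantity, e.g. 'confidence'
    intercepts = rng.uniform(-1.0, 1.0, size=n_items)    # assumed item difficulty
    slopes = rng.uniform(1.0, 2.0, size=n_items)         # assumed item sensitivity to y
    # Logistic response function: probability of answering 'yes' given y.
    prob_yes = 1.0 / (1.0 + np.exp(-(intercepts + np.outer(y, slopes))))
    answers = (rng.random((n_respondents, n_items)) < prob_yes).astype(int)
    return answers, y


answers, y = simulate_yes_no()
total_yes = answers.sum(axis=1)                          # simple indicator-based score
r = np.corrcoef(total_yes, y)[0, 1]
print(f"correlation between 'yes' count and latent quantity: {r:.2f}")
```

No single question measures the latent quantity, yet a number built from all of them tracks it closely in this simulation, which is the intuition behind representing such 'quantities' by numbers.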
