An easily accessible introduction to log-linear modeling for non-statisticians

Highlighting advances that have lent to the topic's distinct, coherent methodology over the past decade, Log-Linear Modeling: Concepts, Interpretation, and Application provides an essential, introductory treatment of the subject, featuring many new and advanced log-linear methods, models, and applications. The book begins with basic coverage of categorical data, and goes on to describe the basics of hierarchical log-linear models as well as decomposing effects in cross-classifications and goodness-of-fit tests. Additional topics include:

* The generalized linear model (GLM) along with popular methods of coding such as effect coding and dummy coding
* Parameter interpretation and how to ensure that the parameters reflect the hypotheses being studied
* Symmetry, rater agreement, homogeneity of association, logistic regression, and reduced designs models

Throughout the book, real-world data illustrate the application of models and understanding of the related results. In addition, each chapter utilizes R, SYSTAT®, and ℓEM software, providing readers with an understanding of these programs in the context of hierarchical log-linear modeling.

Log-Linear Modeling is an excellent book for courses on categorical data analysis at the upper-undergraduate and graduate levels. It also serves as an excellent reference for applied researchers in virtually any area of study, from medicine and statistics to the social sciences, who analyze empirical data in their everyday work.
CONTENTS
PREFACE
ACKNOWLEDGMENTS
CHAPTER 1: BASICS OF HIERARCHICAL LOG-LINEAR MODELS
1.1 SCALING: WHICH VARIABLES ARE CONSIDERED CATEGORICAL?
1.2 CROSSING TWO OR MORE VARIABLES
1.3 GOODMAN’S THREE ELEMENTARY VIEWS OF LOG-LINEAR MODELING
1.4 ASSUMPTIONS MADE FOR LOG-LINEAR MODELING
CHAPTER 2: EFFECTS IN A TABLE
2.1 THE NULL MODEL
2.2 THE ROW EFFECTS-ONLY MODEL
2.3 THE COLUMN EFFECTS-ONLY MODEL
2.4 THE ROW- AND COLUMN-EFFECTS MODEL
2.5 LOG-LINEAR MODELS
CHAPTER 3: GOODNESS-OF-FIT
3.1 GOODNESS-OF-FIT I: OVERALL FIT STATISTICS
3.2 GOODNESS-OF-FIT II: R2 EQUIVALENTS AND INFORMATION CRITERIA
3.3 GOODNESS-OF-FIT III: NULL HYPOTHESES CONCERNING PARAMETERS
3.4 GOODNESS-OF-FIT IV: RESIDUAL ANALYSIS
3.5 THE RELATIONSHIP BETWEEN PEARSON’S X2 AND LOG-LINEAR MODELING
CHAPTER 4: HIERARCHICAL LOG-LINEAR MODELS AND ODDS RATIO ANALYSIS
4.1 THE HIERARCHY OF LOG-LINEAR MODELS
4.2 COMPARING HIERARCHICALLY RELATED MODELS
4.3 ODDS RATIOS AND LOG-LINEAR MODELS
4.4 ODDS RATIOS IN TABLES LARGER THAN 2 × 2
4.5 TESTING NULL HYPOTHESES IN ODDS-RATIO ANALYSIS
4.6 CHARACTERISTICS OF THE ODDS RATIO
4.7 APPLICATION OF THE ODDS RATIO
4.8 THE FOUR STEPS TO TAKE WHEN LOG-LINEAR MODELING
4.9 COLLAPSIBILITY
CHAPTER 5: COMPUTATIONS I: BASIC LOG-LINEAR MODELING
5.1 LOG-LINEAR MODELING IN R
5.2 LOG-LINEAR MODELING IN SYSTAT
5.3 LOG-LINEAR MODELING IN ℓEM
CHAPTER 6: THE DESIGN MATRIX APPROACH
6.1 THE GENERALIZED LINEAR MODEL (GLM)
CHAPTER 7: PARAMETER INTERPRETATION AND SIGNIFICANCE TESTS
7.1 PARAMETER INTERPRETATION BASED ON DESIGN MATRICES
7.2 THE TWO SOURCES OF PARAMETER CORRELATION: DEPENDENCY OF VECTORS AND DATA CHARACTERISTICS
7.3 CAN MAIN EFFECTS BE INTERPRETED?
7.4 INTERPRETATION OF HIGHER ORDER INTERACTIONS
CHAPTER 8: COMPUTATIONS II: DESIGN MATRICES AND POISSON GLM
8.1 GLM-BASED LOG-LINEAR MODELING IN R
8.2 DESIGN MATRICES IN SYSTAT
8.3 LOG-LINEAR MODELING WITH DESIGN MATRICES IN ℓEM
CHAPTER 9: NONHIERARCHICAL AND NONSTANDARD LOG-LINEAR MODELS
9.1 DEFINING NONHIERARCHICAL AND NONSTANDARD LOG-LINEAR MODELS
9.2 THE VIRTUES OF NONHIERARCHICAL AND NONSTANDARD LOG-LINEAR MODELS
9.3 SCENARIOS FOR NONSTANDARD LOG-LINEAR MODELS
9.4 NONSTANDARD SCENARIOS: SUMMARY AND DISCUSSION
9.5 SCHUSTER’S APPROACH TO PARAMETER INTERPRETATION
CHAPTER 10: COMPUTATIONS III: NONSTANDARD MODELS
10.1 NONHIERARCHICAL AND NONSTANDARD MODELS IN R
10.2 ESTIMATING NONHIERARCHICAL AND NONSTANDARD MODELS WITH SYSTAT
10.3 ESTIMATING NONHIERARCHICAL AND NONSTANDARD MODELS WITH ℓEM
CHAPTER 11: SAMPLING SCHEMES AND CHI-SQUARE DECOMPOSITION
11.1 SAMPLING SCHEMES
11.2 CHI-SQUARE DECOMPOSITION
CHAPTER 12: SYMMETRY MODELS
12.1 AXIAL SYMMETRY
12.2 POINT SYMMETRY
12.3 POINT AXIAL SYMMETRY
12.4 SYMMETRY IN HIGHER DIMENSIONAL CROSS-CLASSIFICATIONS
12.5 QUASI-SYMMETRY
12.6 EXTENSIONS AND OTHER SYMMETRY MODELS
12.7 MARGINAL HOMOGENEITY: SYMMETRY IN THE MARGINALS
CHAPTER 13: LOG-LINEAR MODELS OF RATER AGREEMENT
13.1 MEASURES OF RATER AGREEMENT IN CONTINGENCY TABLES
13.2 THE EQUAL WEIGHT AGREEMENT MODEL
13.3 THE DIFFERENTIAL WEIGHT AGREEMENT MODEL
13.4 AGREEMENT IN ORDINAL VARIABLES
13.5 EXTENSIONS OF RATER AGREEMENT MODELS
CHAPTER 14: COMPARING ASSOCIATIONS IN SUBTABLES: HOMOGENEITY OF ASSOCIATIONS
14.1 THE MANTEL–HAENSZEL AND BRESLOW–DAY TESTS
14.2 LOG-LINEAR MODELS TO TEST HOMOGENEITY OF ASSOCIATIONS
14.3 EXTENSIONS AND GENERALIZATIONS
CHAPTER 15: LOGISTIC REGRESSION AND OTHER LOGIT MODELS
15.1 LOGISTIC REGRESSION
15.2 LOG-LINEAR REPRESENTATION OF LOGISTIC REGRESSION MODELS
15.3 OVERDISPERSION IN LOGISTIC REGRESSION
15.4 LOGISTIC REGRESSION VERSUS LOG-LINEAR MODELING
15.5 LOGIT MODELS AND DISCRIMINANT ANALYSIS
15.6 PATH MODELS
CHAPTER 16: REDUCED DESIGNS
16.1 FUNDAMENTAL PRINCIPLES FOR FACTORIAL DESIGN
16.2 THE RESOLUTION LEVEL OF A DESIGN
16.3 SAMPLE FRACTIONAL FACTORIAL DESIGNS
CHAPTER 17: COMPUTATIONS IV: ADDITIONAL MODELS
17.1 ADDITIONAL LOG-LINEAR MODELS IN R
17.2 ADDITIONAL LOG-LINEAR MODELS IN SYSTAT
17.3 ADDITIONAL LOG-LINEAR MODELS IN ℓEM
REFERENCES
TOPIC INDEX
AUTHOR INDEX
Cover illustration: © Anto Titus/iStockphoto
Copyright ©2013 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
von Eye, Alexander.
Log-linear modeling: concepts, interpretation, and application/Alexander von Eye, Michigan State University, Department of Psychology, East Lansing, MI, Eun-Young Mun, Rutgers, The State University of New Jersey, Center of Alcohol Studies, Piscataway, NJ.
pages cm
Includes bibliographical references and index.
ISBN 978-1-118-14640-8 (hardback)
1. Log-linear models. I. Mun, Eun Young II. Title. III. Title: Log linear modeling.
QA278.E95 2012
519.5′36–dc23
2012009791
PREFACE
The term “log-linear modeling” appeared, for the first time in 1969, in Bishop and Fienberg [14] (p. 119; see also David [44]). Since its first appearance, the associated methods have experienced very impressive development and are now among the most popular methods used for the analysis of frequency data. Specifically, log-linear modeling is widely used in analyzing multivariate frequency tables, that is, multivariate cross-classifications. Log-linear modeling is used to identify the main effects or interactions that are needed to describe the joint distribution in the cross-classification. Whenever interactions are significant, however, main effects become less interesting. In addition to main effects and interaction effects, covariates can be taken into account, and special contrasts can be specified.
Fitting log-linear models involves decisions concerning the parameters that are significantly different from zero. Significant parameters are prime candidates to be included in a model. Nonsignificant terms (parameters) cost degrees of freedom, but this investment fails to make a contribution to the explanation of the data.
This book provides an introduction to log-linear modeling. In the early chapters, it discusses categorical data and presents Goodman’s view of the goals that researchers pursue when analyzing cross-classifications. The notation of effects in cross-classifications is introduced, followed by methods for the analysis of goodness-of-fit. In Chapter 4, the classical hierarchical log-linear models are introduced.
Just as most other multivariate methods of analysis, log-linear models are rarely estimated without the use of a computer. Therefore, this book contains three chapters in which it is shown how log-linear models can be estimated. In each of these chapters, three software packages are exemplified and discussed: R, SYSTAT, and ℓEM. Chapter 5 introduces these packages in the context of hierarchical log-linear modeling.
Log-linear models can be considered from a number of perspectives. Some authors have taken the perspectives of odds-ratio analysis, the Generalized Linear Model, or Analysis of Variance (ANOVA). In this text, we take the most general approach to log-linear modeling. In the design matrix approach, a model is defined by the vectors in the design matrix. The vectors represent the hypothesized effects.
The design matrix approach comes with a number of advantages. Many readers will know this approach from the General Linear Model. This approach also makes it easy to introduce the two most popular methods of coding the effects of interest: dummy and effect coding (Chapter 6). Likewise, by this approach, it is easy to introduce special models, for example, symmetry models (Chapters 9 and 12), models of logistic regression (Chapter 15), models of rater agreement (Chapter 13), models of homogeneity of associations (Chapter 14), and reduced designs, that is, designs that include only a fraction of the cells of a completely crossed table (Chapter 16). Most important, the design matrix approach provides a general and easy-to-use tool for the introduction of nonhierarchical and nonstandard models (Chapter 9).
Nonstandard models can be extremely useful. They contain terms that cannot be recast in terms of hierarchical log-linear models. However, parameter interpretation may pose problems. Here, the design matrix approach presents its biggest virtue. By this approach, the interpretation of parameters that are estimated for nonstandard models (as well as for hierarchical and nonhierarchical models) can be made explicit. In many instances, those parameters fail to reflect the hypotheses the researchers wish to test. Schuster’s approach to parameter interpretation is presented, which helps researchers to exactly test the hypotheses they are after (Section 9.5).
Parameter interpretation is given broad coverage in this book. Log-linear parameters can be interpreted only if the model fits. Next, it is important to make sure that the parameters reflect the hypotheses under study. The design matrix approach and Schuster’s methods help researchers make certain that this is the case. When parameters are significant and can be interpreted as intended, they can be interpreted as qualifying lower order terms. This is quantified in Elliott’s approach to parameter interpretation in hierarchical models. This approach is presented in Section 7.4.
Chapters 8 and 17 illustrate how nonhierarchical and nonstandard models can be estimated with R, SYSTAT, and ℓEM. We give real data examples to illustrate application of models and interpretation of results.
Readers who need broad coverage and examples of log-linear modeling will benefit most from this text. The targeted readers are applied researchers, students, and instructors, as well as those who wish to learn more about log-linear modeling. The book is written at a level that should pose no major problems to students after the introductory statistics courses. There are no disciplinary boundaries. Readers can have a background in psychology, sociology, education, anthropology, epidemiology, nursing, medicine, statistics, criminal justice, pharmacology, biometry, or, in brief, any discipline that considers analyzing empirical data that can be categorical in nature. This book can be used in courses on categorical data analysis and log-linear modeling, and it can be used in tandem with texts on Configural Frequency Analysis (von Eye, Mair, & Mun, 2010), a method that employs the tools of log-linear modeling to inspect individual cells of multivariate cross-classifications.
ALEXANDER VON EYE AND EUN-YOUNG MUN
East Lansing, MI and Piscataway, NJ
February 2012
ACKNOWLEDGMENTS
The authors are indebted to a good number of people for their help and support. First of all and most of all, there are Donata and Feng. Neither of you may be a log-linear model, but you are models of spousal love and support that fit perfectly. Without you, there would be no modeling.
Second, we thank the many students who have taken the authors’ courses on categorical data analysis at the Pennsylvania State University, Michigan State University, Rutgers University, the University of Trier (Germany), and the University of Vienna (Austria), in departments of Psychology, Economics, and Statistics. Your enthusiasm and smart questions convinced us of the need for a book like this. Here it is. We hope you like it.
Third, we also thank Amy Hendrickson who provided helpful suggestions for formatting tables with LaTeX and Caressa Slocum who helped proofread this book.
Finally, we thank Jacqueline Palmieri, Stephen Quigley, and Rosalyn Farkas of John Wiley & Sons. From the first contact on, they were very supportive, enthusiastic, and helpful. Jackie, Steve, and Rosalyn made sure this book is in the best hands at Wiley. We appreciate this!
AvE & EYM
In this chapter, we pursue four goals. First, we introduce basic ideas and issues concerning hierarchical log-linear modeling. This introduction leads to an understanding of the situation a researcher faces when dealing with categorical variables, and to an appreciation of the data material to be processed. Second, the basic assumptions that need to be made for proper log-linear modeling are discussed. Third, we talk about effects in a table, various approaches to analyzing these effects, odds ratios, and first elements of log-linear modeling. Fourth, we end the chapter with a discussion of hierarchical log-linear models using the open-source environment R [183]; SYSTAT [203], a standard, general-purpose software package; and ℓEM [218], a specialized program for categorical data analysis.
Before deciding which variables can be considered categorical, we briefly review parts of the discussion on scale levels. This discussion has been going on since the mid-twentieth century, and there is still no conclusion. Log-linear models are typically estimated for categorical data. Therefore, a review of this discussion is of importance.
A scale results from measurement, that is, from assigning numbers to objects. Among the best known results of statistics are the scale levels that Stevens [201] proposed in 1946. Stevens proposed a hierarchy of four scale levels. To introduce this hierarchy, McCall [156] discussed three properties of scales. The hierarchy results from combining these properties.
The first property is that of magnitude. Scales that possess magnitude allow one to judge whether one object is greater than, less than, or equal to another object. For example, if a scale of schizophrenia assigns a score of 7 to the first patient and a score of 11 to the second, one can conclude that the second patient is a more severe case than the first only if this scale possesses magnitude.
The second property is that of equal intervals. Scales of equal intervals allow one to interpret the size of differences between scores. For example, if Person A has an IQ score of 120 and Person B has an IQ score of 100, there is a difference of 20 IQ points. Now, if Person C has an IQ score of 80, then the difference between A and B is the same as the difference between B and C.
The third property is that of the absolute zero point. Scales with absolute zero points allow one to indicate that nothing of a particular characteristic is observed. For example, if a car is measured to move with a speed of zero, the car is standing still.
Combining these three properties yields Stevens’s four scale levels. We present this hierarchy beginning at the bottom, where the least complex mathematical operations are possible. With each higher level, new operations become possible.
The bottom feeder of this hierarchy is the nominal scale level. This scale (some authors state that the nominal level does not even qualify as a scale) possesses none of the three properties discussed above. This scale is used to label and distinguish objects. For example, humans are labeled as females and males. In data analysis, every female is assigned the same number, and every male is assigned the same number. In more elaborate systems, species are labeled and differentiated, and so are diseases, theories, religions, types of chocolate, and political belief patterns.
At the nominal level, every individual with a particular label is considered identical to every other individual with the same label. All researchers can do at this level is ask whether two objects share the same label or characteristic. One can thus determine whether two objects are the same (=) or different (≠). Any other operation would require different scale properties.
The second level in the hierarchy is that of an ordinal scale. At this level, scales possess magnitude. However, they do not possess equal intervals or absolute zero points. For example, when the doctor asks the patient how she is feeling and she responds “better than at my last visit,” her response is given at the ordinal level. Another example involves rank-ordering faculty based on their contributions to the department. The operations possible at the ordinal scale level lead to statements as to whether one comparison object is greater than another (>), less than another (<), or equal to another (=). Clearly, the > and < are more differentiated ways of saying ≠. At the ordinal scale level, correlations are possible; averages and variances, however, cannot be calculated. Examples of ordinal scales include questionnaire scales that ask respondents to indicate the degree of their agreement on a scale from 1 to 7, grades in school, and the Olympic medal ranking in figure skating. In none of these cases is the distance between scale points defined and interpretable.
Clearly, at the ordinal level, the doctor does not know how much better the patient feels. At the next higher scale level, the interval level, this increase can be quantified. The interval scale possesses magnitude and equal intervals, but no absolute zero point. As was indicated above, equal interval implies that the distance between two scale points can be determined, and that the distance units are the same over the entire range of admissible scores. At this scale level, means and variances can be calculated, and we can, thus, perform an analysis of variance, a factor analysis, and we can estimate a structural equation model. Sample scales include most psychometric tests, such as intelligence, aggression, and depression.
The top level in the hierarchy of measurement scales is the ratio scale. This scale possesses all three of the properties above: magnitude, equal intervals, and the absolute zero point. At this level, one can, in addition to the operations that are possible for the lower level scales, perform all other arithmetic operations, and one can use all tools of higher mathematics. For example, scores can be added to each other, multiplied by each other, and one can transform scores using logarithmic transformations.
While consistent in itself, Stevens’s classification has met with critical appraisal. In an article by Velleman and Wilkinson [214], we find, among other issues, five points of critique (see also Hand [95]):
Indeed, in parts of the literature, Stevens’s classification is not used any more. An example can be found in Clogg [32]. The author discusses the following possible scales instead (p. 314):
continuous (quantitative)
restricted continuous (due to censoring or truncation)
categorical dichotomous
categorical nominal (multiple categories, but no ordering)
categorical ordinal (with levels as in a Likert scale)
categorical quantitative (levels are spaced with distances between scale points known in principle).
In addition, Clogg discusses mixed scale types, for example, scales that are partially ordered. Examples of such scales include categorical ordinal scales that, for instance, also include a “don’t know” response option.
What can we conclude from this discussion? Most important, categorical variables come in many different forms. One form is constituted by variables with naturally distinct categories. Consider, for example, the (numerical) categories of the variable Car Brand. These categories are distinct in the sense that there is no ordering, and scale values between those that label the categories are not meaningful. It makes no sense to say that a car is of the brand 1.7. Distinct categories can also be found in rank orders, but here, cases can be assigned to averaged ranks (thus violating the tenet that ranks cannot be averaged; see above). Even interval level or ratio scale scores can be categorical. For instance, clinical diagnoses are often based on symptom scores that can be at the interval level, but the classification as “case” versus “not a case” is categorical.
In this book, we consider variables categorical when the number of scale values is small. One virtue of the methods discussed here is that scale characteristics can be taken into account. For example, there are models for scale scores that are ranked (e.g., see Section 9.3.6), and for scale scores that reflect different distances from one another (e.g., see Chapter 13). Thus, researchers are very flexible in the options of taking scale characteristics into account. In the next section, we begin the introduction of technical aspects of the analysis of categorical data. We discuss cross-classifications of two and more variables.
Crossing categorical variables yields cross-classifications, also called contingency tables. These representations allow one to analyze the joint frequency distribution of two or more variables. To introduce cross-classifications, let us use the following notation. Observed cell frequencies are denoted with m, and estimated expected cell frequencies are denoted with m̂. Later, we will use subscripts to indicate the exact location of a frequency in a cross-classification. In this section, we introduce two forms to display cross-classifications, the matrix form and the tabular form.
Table 1.1 Frequency Distribution of the Four-category Variable A
To give an example of a cross-classification of two variables, consider again the above Variable A, and Variable B, which has the two categories b1 and b2. Crossing the four categories of A with the two categories of B yields a 4 × 2 contingency table. This table is given in Table 1.2. As can be seen from the cell entries, we now need two subscripts to indicate the location of a frequency in a table. By convention, the first subscript indicates the row and the second indicates the column in which a frequency is located. For example, the observed frequency m32 is located in the third row and the second column of the table. The last column of the table displays the row marginals (also called sum or row total), that is, the sum of the frequencies in a row. The last row of the table displays the column marginals (also called sum or column total), that is, the sum of the frequencies in a column. The cell in the lower right corner of the table contains the sum of all frequencies, that is, the sample size, N.
Table 1.2 Cross-classification of the Four-category Variable A and the Two-category Variable B
Table 1.3 2 × 2 × 3 Cross-classification of the Three Variables X, Y, and Z
Now, in many cases, researchers are interested in analyzing more than two variables. Cross-classifications of three or more variables can also be presented in matrix form, for example the staggered matrices in Table 1.4. One way of doing this for three variables is to present a two-variable table, that is, a 2D table for each category of the third variable. Accordingly, for four and more variables, 2D arrangements can be created for each combination of the third, fourth, and following variables. These tables can be hard to read. Therefore, many researchers prefer the so-called tabular representation of contingency tables. This representation contains, in its left column, the cell indices. In the next column, it contains the corresponding observed cell frequencies. In possibly following columns, the expected cell frequencies can be given, residuals, and the results of cell-wise evaluations or tests. Table 1.3 presents the tabular form of the 2 × 2 × 3 cross-classification of the three variables X, Y, and Z.
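As a minimal sketch of the two display forms just described, the following R code builds a 2 × 2 × 3 table from made-up frequencies (the variable labels x1, y1, z1, and so on are hypothetical, not taken from the examples in this book) and prints it both in a matrix-like form and in tabular form:

```r
# Build a 2 x 2 x 3 cross-classification from made-up cell frequencies.
freq <- c(12, 7, 9, 14, 6, 11, 8, 10, 5, 13, 9, 6)
tab  <- array(freq, dim = c(2, 2, 3),
              dimnames = list(X = c("x1", "x2"),
                              Y = c("y1", "y2"),
                              Z = c("z1", "z2", "z3")))
ftable(as.table(tab))          # flat, matrix-style display of the three-way table
as.data.frame(as.table(tab))   # tabular form: cell indices plus observed frequencies
```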
Table 1.4 3 (Concreteness) × 2 (Gender) × 2 (Correctness) × 2 (Example Use) Cross-classification for the 10th Statement
Table 1.4 is arranged in a slightly different way than Table 1.3. It is arranged in panels. It is the goal of this book to introduce readers to methods of analysis of tables of the kind shown here.
The following are sample questions for the variables that span Table 1.4. These questions can be answered using the methods discussed in this book:
There are many more questions that can be asked. This book will present methods of analysis for many of them. In the next section, we discuss these questions from a more general perspective.
Goodman [82] discusses three elementary views of log-linear modeling. In the context of 2D tables, the author states (p. 191) that “log-linear modeling can be used
When more than two variables are studied, these views carry over accordingly. In addition, these three views can be taken, as is customary, when an entire table is analyzed or, as it has recently been discussed (Hand & Vinciotti [96]), local models are considered (see also Havránek & Lienert [99]). Local models include only part of a table, exclude part of a table, or contain parameters that focus on parts of a table only.
When dependency relations are modeled, results are typically expressed in terms of conditional probabilities, odds ratios, or regression parameters from logit models. As is well-known and as is illustrated in this text, log-linear models can be used equivalently. Path models and mediator models can be estimated.
When association patterns are modeled, results are typically expressed in terms of associations or interactions that can involve two or more variables. To analyze associations and interactions, variables do not need to be classified as dependent and independent. All variables can have the same status.
In this text, we discuss methods for each of these three views. Before we delve into technical details, however, we briefly discuss, in the next section, assumptions that need to be made when analyzing cross-classifications (see Wickens [261]).
As will be discussed later (see Section 3.1), X2-tests can be used to appraise the correspondence of model and data. X2-statistics are asymptotic statistics that approximate the χ2 distribution well if certain assumptions can be made. Three of the most important assumptions are that the cases in a table (1) are independent of each other, (2) have similar distributions, and (3) are numerous enough. We now discuss each of these assumptions.
The independence assumption, also called the assumption of probabilistic independence, is most important, and this assumption is made in many other contexts of statistical data analysis. It implies that no case carries more than random information about any other case. This assumption is usually reduced to the requirement that cells must be filled with responses from different cases. However, such a requirement is not always sufficient to guarantee independence. For example, if the results of political elections are predicted, the vote cast by Voter A must not determine the vote cast by Voter B. This, however, is rarely guaranteed. There are entire districts in which voters traditionally vote for a particular party. In these districts, the vote cast by one voter allows one to predict the votes cast by a large number of other voters. Accordingly, family members often vote for the same party, and friends often agree on voting for a particular candidate. This is legal and does not jeopardize the validity of elections or statistical analysis.
Clearly, if the same person goes into a table more than once, this assumption is violated. Therefore, repeated measures analysis is a different beast than the analysis of cross-sectional data. What is the damage that is done when cases fail to be independent? In general, bias will result. It is not always clear which parameter estimate will be biased. However, it is easy to demonstrate that, in certain cases, severe bias can result for both mean and variance. Consider the case where two candidates run for office. Let the true voter distribution, one week before the elections, be such that 50% tend to vote for Candidate A, and the other 50% for Candidate B. Now, a TV station that predicts the outcome of the elections asks 100 voters. For lack of independence, the sample contains 75% supporters of Candidate A.
Table 1.5 Example of Mean and Variance Bias in Polling Example
Table 1.5 shows how the mean and the variance of the true voter distribution and the one published by the TV station compare. It shows that, in this example, the mean is dramatically overestimated, and the variance is dramatically underestimated. In general, any parameter estimate can be affected by bias, and the direction of a bias is not always obvious.
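The bias can be reproduced with a short computation. The sketch below uses our own coding (a vote for Candidate A coded as 1, a vote for Candidate B as 0); the specific entries of Table 1.5 are not reproduced here.

```r
# True distribution: 50% vote for A. Dependent, biased sample: 75% vote for A.
p_true   <- 0.50
p_sample <- 0.75
c(mean = p_true,   variance = p_true   * (1 - p_true))    # 0.50 and 0.25
c(mean = p_sample, variance = p_sample * (1 - p_sample))  # 0.75 and 0.1875
```

Under this coding, the mean is overestimated and the variance is underestimated, as described above.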
The second assumption to be made when assessing goodness-of-fit using X2-based statistics concerns the distribution of scores. It is the assumption that the data were drawn from a homogeneous population. If this is the case, the parameters are the same for each individual. If, however, data come from mixed distributions, or individuals respond to different effects, the aggregate of the data can be problematic (for discussion and examples, see Loken [143]; Molenaar & Campbell [166]; von Eye & Bergman [228]; Walls & Schafer [259]). First, it is unclear which effects are reflected and which are not reflected in the data. Second, the aggregate of data from multiple populations could have the effect that the X2 calculated from the data approximates the χ2 only poorly.
Therefore, if the provenance of frequencies varies and is known, it needs to be made part of the model. For example, one can add a variable to a model that classifies cases based on the population of origin. Mantel–Haenszel statistics can then be used to compare associations across the populations (see Section 14.1). If, however, the provenance of data is unknown, but researchers suspect that the data may stem from populations that differ in parameters, methods of finite mixture distribution decomposition can be applied to separate the data from different populations (Erdfelder [53]; Everitt & Hand [56]; Leisch [133]).
The third assumption concerns the sample size. To obtain sufficient statistical power, the sample must be assumed to be sufficiently large. Two lines of argument can be pursued. One involves performing standard power analysis (Cohen [38]) to determine sample sizes before data are collected, or to determine empirical power when the sample is given. In the other, rules are proposed, for instance, the rules for 2D tables proposed by Wickens [261]; see also the discussion in von Eye [221]:
It goes without saying that other sources propose different rules. For example, sample size requirements based on Cohen’s [38] power calculations can be dramatically different, and much larger samples may be needed than estimated based on Wickens’s [261] rules. In addition, it is unclear how these rules need to be modified for cross-classifications that are spanned by three or more variables. A debate concerns rule number 3. If the expected frequency m̂_ij is clearly less than 1 and m_ij > 0, the Pearson X2 component for Cell ij will inflate, and a model will be rejected with greater probability. Therefore, it has been proposed that the Delta option be invoked in these cases, that is, that a small constant, in most cases 0.5, be added to each cell. Some software packages, notably SPSS, add this constant without even asking the user. In Section 3.1, on goodness-of-fit testing, the effects of small expected cell frequencies on the two most popular goodness-of-fit tests will be illustrated.
An alternative to the Delta option involves performing exact tests. This is an option that is available, in particular, for small tables and simple models. In this text, we focus on the approximate X2 tests because, for large tables and complex models, exact tests are less readily available.
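For illustration, here is a minimal R sketch of the Delta option mentioned above; the 2 × 2 counts are made up, and adding 0.5 to every cell is only one of several possible adjustments.

```r
# Pearson X2 before and after invoking the Delta option (adding 0.5 to each cell).
observed <- matrix(c(3, 0, 1, 6), nrow = 2)
chisq.test(observed, correct = FALSE)        # may warn that the chi-squared approximation is questionable
chisq.test(observed + 0.5, correct = FALSE)  # same test after the Delta adjustment
```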
In this and the following sections, we develop a hierarchy of models, beginning with the most parsimonious model. This model proposes that no effect exists whatsoever. It is called the null model. If this proposition holds, deviations from average are no stronger than random. The expected probability for Cell ij is π_ij = 1/(I × J), where I and J are the numbers of rows and columns. The corresponding expected cell frequencies are m̂_ij = N/(I × J).
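A minimal sketch of the null model in R, using made-up counts for a 3 × 3 table (the frequencies of Table 2.1 are not reproduced here): every cell receives the same expected frequency, N/(I × J).

```r
# Null model: all cells share the expected frequency N / (I * J).
m    <- matrix(c(25, 10, 4, 12, 8, 5, 6, 7, 3), nrow = 3)    # made-up observed frequencies
N    <- sum(m)
mhat <- matrix(N / length(m), nrow = nrow(m), ncol = ncol(m))
(m - mhat) / sqrt(mhat)                                       # standardized residuals
```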
Table 2.1 Cross-classification of Concreteness (C) and Wordiness (W) of the Interpretations of Statement 8; Expected Frequencies Estimated for the Null Model
Table 2.2 Cross-classification of Concreteness (C) and Wordiness (W) of the Interpretations of Statement 8; Expected Frequencies Estimated for the Concreteness Effects-only Model
The row effects-only model proposes that knowledge of the row marginals is sufficient to explain the frequency distribution in a table. This would allow one to estimate the expected cell probabilities based on the marginal probabilities of the rows, π_i., as π̂_ij = π_i./J. Table 2.2 shows the analysis of the statement data in Table 2.1 under the Concreteness effects-only model.
The estimated expected frequencies in Table 2.2 reflect the propositions of the Concreteness effects-only model. This model proposes that the cell frequencies are proportional to the row probabilities, but not to the column probabilities. Specifically, the cell frequencies are estimated to be m̂_ij = m_i./J, where m_i. is the row total.
The magnitude of the residuals suggests that this model is much better than the null model. It may not be good enough to explain the data in a satisfactory way, but the discrepancies between the observed and the estimated expected cell frequencies, as well as the standardized residuals, are much smaller than in Table 2.1. In fact, seven of the nine residuals are now smaller than 1.96, thus indicating that, for these seven cells, there is no significant model–data discrepancy. We can conclude that the knowledge that (1) most respondents interpret statements using mostly abstract words and (2) the smallest portion of respondents uses concrete words for interpretation, allows one to make a major contribution to the explanation of the data in Table 2.2. Whether or not this contribution is significant will be discussed later. In the next section, we discuss the Wordiness effects-only model.
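Using the same made-up counts as in the previous sketch, the row (Concreteness) effects-only model distributes each row total evenly over the columns:

```r
# Row effects-only model: expected frequency = row total / number of columns.
m        <- matrix(c(25, 10, 4, 12, 8, 5, 6, 7, 3), nrow = 3)   # same made-up counts as above
mhat_row <- matrix(rep(rowSums(m) / ncol(m), times = ncol(m)), nrow = nrow(m))
(m - mhat_row) / sqrt(mhat_row)   # standardized residuals; typically closer to zero than under the null model
```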
In a fashion analogous to the row effects-only model, the column effects-only model proposes that knowledge of the marginal column probabilities helps explain the frequency distribution in a table. Specifically, this model proposes that π̂_ij = π_.j/I, or, equivalently, m̂_ij = m_.j/I. Table 2.3 shows the analysis of the statement data in Table 2.1 under the Wordiness effects-only model.
Table 2.3 Cross-classification of Concreteness (C) and Wordiness (W) of the Interpretations of Statement 8; Expected Frequencies Estimated for the Wordiness Effects-only Model
The results in Table 2.3 suggest that Wordiness effects make less of a contribution to the explanation of the data than the Concreteness effects. The table shows that the discrepancies between the observed and the estimated expected cell frequencies, expressed in comparable units by the standardized residuals, are larger than in Table 2.2. All of the z-scores in this table are greater than 1.96. Still, they are, on average, smaller than those in Table 2.1, that is, those for the null model. We conclude that Wordiness makes only a small contribution. As for the row-effects model, we do not know whether this contribution is significant. All we know, at this point, is that it is greater than nothing.
It is worth noting that, in the hierarchy of models that we are developing, both the row-effects model and the column-effects model are one level above the null model, and operate at the same hierarchy level. This is of importance when models are compared based on their ability to explain data. The model discussed in the next section operates at the next higher level of the model hierarchy.
In the context of contingency table analysis, an event is the observation of a category of a variable. For example, it is considered an event that the interpretation of a statement is judged as abstractly worded, or it is considered an event that the interpretation of a statement is judged as wordy. The co-occurrence of these two events is the observation of an interpretation that is both abstract and wordy. If, in a model, the effects of both Concreteness and Wordiness are taken into account, both the row effects and the column effects are part of the model.
Note that we currently still use the example of an IJ table, which is a 2D table. The models that we discuss in this context can easily be generalized to tables that are spanned by more than two variables. In addition, if both the row and the column effects are taken into account, possible interactions are not (yet) taken into account. Row effects and column effects are termed main effects. Therefore, the model that takes both row and column effects into account is also called the main effect model.
Considering that no statement is made about a possible interaction between the row and the column variables, the main effect model is an independence model. From this, we can specify how to estimate the expected cell probabilities. We obtain π̂_ij = π_i. × π_.j or, in terms of frequencies, m̂_ij = (m_i. × m_.j)/N. This formulation reflects (1) both main effects and (2) the assumption of independence of the row variable from the column variable.
Readers will notice that the cell frequencies for the independence model are estimated just as for the good old Pearson X2-test. In fact, the Pearson X2-test of the association between two variables is identical to the X2-test of the log-linear model of variable independence. If this model is rejected, which is the case when the X2 indicates significant model–data discrepancies, the two variables are associated. This interpretation is possible because the association (interaction) of the row variable with the column variable is the only effect that is not taken into account when the expected cell frequencies are estimated.
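The equivalence just noted can be checked directly in R; again the counts are made up.

```r
# Main effect (independence) model: expected frequency = row total * column total / N.
m    <- matrix(c(25, 10, 4, 12, 8, 5, 6, 7, 3), nrow = 3)
mhat <- outer(rowSums(m), colSums(m)) / sum(m)
sum((m - mhat)^2 / mhat)                   # X2 of the independence model
chisq.test(m, correct = FALSE)$statistic   # Pearson's X2 test of association: the same value
```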
Table 2.4 shows that, on average, the standardized residuals are smaller than for any of the simpler models discussed in the last sections. None of the z-scores is above 1.96. If this model describes the data well, the two variables that span the table can be assumed to be independent, because the discrepancies between the estimated expected frequencies, which conform to the independence model, and the observed frequencies are no larger than random. If, in contrast, this model must be rejected, an association can be assumed to exist. Taking into account scale characteristics of the data can reduce the discrepancies. If this is not enough for the main effect model to survive, an association must exist. This issue will be discussed in more detail later, in Section 9.3.7.
Table 2.4 Cross-classification of Concreteness and Wordiness of the Interpretations of Statement 8; Expected Frequencies Estimated for the Main Effect Model
In this section, we move from the probability notation used in the last sections to the log-linear notation. The log-linear notation has several advantages. First, it makes it easier to identify models as members of the family of generalized linear models (see Section 6.1). Second, the models have a form parallel to the form used for analysis of variance models. Therefore, their form facilitates intuitive understanding for readers familiar with analysis of variance. Third, log-linear models mostly contain additive terms. These are easier to read and interpret than the multiplicative terms used in the last section. This applies, in particular, when there are many variables and a model becomes complex. Fourth, the relationship between log-linear models and odds ratios can easily be shown (Section 4.3).
To introduce the log-linear form of the models considered in this book, consider the independence model introduced in Section 2.4, with π_ij = π_i. × π_.j. The expected cell frequencies for this model were estimated from the data as m̂_ij = N π̂_i. π̂_.j, which is equivalent to m̂_ij = (m_i. × m_.j)/N. As before, let the first variable be denoted by A, and the second variable by B (the order of these variables is, in the present context, of no importance). Taking the natural logarithm of m̂_ij yields

log m̂_ij = log m_i. + log m_.j − log N,

which we reparameterize as

log m̂_ij = λ + λ_i^A + λ_j^B,

that is, the log-linear form of the main effect model of variable independence.1 Using the log-linear form, it is easy to also represent the null model, the model that only takes the main effect of variable A into account, and the model that only takes the main effect of variable B into account. Specifically, the null model is

log m̂_ij = λ.

The main effect A-only model is

log m̂_ij = λ + λ_i^A,

and the main effect B-only model is

log m̂_ij = λ + λ_j^B.

As we said above, the model that takes both main effects into account is

log m̂_ij = λ + λ_i^A + λ_j^B.

Using the log-linear form, one can specify a model that also takes into account the association between Variables A and B. This model is

log m̂_ij = λ + λ_i^A + λ_j^B + λ_ij^AB,
where the last term represents the interaction between A and B. This term is based on interaction variables that are created in a multiplicative way comparable to interaction variables in analysis of variance.
For the two variables A and B, the model that takes both the main effects and the interaction between the two variables into account, is the saturated model. Saturated models have zero degrees of freedom. All possible hierarchical effects are taken into account.2
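As a sketch that anticipates the computational chapters, the hierarchy of models just written in log-linear form can be fitted as Poisson generalized linear models in R; the counts below are made up.

```r
# Fit the null, A-only, B-only, main effect, and saturated models for a 2 x 2 table.
d <- expand.grid(A = factor(c("a1", "a2")), B = factor(c("b1", "b2")))
d$Freq <- c(22, 8, 11, 27)                    # made-up observed frequencies
models <- list(
  null      = glm(Freq ~ 1,     data = d, family = poisson),
  A_only    = glm(Freq ~ A,     data = d, family = poisson),
  B_only    = glm(Freq ~ B,     data = d, family = poisson),
  main      = glm(Freq ~ A + B, data = d, family = poisson),
  saturated = glm(Freq ~ A * B, data = d, family = poisson))
sapply(models, deviance)   # likelihood ratio G2 of each model; 0 for the saturated model
```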
In each of these models, the λ’s are the model parameters that are estimated from the data. The number of these parameters can be large. Consider the saturated model for a 2 × 2 table, that is, a table with four cells. This model comes with the following nine model parameters: λ, λ_1^A, λ_2^A, λ_1^B, λ_2^B, λ_11^AB, λ_12^AB, λ_21^AB, and λ_22^AB.
Obviously, there are more parameters than probabilities. The model is overparameterized. There are four equations (cells) and nine unknowns. To solve this problem, parameters are constrained. The most popular approach to constraining parameters is the so-called “sum-to-zero” approach. This approach constrains the parameters for each model term to sum to zero. For I × J tables, the constraints are

Σ_i λ_i^A = 0,   Σ_j λ_j^B = 0,   Σ_i λ_ij^AB = 0 for every j,   and   Σ_j λ_ij^AB = 0 for every i.
In explicit form, these constraints imply, for the current example with a 2 × 2 table, that

λ_2^A = −λ_1^A,   λ_2^B = −λ_1^B,   and   λ_11^AB = −λ_12^AB = −λ_21^AB = λ_22^AB.
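In R, the sum-to-zero constraints correspond to effect coding (contr.sum). A minimal sketch for the 2 × 2 example (no frequencies are needed to inspect the design matrix):

```r
# Effect-coded design matrix for the saturated 2 x 2 model.
d <- expand.grid(A = factor(c("a1", "a2")), B = factor(c("b1", "b2")))
X <- model.matrix(~ A * B, data = d,
                  contrasts.arg = list(A = "contr.sum", B = "contr.sum"))
X                  # columns: constant, A effect, B effect, A x B interaction
colSums(X[, -1])   # each effect vector sums to zero across the four cells, mirroring the constraints above
```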
As Table 2.5 shows, the majority of the patients who received instant treatment improved. The majority of the patients on the waiting list showed no improvement. We now estimate two models. The first is the saturated model for the cross-classification in Table 2.5. Specifically, we estimate the model

log m̂_ij = λ + λ_i^T + λ_j^O + λ_ij^TO,

where T indicates Treatment Group and O indicates Outcome.
Table 2.5 Cross-classification of Treatment Group with Outcome
The parameters for this model are estimated under the sum-to-zero constraints. The overall X2 for the saturated model is, naturally, zero, and there are no degrees of freedom left. For the parameters, we estimate
For each contrast, only one significance test is performed. Dichotomous variables involve only one contrast, and so do interactions of dichotomous variables, of any order. The parameter estimates fulfill the zero-sum constraints posed above. For each effect, the sum of the parameter estimates is zero. In addition, the sums of the interaction parameters over the rows are zero and so are the sums over the columns. Substantively, we note that the treatment had the desired effect. Significantly more patients in the treatment group showed improvement than in the waiting group.
The second model is the main effect model

log m̂_ij = λ + λ_i^T + λ_j^O.
For the parameters, we estimate
As for the saturated model, none of the main effect parameters is significant. The result that the model is rejected allows one to conclude that Treatment Group and Outcome are associated.
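The two analyses can be sketched with hypothetical counts in place of Table 2.5 (the actual frequencies are not reproduced here); under effect coding, the interaction parameter of the saturated model carries the Treatment-by-Outcome association.

```r
# Saturated and main effect models for a hypothetical Treatment x Outcome table.
d <- expand.grid(Treatment = factor(c("instant", "waiting")),
                 Outcome   = factor(c("improved", "not improved")))
d$Freq <- c(34, 12, 9, 30)                   # hypothetical 2 x 2 frequencies
ctr  <- list(Treatment = "contr.sum", Outcome = "contr.sum")
sat  <- glm(Freq ~ Treatment * Outcome, data = d, family = poisson, contrasts = ctr)
main <- glm(Freq ~ Treatment + Outcome, data = d, family = poisson, contrasts = ctr)
summary(sat)$coefficients                    # z tests for lambda, the main effects, and the interaction
pchisq(deviance(main), df.residual(main), lower.tail = FALSE)  # small value: main effect model is rejected
```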
1Note that the natural logarithm often is abbreviated using ln(·) instead of log(·). In parts of the literature (e.g., Agresti [2]; Christensen [31]; Lawal [129]), log(·) is used; in other parts (e.g., Fleiss, Levin, & Paik [69]), ln(·) is used. In this text, to be consistent with the parts of the literature listed above, and our own earlier notation (von Eye, Mair, & Mun [242]), we use log(·).
2Note that models with zero degrees of freedom can also be created using lower order terms plus special effects or covariates.
A key step in modeling involves the evaluation of model performance. To decide whether a model can be retained or must be modified or rejected, fit information is created. This information concerns various aspects of the relationship between model and data. Specifically, one can ask whether a model, overall, describes the data well. To answer this question, overall goodness-of-fit statistics and R2 equivalents are calculated. Second, one can ask whether individual parameters differ significantly from zero. Third, one can perform residual analysis. The following sections cover each of these topics. We begin with overall goodness-of-fit statistics.
Goodness-of-fit statistics, occasionally also called badness-of-fit statistics, assess the distance between the observed distribution and the distribution that a model proposes. Under the null hypothesis, this distance is only randomly larger than zero. Using a significance test, researchers estimate the probability that the observed frequencies, or frequencies with even larger distances from those estimated, based on the model occur. This probability is termed the size of the test.
Many tests have been proposed for the evaluation of model performance. Some of these tests are exact; others are asymptotic. Most software packages offer exact tests only for small tables, such as 2 × 2 tables, and asymptotic tests for log-linear modeling. Therefore, we focus here on asymptotic tests (von Eye, Bogat, & von Weber [231]).
A most general indicator of goodness-of-fit is Cressie and Read’s [42] (see also Read & Cressie [186]) power divergence statistic,

2/(λ(λ + 1)) Σ_i m_i [(m_i/m̂_i)^λ − 1],

where λ is a real-valued parameter,3 with −∞ < λ < ∞. The index i goes over all cells of a table, m_i is the observed frequency of cell i, and m̂_i is the expected frequency of cell i.
This statistic is important because, by selecting particular values of λ, well-known measures of goodness-of-fit can be shown to be special cases of it (see Agresti [2]). Specifically,
Comparisons of the two best known of these six statistics, the Pearson X2, and the likelihood ratio G2, have shown that Pearson’s X2 statistic is often closer to the χ2 distribution than G2 (see Koehler & Larntz [117]; Larntz [127]). However, G2 has better decomposition characteristics than X2. Therefore, the decomposition of the effects in cross-classifications (Rindskopf [189,191]; see Section 11.2) and the comparison of hierarchically related log-linear models are typically performed using the G2 statistic.
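A short numerical check, with made-up frequencies, that the power divergence statistic given above reproduces Pearson's X2 at λ = 1 and approaches G2 as λ goes to 0:

```r
# Power divergence statistic as a function of lambda.
power_divergence <- function(m, mhat, lambda) {
  2 / (lambda * (lambda + 1)) * sum(m * ((m / mhat)^lambda - 1))
}
m    <- c(30, 10, 15, 25)                    # made-up observed frequencies
mhat <- rep(sum(m) / length(m), length(m))   # expected frequencies under a uniform null model
power_divergence(m, mhat, lambda = 1)        # equals Pearson's X2
sum((m - mhat)^2 / mhat)                     # Pearson's X2, for comparison
power_divergence(m, mhat, lambda = 1e-6)     # numerically close to G2
2 * sum(m * log(m / mhat))                   # G2, for comparison
```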
There exists a number of other goodness-of-fit tests. These include, for instance, the Kolmogorov–Smirnov test, the Cramér–von Mises test, and runs tests. In this book, we focus on Pearson’s X2 and on G2 because these are the most frequently used tests. In addition, these are the only tests available in the log-linear modeling modules in most general purpose statistical software packages.
One issue that was discussed already in Section 1.4 concerns the sample size dependence of the χ2 approximations. Here, we resume this discussion, and illustrate two characteristics. First, we show that the magnitude of Pearson’s X2 is directly related to the sample size. Second, we show that Pearson’s X2 and G2 respond differently to very small expected cell frequencies and to observed cell frequencies that are zero.
Pearson’s X2 is usually described using the equation given above,

X² = Σ_i (m_i − m̂_i)² / m̂_i,

where i goes over all cells included in the summation, that is, typically, all cells of a cross-classification. Since the sample size is a part of each term in the summation, the following formulation is equivalent:

X² = N Σ_i (π_i − π̂_i)² / π̂_i,

where π_i is the cell probability of cell i, and π̂_i is the expected probability of cell i. This equation shows that, when the relationships in a table remain unchanged, and the sample size increases by a multiple of N, Pearson’s X2 will increase by this multiple. This applies accordingly to the likelihood ratio G2:

G² = 2N Σ_i π_i log(π_i / π̂_i).

Figure 3.1 X2 and G2 simulation for small expected frequencies.
The implications of this sample size dependency are obvious. When the sample size increases, the statistical power to reject a model will increase as well. With very large samples, it will be almost impossible to find a model that can be retained, and it becomes more and more likely that sample characteristics are made part of a fitting model that then cannot be generalized. Conversely, when a sample becomes smaller, it will be harder and harder to reject a model, for lack of statistical power. When individual expected cell frequencies become too small (see Section 1.4), however, an inflation of Pearson’s X2 can occur, and the approximation characteristics of X2 suffer.
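A small numerical check of this sample size dependence, using made-up cell proportions and a uniform null model:

```r
# Multiplying the sample size by 5 multiplies both X2 and G2 by 5.
props <- c(0.35, 0.15, 0.20, 0.30)            # made-up cell proportions (held constant)
fit_stats <- function(m) {
  mhat <- rep(sum(m) / length(m), length(m))  # expected frequencies under a uniform null model
  c(X2 = sum((m - mhat)^2 / mhat), G2 = 2 * sum(m * log(m / mhat)))
}
fit_stats(100 * props)   # N = 100
fit_stats(500 * props)   # N = 500: both statistics are five times as large
```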
Figure 3.1 shows that, as the expected frequency m̂_i decreases and approaches m_i, the test statistics will, for constant m_i, decrease also, until m̂_i = m_i. After that point, both X2 and G2
