Categorical Statistics for Communication Research presents scholars with a discipline-specific guide to categorical data analysis. The text blends necessary background information and formulas for statistical procedures with data analyses illustrating techniques such as log-linear modeling and logistic regression analysis.
* Provides techniques for analyzing categorical data from a communication studies perspective
* Offers an accessible presentation of these techniques for communication scholars and other social scientists working at the advanced undergraduate and graduate teaching levels
* Illustrated with examples from different types of communication research, such as health, political, and sports communication and entertainment
* Includes exercises at the end of each chapter and a companion website containing exercise answers and chapter-by-chapter PowerPoint slides
Number of pages: 460
Year of publication: 2016
Cover
Title Page
Preface
References
Acknowledgments
About the Companion Website
1 Introduction to Categorical Statistics
Historical Overview
Probability Distributions and Parameter Estimation
Example of Maximum Likelihood Estimation
A Note on Statistical Software
Chapter Summary
Chapter Exercises
References
2 Univariate Goodness of Fit and Contingency Tables in Two Dimensions
Chi‐Square Test for Goodness of Fit
Chi‐Square Test of Independence in Contingency Tables
Likelihood Ratio Statistic
Exact Tests for Small Samples
McNemar’s Test for Correlated Samples
Measures of Association
Odds Ratio
Relative Risk
Phi Coefficient
Cramér’s V
Pearson’s Contingency Coefficient
Kendall’s Tau
Goodman and Kruskal’s Gamma
Somers’ d
Points of Concern in Bivariate Analyses
SPSS Analyses
Testing Goodness of Fit in SPSS
Testing Independence in SPSS
A Note on Style
Chapter Summary
Chapter Exercises
References
3 Contingency Tables in Three Dimensions
Moving from Two to Three Dimensions
Cochran‐Mantel‐Haenszel Test
Breslow‐Day Test
An Example in Public Health
An Example in Political Communication
Chapter Summary
Chapter Exercises
References
4 Log‐linear Analysis
Development of Log‐linear Models
Examples of Published Research
Log‐linear Analysis: Fundamentals
Two‐way Tables
Three‐way Models
Goodness of Fit and Model Selection
Descriptive Statistics and Residuals for the Fitted Model
Parameter Estimation
Ordinal Log‐linear Analysis
Three Ordinal Measures
More Complex Models
Visual Displays
Chapter Summary
Chapter Exercises
References
5 Logit Log‐linear Analysis
Examples of Published Research
Logit Log‐linear Analysis: Fundamental Components
Logit Model with One Response Measure
Logit Model with Two Response Measures
SPSS Example
Correspondence Analysis
Chapter Summary
Chapter Exercises
References
6 Binary Logistic Regression
Examples of Published Research
Binary Logistic Regression: Fundamentals
Simple Logistic Regression Analysis
Multiple Logistic Regression Analysis
Interactions
Model Assessment
Additional Statistics
Diagnostic Considerations
Binary Logistic Regression in SPSS
Chapter Summary
Chapter Exercises
References
7 Multinomial Logistic Regression
Examples of Published Research
Multinomial Logistic Regression: Fundamentals
Simple Multinomial Logistic Regression Analysis
Multiple Multinomial Logistic Regression Analysis
Conditional Logit Modeling
Multinomial Logistic Regression in SPSS
Chapter Summary
Chapter Exercises
References
8 Ordinal Logistic Regression
Examples of Published Research
Ordinal Logistic Regression: Fundamentals
Simple Ordinal Logistic Regression Analysis
Multiple Ordinal Logistic Regression Analysis
Interactions
Ordinal Logistic Regression in SPSS
Chapter Summary
Chapter Exercises
References
9 Probit Analysis
Examples of Published Research
Probit Analysis: Fundamentals
Binary Probit Analysis
Ordinal Probit Analysis
Multinomial Probit Analysis
Interactions
Chapter Summary
Chapter Exercises
References
10 Poisson and Negative Binomial Regression
Examples of Published Research
Poisson Regression: Fundamentals
Negative Binomial Regression: Fundamentals
Additional Techniques
SPSS Analyses
Chapter Summary
Chapter Exercises
References
11 Interrater Agreement Measures for Nominal and Ordinal Data
Analysis of Nominal Data with Two Raters
Analysis of Nominal Data with Multiple Raters
Analysis of Ordinal Data with Two Raters
Analysis of Ordinal Data with Multiple Raters
Kappa Coefficient in SPSS
Intraclass Correlation Coefficients in SPSS
Chapter Summary
Chapter Exercises
References
12 Concluding Communication
References
Appendix A: Chi‐Square Table
Appendix B: SPSS Code for Selected Procedures
Index
End User License Agreement
Chapter 01
Table 1.1 Example of cross‐classifications containing nominal and ordinal measures
Chapter 02
Table 2.1 Data for Title IX goodness‐of‐fit test
Table 2.2 Cross‐tabulation of time period by drug‐use mentions in horse‐racing reports
Table 2.3 Cross‐tabulation of time period by drug‐use mentions in horse‐racing reports on NPR
Table 2.4 Cross‐tabulation of paired attitudes
Table 2.5 Cross‐tabulation of communication with parents about drug dangers and experimenting with marijuana
Table 2.6 Cross‐tabulation of race and contacting a public official during previous year
Table 2.7 Cross‐tabulation of race and perceptions of party most capable of managing economy
Table 2.8 Cross‐tabulation of television and newspaper exposure during 2008 election campaigns
Table 2.9a Calculations of concordant and discordant pairs
Table 2.9b Calculation of corrections for ties
Table 2.10 Display of SPSS cross‐tabulation and chi‐square statistics
Chapter 03
Table 3.1 Cross‐tabulations of time period by horse injury or death mentions in two newspapers
Table 3.2 Cross‐tabulations of sex and marijuana experimentation with race as control measure
Table 3.3 Cross‐tabulations of sex and political discussion with race as control measure
Table 3.4 Odds ratios reported in Risk function in SPSS
Table 3.5 Select results of Breslow‐Day and Cochran‐Mantel‐Haenszel tests in SPSS
Chapter 04
Table 4.1 General log‐linear analyses of sex, race, and political leaning
Table 4.2 Descriptive statistics for log‐linear model containing sex, race, and political leaning
Table 4.3 Parameter estimates for log‐linear model containing sex, race, political leaning, and interaction of race and political leaning
Table 4.4 Cross‐tabulation of race and political leaning
Table 4.5 Parameter estimates for log‐linear model containing newspaper use and alcohol risk
Table 4.6 Parameter estimates for log‐linear model containing newspaper use, alcohol risk, and ordinal association parameter
Table 4.7 General log‐linear analyses of newspaper use, steroid risk perceptions, and participation in school‐sponsored athletics
Table 4.8 Parameter estimates for ordinal log‐linear model containing newspaper use, steroid risk, sports participation, and two interactions
Table 4.9 General log‐linear analyses of sex, race, personal optimism, and national optimism with frequency of political discussion as covariate
Table 4.10 SPSS goodness‐of‐fit display for log‐linear model containing sex, race, personal optimism, national optimism, and political discussion covariate
Table 4.11a Descriptive statistics for log‐linear model containing sex (males), race, personal optimism, national optimism, and political discussion covariate
Table 4.11b Descriptive statistics for log‐linear model containing sex (females), race, personal optimism, national optimism, and political discussion covariate
Table 4.12 SPSS parameter estimates for log‐linear model containing sex, race, personal optimism, national optimism, and political discussion covariate
Chapter 05
Table 5.1 Logit log‐linear models including sex, race, personal optimism, and frequency of political discussion as explanatory measures of national optimism
Table 5.2a Descriptive statistics for logit log‐linear models including sex (males), race, personal optimism, and frequency of political discussion as explanatory measures of national optimism
Table 5.2b Descriptive statistics for logit log‐linear models including sex (females), race, personal optimism, and frequency of political discussion as explanatory measures of national optimism
Table 5.3 Constant estimates for logit log‐linear model including sex, race, personal optimism, and frequency of political discussion as explanatory measures of national optimism
Table 5.4 Parameter estimates for logit log‐linear model including sex, race, personal optimism, and frequency of political discussion as explanatory measures of national optimism
Table 5.5 Logit log‐linear models including sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval
Table 5.6 Constant estimates for logit log‐linear model containing sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval
Table 5.7a Parameter estimates for logit log‐linear model including sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval
Table 5.7b Parameter estimates for logit log‐linear model including sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval
Table 5.8 SPSS goodness‐of‐fit display for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption
Table 5.9 Cell Counts and Residuals for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption
Table 5.10 Constant estimates for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption
Table 5.11 Parameter estimates for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption
Chapter 06
Table 6.1 Cross‐tabulation of time period by drug‐use mentions in horse‐racing reports
Table 6.2 Logistic regression model testing time period as determinant of drug‐use mentions in Albuquerque Journal
Table 6.3 Logistic regression model testing sex, race, and age as determinants of economic attitudes
Table 6.4 Logistic regression model testing sex, race, and age, as well as political party identification, as determinants of economic attitudes
Table 6.5 Logistic regression model testing sex, race, and age, as well as political party identification and exposure to radio news, as determinants of economic attitudes
Table 6.6 Log‐likelihood estimates for three binary logistic regression models
Table 6.7 SPSS output for binary logistic regression model containing categorical predictors
Table 6.8 SPSS output for binary logistic regression model containing categorical and continuous predictors
Chapter 07
Table 7.1 Cross‐tabulation of race by political party affiliation
Table 7.2 Simple multinomial logistic regression model testing race as a determinant of political party affiliation
Table 7.3 Multiple multinomial logistic regression model testing sex, race, military service, and newspaper exposure as determinants of political party affiliation
Table 7.4 SPSS output for multinomial logistic regression model containing categorical predictors
Table 7.5 SPSS output for multinomial logistic regression model containing categorical predictors
Chapter 08
Table 8.1 Cross‐tabulation of sex by marijuana risk perceptions
Table 8.2 Observed data cross‐classification of sex by four levels of risk associated with regular marijuana use: frequency (f), proportion (p), cumulative proportion (cp), cumulative odds (co), and odds ratios (OR)
Table 8.3 Ordinal logistic regression model testing sex as a predictor of marijuana risk perceptions
Table 8.4 Multiple ordinal logistic regression model testing sex, age, and teacher communication about drugs as determinants of marijuana risk perceptions
Table 8.5 Multiple multinomial logistic regression model testing sex, age, and teacher communication about drugs as determinants of marijuana risk perceptions
Table 8.6 SPSS output for ordinal logistic regression model containing categorical predictors
Table 8.7a SPSS frequencies for ordinal logistic regression model containing categorical predictors
Table 8.7b SPSS frequencies for ordinal logistic regression model containing categorical predictors
Chapter 09
Table 9.1 SPSS output for binary probit model in Generalized Linear Models procedure
Table 9.2 SPSS output for binary probit model in PLUM procedure
Table 9.3 SPSS output for ordinal probit regression model
Table 9.4a SPSS frequency output for ordinal probit regression model
Table 9.4b SPSS frequency output for ordinal probit regression model
Chapter 10
Table 10.1 SPSS output for Explore analysis of television news exposure
Table 10.2 SPSS output for Poisson regression model in Generalized Linear Models procedure
Table 10.3 SPSS output for negative binomial model in Generalized Linear Models procedure
Chapter 11
Table 11.1 Cell frequencies for interrater reliability calculations
Table 11.2 Hypothetical data for Fleiss kappa calculation
Table 11.3 Hypothetical data for weighted kappa calculation
Table 11.4 Example of quadratic weights applied to cells
Table 11.5 Data for calculation of Kendall’s W
Table 11.6 SPSS output for kappa coefficient
Table 11.7 SPSS output for intraclass correlation coefficient
Chapter 02
Figure 2.1 Display of SPSS Goodness‐of‐Fit windows.
Figure 2.2 Display of SPSS Goodness‐of‐Fit results.
Figure 2.3 Display of SPSS Crosstabs and Cells windows.
Figure 2.4 Display of statistics available in SPSS Crosstabs procedure.
Chapter 03
Figure 3.1 SPSS screenshots for Breslow‐Day and Cochran‐Mantel‐Haenszel tests.
Chapter 04
Figure 4.1 SPSS screenshot of general log‐linear options.
Figure 4.2 SPSS screenshot of log‐linear model construction.
Chapter 05
Figure 5.1 Screenshot of SPSS Logit Loglinear Analysis.
Figure 5.2 SPSS map of eleven ordinal variables.
Chapter 06
Figure 6.1 SPSS windows for binary logistic regression and variable definition.
Figure 6.2 SPSS windows for binary logistic regression and analysis options.
Chapter 07
Figure 7.1 SPSS screenshots for variables to be included in multinomial logistic regression model.
Figure 7.2 SPSS screenshots for output options in multinomial logistic regression model.
Chapter 08
Figure 8.1 SPSS screenshots for output options in ordinal logistic regression analysis (PLUM).
Figure 8.2 SPSS screenshots for location options in ordinal logistic regression analysis (PLUM).
Chapter 09
Figure 9.1 SPSS screenshot for Generalized Linear Models.
Figure 9.2 SPSS screenshot for Model design in Generalized Linear Models.
Chapter 11
Figure 11.1 Screenshots for SPSS kappa analysis.
Figure 11.2 Screenshots for SPSS intraclass correlation coefficient.
Bryan E. Denham
This edition first published 2017
© 2017 John Wiley & Sons, Inc.
Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Offices
350 Main Street, Malden, MA 02148‐5020, USA
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, for customer services, and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley‐blackwell.
The right of Bryan E. Denham to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging‐in‐Publication Data
Names: Denham, Bryan E., 1967– author.
Title: Categorical statistics for communication research / Bryan E. Denham.
Description: Chichester, UK ; Malden, MA : John Wiley & Sons, 2016. | Includes index.
Identifiers: LCCN 2016019969 (print) | LCCN 2016024992 (ebook) | ISBN 9781118927106 (cloth : alk. paper) | ISBN 9781118927090 (pbk.) | ISBN 9781118927083 (pdf) | ISBN 9781118927076 (epub)
Subjects: LCSH: Communication–Research–Statistical methods. | Statistics.
Classification: LCC P93.7 .D46 2016 (print) | LCC P93.7 (ebook) | DDC 302.23/021–dc23
LC record available at https://lccn.loc.gov/2016019969
A catalogue record for this book is available from the British Library.
For all who aspire to theoretically informed and methodologically rigorous quantitative research.
In June 1946, recognizing an impasse among scientists debating measurement strategies in psychology, S. S. Stevens observed that measurement – the assignment of numerals to objects and events according to rules – depended on the scales to which data were assigned. Nominal scales involved the use of numerals as qualitative labels only, “and quite naturally,” Stevens (1946, 679) wrote, “there are many who will urge that it is absurd to attribute to this process of assigning numerals the dignity implied by the term measurement.” Indeed, simple frequency counts offered limited information, and because advanced analytic techniques for nominal data had not yet been developed, scholars typically examined relationships two variables at a time, in some cases controlling for a third measure.
At the next level of measurement, the ordinal level, observations appeared in a ranked sequence. Stevens (1946) cited hardness among minerals as an example, emphasizing that while order did exist, one could not assume equal intervals between observations; the interval between topaz and corundum, for instance, might not equal the interval between corundum and diamond. “In the strictest propriety,” Stevens (1946, 679) cautioned, “the ordinary statistics involving means and standard deviations ought not to be used with these scales, for these statistics imply a knowledge of something more than the relative rank‐order of the data.” In other words, summing a set of scores and dividing by the number of observations could yield a distorted average; the median, or middle score, served as a more appropriate measure. Nevertheless, like prominent statisticians who would follow, Stevens did not advocate the wholesale elimination of means at the ordinal level, stating only that inaccuracies would grow as the intervals between categories became more unequal.
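Stevens's caution can be illustrated with a small, entirely hypothetical example. The Python sketch below scores the same five ordered categories in two ways; both codings preserve order, so both are equally legitimate for ordinal data. The group responses and both codings are invented for illustration. The comparison of medians does not depend on the coding, but the comparison of means can reverse.

```python
from statistics import mean, median

# Hypothetical responses on a five-point ordered scale (categories 1..5).
group_a = [3, 3, 3, 3, 3]
group_b = [1, 1, 2, 5, 5]

# Two order-preserving codings of the same categories.
# Coding 1 assumes equal spacing; coding 2 stretches only the top category.
coding_1 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
coding_2 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}

def summarize(responses, coding):
    """Return (mean, median) of the responses after applying a numeric coding."""
    scores = [coding[r] for r in responses]
    return mean(scores), median(scores)

for label, coding in [("coding 1", coding_1), ("coding 2", coding_2)]:
    mean_a, med_a = summarize(group_a, coding)
    mean_b, med_b = summarize(group_b, coding)
    print(f"{label}: mean A={mean_a}, mean B={mean_b}; "
          f"median A={med_a}, median B={med_b}")
```

With equal spacing, group A has the higher mean; stretching only the top category gives group B the higher mean, even though the data are unchanged. The median places group A above group B under either coding.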
For Stevens, data became “quantitative” at the interval level of measurement. Here, means and standard deviations could be computed without qualification, based on the assumption of equal intervals among observations. Centigrade and Fahrenheit temperature scales served as examples of interval measures, to be followed by a fourth and final level of measurement, the ratio scale, which contained an absolute zero point in addition to equal intervals. Periods of time, Stevens wrote, could be measured on a ratio scale, as one could observe a period that was twice as long as another. In contrast, it made little sense to assert that 70 degrees Fahrenheit was twice as warm as 35 degrees Fahrenheit.
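The difference between interval and ratio scales can be checked numerically. The sketch below converts the two temperatures from the text to Celsius and to Kelvin. Kelvin has an absolute zero and is therefore a ratio scale. Ratios of raw temperatures change with the scale, while ratios of differences, which interval scales preserve, do not. The 50‐ and 40‐degree readings are arbitrary values added here for the difference comparison.

```python
def fahrenheit_to_celsius(f):
    """Linear (interval-preserving) conversion from Fahrenheit to Celsius."""
    return (f - 32.0) * 5.0 / 9.0

def celsius_to_kelvin(c):
    """Shift Celsius onto the Kelvin scale, whose zero is absolute zero."""
    return c + 273.15

def ratio_of_differences(a, b, c):
    """(a - b) / (b - c): unchanged by any linear change of units."""
    return (a - b) / (b - c)

warm_f, cool_f = 70.0, 35.0
warm_c, cool_c = fahrenheit_to_celsius(warm_f), fahrenheit_to_celsius(cool_f)
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Ratios of raw temperatures depend on where each scale places its zero.
print(f"ratio in Fahrenheit: {warm_f / cool_f:.2f}")
print(f"ratio in Celsius:    {warm_c / cool_c:.2f}")
print(f"ratio in Kelvin:     {warm_k / cool_k:.2f}")

# Ratios of differences survive the change of units.
readings_f = (70.0, 50.0, 40.0)
readings_c = tuple(fahrenheit_to_celsius(t) for t in readings_f)
print(f"difference ratio (F): {ratio_of_differences(*readings_f):.2f}")
print(f"difference ratio (C): {ratio_of_differences(*readings_c):.2f}")
```

Only the Kelvin ratio supports a statement such as "twice as warm," and it shows that 70°F is only about 7 percent warmer than 35°F in absolute terms.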
In the years since Stevens (1946) described the four levels of measurement, statisticians have generally referred to data measured at the nominal and ordinal levels as categorical and to data measured at the interval and ratio levels as continuous; the current text focuses on the former. While scholars in social‐science fields such as economics, political science, psychology, and sociology have written monographs and longer books addressing the analysis of nominal and ordinal data, communication scholars have lacked a comparable text on which to draw in conducting studies and teaching quantitative research methods. Designed for graduate students in communication as well as faculty members and research professionals in the public and private sectors, Categorical Statistics for Communication Research seeks to fill that disciplinary void by presenting communication scholars with a discipline‐specific guide to categorical data analysis. In that sense, the book seeks to complement statistics texts by Hayes (2005), Reinard (2006), and Weber and Fuller (2013). Those texts contain excellent instruction on techniques such as the analysis of variance (ANOVA) and ordinary least squares (OLS) regression, but they do not address advanced approaches for analyzing categorical data. In covering advanced techniques in categorical statistics, the present text assumes the reader has completed an undergraduate course addressing the fundamentals of quantitative research methods. Such a course may have followed one of the texts mentioned above, or perhaps one by Babbie (2015), Keyton (2014), or Wimmer and Dominick (2014).
At the graduate level, communication seminars on quantitative methods tend to focus on techniques that assume interval‐level response variables. Following discussions of descriptive statistics and measures of central tendency, instruction often focuses on the t‐test and one‐way analysis of variance before moving to correlation, factorial ANOVA, and ordinary least squares regression. Advanced topics include techniques such as structural equation modeling and hierarchical linear modeling (see Hayes, Slater, and Snyder 2008). In contrast, instruction on categorical statistics tends to begin and end with cross‐tabulation and chi‐square analysis; techniques for the simultaneous analysis of multiple categorical variables often receive little, if any, attention. In addressing such techniques, the current text pursues the following objectives:
To provide an accessible guide to the use of categorical statistics, blending necessary background information and formulas for statistical procedures with data analyses illustrating the respective techniques;
To include examples from multiple areas of the communication discipline;
To demonstrate how techniques discussed in the book can be applied to data gathered through surveys, content analyses, and other methods;
To offer useful instructions for categorical data analyses in IBM SPSS®;
To demonstrate how procedural assumptions – and problems with meeting those assumptions – can offer substantive insight into communication processes;
To address points of methodological debate in an even‐handed manner, identifying approaches within and between areas of study;
To include a significant number of references for readers seeking additional background information about the techniques addressed.
To meet these objectives, the text begins with an introduction to categorical data analysis, reviewing statistical terminology and the assumptions statisticians have made in developing bivariate and multivariate tests. As the chapter explains, whereas techniques such as ANOVA and OLS regression assume normally distributed errors, the modeling procedures covered in the current text assume Poisson, binomial, and multinomial distributions, making them comparatively robust to non‐normal data. Additionally, the modeling techniques covered in the text use maximum likelihood estimation (MLE), as opposed to least squares estimation (LSE), to estimate parameters. Because MLE tends to be less biased with large samples (Nunnally and Bernstein 1994), the procedures addressed in the book can prove valuable for studies that draw on large public datasets.
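Maximum likelihood estimation chooses the parameter value under which the observed data are most probable. The sketch below is a minimal illustration using hypothetical counts: 37 of 100 coded news reports mention a topic. It maximizes a binomial log-likelihood by brute-force grid search and compares the result with the closed-form estimate k/n. It is only an illustration. Statistical software fits the models in this book with iterative algorithms rather than a grid.

```python
import math

def binomial_log_likelihood(p, successes, trials):
    """Binomial log-likelihood of p; the constant log C(n, k) term is dropped
    because it does not change where the maximum occurs."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

# Hypothetical data: 37 of 100 coded reports mention the topic.
successes, trials = 37, 100

# Evaluate the log-likelihood on a fine grid of candidate proportions.
grid = [i / 1000 for i in range(1, 1000)]
p_hat_grid = max(grid, key=lambda p: binomial_log_likelihood(p, successes, trials))

# For the binomial, the MLE has a closed form, with standard error sqrt(p(1-p)/n).
p_hat_closed = successes / trials
se = math.sqrt(p_hat_closed * (1.0 - p_hat_closed) / trials)

print(f"grid-search MLE: {p_hat_grid:.3f}")
print(f"closed-form MLE: {p_hat_closed:.3f} (SE = {se:.4f})")
```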
Chapter 2 addresses univariate goodness of fit and bivariate tests of independence and association. The chapter focuses on the use of chi‐square to assess proportions in the categories of a single variable and independence in contingency tables containing two measures. In doing so, the chapter includes examples from recent content analyses and survey research initiatives, and it also reviews measures of association and the likelihood ratio statistic. Regarding terminology, readers may recognize chi‐square analysis as a popular nonparametric, “distribution‐free” technique for comparing observed and expected frequencies in cross‐tabulations (see Conover 1999; Siegel 1956). Absent a point of reference, scholars sometimes regard categorical statistics, in general, as nonparametric; however, as indicated in the previous paragraph, most of the categorical models in this text assume an established distribution. As Anderson and Philips (1981) pointed out, such models focus on parameter estimation and go beyond significance testing alone. In short, categorical statistics should not be confused with distribution‐free, nonparametric procedures such as the Kruskal‐Wallis test (a nonparametric alternative to one‐way ANOVA) or Spearman rank correlation.
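The sketch below computes the two test statistics named above, the Pearson chi-square and the likelihood ratio statistic (G²), for a hypothetical 2 × 2 table. The counts are invented and do not come from any of the book's tables. Under independence, each expected count equals the row total times the column total divided by N. When the table has one degree of freedom, the p-value can be obtained from the complementary error function, because a chi-square variable with 1 df is the square of a standard normal variable. Larger tables need a general chi-square distribution function, such as one from a statistics library.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """X^2 = sum((O - E)^2 / E) over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    expected = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, expected)
                     for o, e in zip(obs_row, exp_row) if o > 0)

def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical table: rows = time period, columns = (mention, no mention).
table = [[30, 20],
         [15, 35]]
df = (len(table) - 1) * (len(table[0]) - 1)

x2 = pearson_chi_square(table)
g2 = likelihood_ratio_g2(table)
print(f"df = {df}")
print(f"Pearson X^2 = {x2:.3f}, p = {p_value_df1(x2):.4f}")
print(f"G^2         = {g2:.3f}, p = {p_value_df1(g2):.4f}")
```

The two statistics are usually close in moderately large samples. Both are compared against the same chi-square distribution, here 3.841 for α = .05 with 1 df.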
Chapter 3 moves from two‐dimensional contingency tables to analyses containing three categorical variables. Analyses of three‐dimensional tables involve testing the relationship between two measures at each level of a third. As the chapter demonstrates, the Breslow‐Day (B‐D) and Cochran‐Mantel‐Haenszel (C‐M‐H) tests facilitate comparisons of odds ratios across those levels. The B‐D test assesses whether the odds ratios are homogeneous, and the C‐M‐H test assesses whether the two measures are associated after controlling for the third. Together, the two tests allow researchers to gather information about three‐way tables efficiently. The B‐D and C‐M‐H tests have been applied primarily in studies of health communication, but scholars working in other areas also may find the procedures useful.
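The sketch below computes the Cochran‐Mantel‐Haenszel quantities from two hypothetical strata, which stand in for a control variable with two levels. It returns the Mantel‐Haenszel common odds ratio and the C‐M‐H chi-square statistic, which is referred to a chi-square distribution with 1 df. No continuity correction is applied here, although software may also report a corrected version. The Breslow‐Day homogeneity test is not shown.

```python
def mantel_haenszel(strata):
    """Return (MH common odds ratio, CMH chi-square without continuity correction).

    Each stratum is a 2 x 2 table [[a, b], [c, d]].
    """
    num_or = den_or = 0.0      # pieces of the MH common odds ratio
    sum_dev = sum_var = 0.0    # pieces of the CMH chi-square
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        num_or += a * d / n
        den_or += b * c / n
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        expected_a = row1 * col1 / n                                # E(a) given the margins
        var_a = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))      # hypergeometric variance
        sum_dev += a - expected_a
        sum_var += var_a
    return num_or / den_or, sum_dev ** 2 / sum_var

# Hypothetical strata; the odds ratio is 4 in each stratum.
strata = [
    [[20, 10], [10, 20]],
    [[8, 2], [10, 10]],
]
or_mh, cmh = mantel_haenszel(strata)
print(f"Mantel-Haenszel common odds ratio = {or_mh:.3f}")
print(f"CMH chi-square (1 df) = {cmh:.3f}")
```

Because both strata share the same odds ratio, the pooled estimate reproduces it. When stratum odds ratios differ markedly, which is what Breslow‐Day assesses, a single pooled odds ratio becomes harder to interpret.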
Chapter 4 focuses on log‐linear modeling, a technique used to examine contingency tables in more than two dimensions. Unlike logit log‐linear analysis, addressed in Chapter 5, general log‐linear models do not distinguish between explanatory (independent) and response (dependent) measures; rather, analyses treat all variables as outcomes, modeling the natural logs of expected cell frequencies. Researchers who use log‐linear analysis generally start from a saturated model, which contains all possible effects, has 0 degrees of freedom, and reproduces the observed frequencies exactly. They then remove parameters to arrive at a more parsimonious representation of the observed data. Scholars who use logit log‐linear models also seek parsimonious relationships, but they do so with a “categorical variable analog” (Knoke and Burke 1980, 25) to ordinary least squares regression. As Chapter 5 explains, logit log‐linear models estimate the log odds of a response measure as a function of explanatory variables, and they also allow more than one dependent variable to be included in a given analysis. In that sense, the logit procedure bears some similarity to the multivariate analysis of variance (MANOVA), which allows more than one response measure to be included in a model.
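Removing a single term from a saturated model can be shown on a hypothetical 2 × 2 × 2 table. The sketch below drops the three-way interaction, which leaves the model [AB][AC][BC] with 1 df. This model has no closed-form fitted values, so the code uses iterative proportional fitting (IPF), one classic algorithm for the purpose. Other software may use Newton‐Raphson instead, and both converge to the same maximum likelihood estimates. The counts are invented.

```python
import math
from itertools import product

LEVELS = (0, 1)
PAIRS = [(0, 1), (0, 2), (1, 2)]   # the AB, AC, and BC margins

# Hypothetical 2 x 2 x 2 counts, indexed observed[a][b][c].
observed = [
    [[25, 15], [10, 20]],
    [[12, 18], [30, 14]],
]

def two_way_margin(table, axes):
    """Collapse a 2 x 2 x 2 table onto the two axes listed in `axes`."""
    totals = {}
    for cell in product(LEVELS, repeat=3):
        i, j, k = cell
        key = tuple(cell[ax] for ax in axes)
        totals[key] = totals.get(key, 0.0) + table[i][j][k]
    return totals

def fit_no_three_way(obs, max_iter=1000, tol=1e-10):
    """Fit the log-linear model [AB][AC][BC] by iterative proportional fitting."""
    fitted = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
    for _ in range(max_iter):
        largest_change = 0.0
        for axes in PAIRS:
            # Rescale fitted cells so this two-way margin matches the observed one.
            target = two_way_margin(obs, axes)
            current = two_way_margin(fitted, axes)
            for cell in product(LEVELS, repeat=3):
                i, j, k = cell
                key = tuple(cell[ax] for ax in axes)
                updated = fitted[i][j][k] * target[key] / current[key]
                largest_change = max(largest_change, abs(updated - fitted[i][j][k]))
                fitted[i][j][k] = updated
        if largest_change < tol:
            break
    return fitted

def g_squared(obs, fitted):
    """Likelihood ratio statistic comparing observed and fitted counts."""
    return 2.0 * sum(obs[i][j][k] * math.log(obs[i][j][k] / fitted[i][j][k])
                     for i, j, k in product(LEVELS, repeat=3) if obs[i][j][k] > 0)

fitted = fit_no_three_way(observed)
print(f"G^2 = {g_squared(observed, fitted):.3f} on 1 df")
for i, j, k in product(LEVELS, repeat=3):
    print(f"cell ({i},{j},{k}): observed {observed[i][j][k]:>3}, fitted {fitted[i][j][k]:7.3f}")
```

A small G² relative to the chi-square distribution with 1 df would suggest that the three-way term can be dropped. In this model, the conditional A‐by‐B odds ratio is identical at both levels of C.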
Chapter 6 addresses binary logistic regression, a technique used in analyses containing a dichotomous dependent variable (e.g., whether or not an individual communicated with an elected representative in the previous 12 months). Logistic regression accommodates categorical and continuous explanatory measures and produces parameter estimates that can be exponentiated to form odds ratios. Chapter 7 covers multinomial logistic regression, which researchers use when a categorical dependent variable contains more than two levels. As an example, scholars of political communication might study predictors of national optimism, with a response measure indicating that survey respondents (a) appeared optimistic about the future of the nation, (b) appeared pessimistic, or (c) appeared neither optimistic nor pessimistic. Although the multinomial procedure treats a response measure as nominal, the technique often proves useful when ordinal logistic regression models, addressed in Chapter 8, do not meet their assumptions, notably the assumption of proportional odds. As its name implies, the ordinal model analyzes predictors of ordered response measures, which often appear in the form of Likert attitude statements. Researchers may ask study participants to indicate whether they Strongly Agree, Agree, are Undecided, Disagree, or Strongly Disagree that a social protest received fair treatment in the press. While many researchers would treat such a variable as quasi‐interval, Likert statements are technically ordinal measures.
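To illustrate how exponentiated logistic regression estimates become odds ratios, the sketch below fits a one-predictor binary logistic regression by Newton‐Raphson. The counts are hypothetical, loosely patterned on a time‐period‐by‐mention table. With a single binary predictor, the maximum likelihood slope equals the log of the sample cross‐product odds ratio, so exp(b1) should reproduce that ratio. With several predictors, each exp(b) is instead an odds ratio adjusted for the other terms. In practice a statistics package does this fitting.

```python
import math

def expand_counts(counts):
    """Turn {x: (n_yes, n_no)} into parallel lists of predictor and outcome values."""
    xs, ys = [], []
    for x, (n_yes, n_no) in counts.items():
        xs += [x] * (n_yes + n_no)
        ys += [1] * n_yes + [0] * n_no
    return xs, ys

def fit_simple_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson maximum likelihood."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # information matrix X'WX (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve (X'WX) * step = score for the 2 x 2 system.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Hypothetical counts: x = 0 (earlier period), x = 1 (later period);
# each tuple is (reports with a drug-use mention, reports without one).
counts = {0: (10, 40), 1: (30, 20)}
xs, ys = expand_counts(counts)
b0, b1 = fit_simple_logistic(xs, ys)

sample_or = (30 * 40) / (10 * 20)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"exp(b1) = {math.exp(b1):.4f}; cross-product odds ratio = {sample_or:.4f}")
```

The intercept is the log odds of a mention in the reference period (x = 0). Adding b1 gives the log odds in the later period.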
Chapter 9 focuses on probit analysis, a technique similar to logistic regression. As the text explains, the binary probit model assumes an underlying, normally distributed, latent continuous measure. This assumption makes the probit model useful in studies involving issues such as gun control, attitudes toward which are more complex than simple for‐or‐against binaries. Probit analysis also has ordinal and multinomial forms.
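The sketch below illustrates the latent-variable idea behind binary probit analysis, using hypothetical coefficients. Suppose an unobserved continuous measure y* equals a linear predictor plus standard normal error, and only whether y* exceeds zero is observed. Then Pr(y = 1) is the standard normal CDF of the linear predictor. The simulation checks that equivalence. The last lines compare the probit and logit curves. The comparison shows why probit and logistic coefficients tend to differ mainly by a scale factor, about 1.7 under the approximation used here.

```python
import math
import random

def standard_normal_cdf(z):
    """Phi(z), the inverse link used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """The inverse link used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical probit coefficients and predictor value.
b0, b1, x = -0.5, 0.8, 1.0
eta = b0 + b1 * x

# Latent-variable view: y* = eta + e with e ~ N(0, 1); y = 1 is observed when y* > 0.
rng = random.Random(2016)
draws = 200_000
share_positive = sum(eta + rng.gauss(0.0, 1.0) > 0.0 for _ in range(draws)) / draws

print(f"Phi(eta)             = {standard_normal_cdf(eta):.4f}")
print(f"simulated Pr(y* > 0) = {share_positive:.4f}")

# Logit and probit curves nearly coincide once the logit argument is scaled by 1.702.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(standard_normal_cdf(z) - logistic_cdf(1.702 * z)) for z in grid)
print(f"largest gap between Phi(z) and logistic(1.702 z): {max_gap:.4f}")
```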
Chapter 10 addresses Poisson and negative binomial regression, two techniques used in analyses of count data (i.e., discrete units observed in a given period of time). A communication scholar might use these procedures to study whether the number of “tweets” posted about a certain topic varies by region of the country and the gender of social media users. If the scholar coded tweets on a subjective measure, such as tone, he or she would need to assess interrater reliability, which the current text covers in Chapter 11. That chapter contains reliability formulas and examples for both nominal and ordinal content variables, explaining how reliability testing moves a study from personal belief to social science and facilitates replication in the process.
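The Poisson model assumes that the variance of a count equals its mean. Negative binomial regression relaxes that assumption with an extra dispersion parameter; in one common parameterization, variance = μ + αμ². The sketch below uses invented tweet counts for two hypothetical regions to show two properties of a Poisson regression with a single categorical predictor. First, the maximum likelihood fitted mean for each group is that group's sample mean. Second, the exponentiated region coefficient is the ratio of those means. The code then computes the Pearson chi-square divided by the residual degrees of freedom, a common informal check, where a value well above 1 points toward a negative binomial model. This is a diagnostic sketch, not a full model fit.

```python
def poisson_group_fit(groups):
    """Poisson regression on one categorical predictor: the ML fitted mean
    for each group is that group's sample mean."""
    return {name: sum(counts) / len(counts) for name, counts in groups.items()}

def pearson_dispersion(groups, fitted_means):
    """Pearson chi-square / residual df; values well above 1 suggest overdispersion."""
    chi_square = sum((y - fitted_means[name]) ** 2 / fitted_means[name]
                     for name, counts in groups.items() for y in counts)
    n_obs = sum(len(counts) for counts in groups.values())
    n_params = len(groups)   # intercept plus one dummy per non-reference group
    return chi_square / (n_obs - n_params)

# Hypothetical tweet counts per user in two regions.
groups = {
    "region_a": [0, 2, 1, 5, 0, 9, 1, 0, 3, 7],
    "region_b": [4, 12, 0, 15, 3, 8, 1, 20, 6, 2],
}
means = poisson_group_fit(groups)
rate_ratio = means["region_b"] / means["region_a"]   # exp of the region coefficient
dispersion = pearson_dispersion(groups, means)

for name, counts in groups.items():
    mu = means[name]
    var = sum((y - mu) ** 2 for y in counts) / (len(counts) - 1)
    print(f"{name}: mean = {mu:.2f}, variance = {var:.2f}")
print(f"rate ratio (b vs. a) = {rate_ratio:.3f}")
print(f"Pearson dispersion = {dispersion:.2f}")
```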
In discussing statistical procedures, the text draws on a content analysis published in Journalism & Mass Communication Quarterly (Denham 2014) as well as three datasets made available by the Inter‐university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The datasets include the 2008 American National Election Study (The American National Election Studies 2008), the 2011 National Survey on Drug Use and Health (United States Department of Health and Human Services 2011), and the 2012 Monitoring the Future study of American youth (Johnston, Bachman, O’Malley, and Schulenberg 2012). Examples illustrate procedures through topics in political and health communication as well as other areas in the communication discipline.
Regardless of the topics communication scholars engage, quantitative research studies invariably contain nominal and ordinal variables. Categorical Statistics for Communication Research seeks to enhance the measurement of these variables in statistical systems, contributing both theoretically and methodologically to disciplinary research.
Bryan Denham
July 2016
Clemson, SC
Anderson, J. A., and P. R. Philips. 1981. “Regression, Discrimination and Measurement Models for Ordered Categorical Variables.” Applied Statistics, 30: 22–31.
Babbie, Earl. 2015. The Practice of Social Research, 14th ed. Boston, MA: Cengage Learning.
Conover, W. J. 1999. Practical Nonparametric Statistics, 3rd ed. New York: John Wiley & Sons, Inc.
Denham, Bryan E. 2014. “Intermedia Attribute Agenda‐Setting in the New York Times: The Case of Animal Abuse in U.S. Horse Racing.” Journalism & Mass Communication Quarterly, 91: 17–37. DOI:10.1177/1077699013514415.
Hayes, Andrew F. 2005. Statistical Methods for Communication Science. Mahwah, NJ: Erlbaum.
Hayes, Andrew F., Michael D. Slater, and Leslie B. Snyder, eds. 2008. The Sage Sourcebook of Advanced Data Analysis Methods for Communication Research. Thousand Oaks, CA: Sage.
Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O’Malley, and John Schulenberg. 2012. Monitoring the Future: A Continuing Study of American Youth. Funded by National Institute on Drug Abuse. Institute for Social Research, University of Michigan.
Keyton, Joann. 2014. Communication Research: Asking Questions, Finding Answers, 4th ed. New York: McGraw‐Hill.
Knoke, David, and Peter J. Burke. 1980. Log‐linear Models. Beverly Hills, CA: Sage.
Nunnally, Jum C., and Ira H. Bernstein. 1994. Psychometric Theory, 3rd ed. New York: McGraw‐Hill.
Reinard, John C. 2006. Communication Research Statistics. Thousand Oaks, CA: Sage.
Siegel, S. 1956. Non‐parametric Statistics for the Behavioural Sciences. New York: McGraw‐Hill.
Stevens, S. S. 1946. “On the Theory of Scales of Measurement.” Science, 103: 677–680.
The American National Election Studies. 2008. American National Election Study: ANES Pre‐ and Post‐Election Survey. ICPSR25383‐v2. Ann Arbor, MI: Inter‐university Consortium for Political and Social Research [distributor], 2012‐08‐30. DOI:10.3886/ICPSR25383.v2.
United States Department of Health and Human Services. 2011. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. National Survey on Drug Use and Health, 2011. ICPSR34481‐v2. Ann Arbor, MI: Inter‐university Consortium for Political and Social Research [distributor], 2013‐06‐20.
Weber, Rene, and Ryan Fuller. 2013. Statistical Methods for Communication Researchers and Professionals. Dubuque, IA: Kendall Hunt.
Wimmer, Roger D., and Joseph R. Dominick. 2014. Mass Media Research: An Introduction, 10th ed. Boston, MA: Cengage Learning.
Eight reviewers evaluated the proposal for this text, and I thank each of them for their comments and suggestions regarding structure and content. I also recognize three reviewers who examined the initial draft of Categorical Statistics for Communication Research; their feedback helped me to clarify and improve chapter contents, and I very much appreciate their attention to detail.
Elizabeth Swayze provided initial guidance on this project, and I thank her for the support and encouragement. I also thank executive editor Haze Humbert as well as Patrick Wright, who has overseen marketing efforts. I recognize Mary Hall, Julia Kirk, Aneetta Antony, Joanna Pyke, and Roshna Mohan, each of whom contributed to the production of the book. I feel fortunate to have secured a contract with Wiley Blackwell, for in addition to publishing the scholarly journals of the International Communication Association, the company has published leading texts in applied statistics.
I thank the Inter‐university Consortium for Political and Social Research (ICPSR) at the University of Michigan for allowing me to demonstrate statistical techniques with data gathered in survey projects such as the American National Election Studies, the National Survey on Drug Use and Health, and the Monitoring the Future study of American youth. I also thank IBM for granting me permission to include screenshots of SPSS for Windows procedures.
Clemson University granted me a sabbatical to pursue this project, and I appreciate that investment in my work. I also acknowledge Clemson Libraries for maintaining electronic subscriptions to key scholarly journals and for retaining classic texts in categorical statistics. I acknowledge my doctoral adviser, M. Mark Miller, PhD, who encouraged me to pursue a minor in applied statistics. In graduate seminars and in dissertation meetings, Dr Miller stressed the importance of theory in quantitative research, and I certainly share his sentiments regarding theoretically informed social science. I also thank Tony Rimmer, PhD, my MA adviser, for introducing me to quantitative research methods at the graduate level.
Finally, I thank students, colleagues, and family members for expressing an interest in the project, offering encouragement, and sending along news items and other materials addressing scientific studies and quantitative research methods.
This book is accompanied by a companion website:
www.wiley.com/go/denham/categorical_statistics
The website includes:
Data files for chapter exercises
Answers to chapter exercises
Chapter PowerPoint slides
This text focuses principally on the analysis of nominal and ordinal data. Nominal measures contain unordered categories while ordinal variables contain categories in a sequence; both types of measures appear frequently in communication research. At the nominal level, news texts may or may not mention specific issue attributes, and during election years, individuals may or may not view a debate, campaign for a candidate, or vote in a primary. Individuals may be male or female, and they may or may not have served in the military. In addition to these dichotomous measures, unordered polytomous variables include items such as race, religion, and marital status, each of which contains more than two categories. At the ordinal level, attitude statements frequently include five response options: Strongly Agree, Agree, Undecided, Disagree, and Strongly Disagree. Estimations of risk may range from No Risk to Great Risk, and individuals responding to policy decisions may range from Strongly Approve to Strongly Disapprove in their reactions.
Statistician Alan Agresti (1990) mentioned two additional types of categorical data: discrete interval and grouped interval. Discrete interval measures often contain a limited number of values, and because they take the form of integers – and integers only – they are not treated as continuous quantitative measures, which can take on any real value. As an example of discrete interval data, a college dean might record the number of people who earn a graduate degree in communication each year, with recipients constituting discrete units. Regarding grouped interval data, researchers sometimes combine continuous interval measures into ordered brackets, as in the case of income, where asking a survey respondent for a specific figure might be considered both invasive and unnecessary. As a second example, while news reports about a given subject might average 731 words, a researcher might be interested in the number of articles that appear in ordered increments of 250 words.
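The grouped interval example above, in which article lengths are placed into ordered increments of 250 words, can be sketched as a simple binning rule. The function name and the article lengths below are hypothetical illustrations, not data from the text; the rule assumes nonnegative integer word counts.

```python
def word_count_bracket(words: int, width: int = 250) -> str:
    """Place a word count into an ordered bracket of fixed width:
    0-249, 250-499, 500-749, and so on."""
    if words < 0:
        raise ValueError("word count cannot be negative")
    lower = (words // width) * width  # start of the bracket containing the count
    return f"{lower}-{lower + width - 1}"

# Hypothetical article lengths
lengths = [412, 731, 980, 1210, 268]
print([word_count_bracket(n) for n in lengths])
# ['250-499', '500-749', '750-999', '1000-1249', '250-499']
```

Because every nonnegative count falls into exactly one bracket, the resulting categories are ordered, mutually exclusive, and exhaustive, while the exact word counts are no longer retained.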
In covering techniques for analyzing both ordered and unordered categorical variables, the current text recognizes that statisticians have differed in their assumptions and approaches to categorical data analysis. As Powers and Xie (2000) explained, one school of thought considers categorical data part of an underlying continuous distribution, while a second perspective considers categorical data inherently categorical. In historical terms, Agresti (1990) explained that Karl Pearson (1900), who developed the chi‐square goodness‐of‐fit test, assumed continuous distributions underlying categorical variables, while one of Pearson’s contemporaries, George Udny Yule (1900), believed that certain types of variables were inherently categorical and did not require assumptions of underlying distributions. Fienberg (2007) observed merit in both perspectives, noting that Pearson and Yule, along with R. A. Fisher (1922a, 1922b), played significant roles in building a foundation for the development of more advanced analytic techniques (see, for additional history, Fienberg and Rinaldo 2007, Plackett 1983). Interestingly, several decades would pass before statisticians developed advanced procedures for categorical data analysis. Most of the modeling techniques covered in the current text emerged after 1960, whereas statisticians had developed multivariate tests for continuous data decades earlier.
Seminal research in communication (e.g., Lazarsfeld, Berelson, and Gaudet 1948) demonstrates how social scientists analyzed and displayed categorical data. Lacking advanced statistical procedures, researchers typically presented data in the form of frequency charts and cross‐tabulations. As an example, Table 1.1 contains data gathered in the 1948 election year and published in Voting: A Study of Opinion Formation in a Presidential Campaign (Berelson, Lazarsfeld, and McPhee 1954, 243). The table contains both nominal and ordinal frequency measures and offers descriptive information in a limited but effective manner. Recognizing a pattern between exposure to mass media and level of interest in the presidential election, the authors reported demographic and psychographic information about 814 individuals in Elmira, New York. In the table, numbers appearing in parentheses indicate cell frequencies while figures outside the parentheses indicate the percentage of individuals in each cell who were exposed to media at “High and High‐Middle” levels (N = 432). This approach allowed readers, if so inclined, to calculate the number of respondents in each cell who scored “Low and Low‐Middle” on exposure indices (N = 382), all the while inspecting results across three levels of campaign interest. The use of percentages for “High and High‐Middle” media users allowed the authors to show statistical patterns that raw cell frequencies would have obscured. Examining the table, one observes that individuals exposed the most to mass media and interested the most in the election belonged to more organizations, had higher levels of education, and appeared in higher socioeconomic classes.
Table 1.1 Example of cross‐classifications containing nominal and ordinal measures
Percentage with High or High‐Middle Exposure (on Index), by Level of Interest; cell frequencies in parentheses

Characteristics | Great Deal | Quite a Lot | Not Much at All
(a) Organization Membership:
Belongs to Two or More | 82 (103) | 68 (87) | 39 (64)
Belongs to One | 72 (71) | 57 (74) | 34 (68)
Belongs to None | 62 (100) | 47 (112) | 24 (126)
(b) Education:
College | 88 (58) | 62 (37) | 48 (25)
High School | 71 (166) | 60 (171) | 30 (152)
Grammar School or Less | 56 (48) | 45 (62) | 25 (81)
(c) Socioeconomic Status:
Higher | 79 (167) | 63 (120) | 39 (105)
Lower | 60 (108) | 52 (153) | 25 (154)
(d) Sex:
Men | 72 (122) | 60 (124) | 38 (110)
Women | 71 (153) | 54 (149) | 25 (149)
(e) Neuroticism:
Low | 77 (112) | 64 (106) | 30 (100)
High | 67 (149) | 50 (147) | 30 (138)
Note: Table appeared originally in Berelson, Lazarsfeld, and McPhee (1954), Voting: A Study of Opinion Formation in a Presidential Campaign. © 1954 by The University of Chicago. Reprinted with permission, University of Chicago Press.
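Each cell of Table 1.1 reports a percentage with High or High‐Middle exposure alongside the cell frequency, so a reader can recover approximate counts for both exposure groups. The sketch below does this for one row of the table; because the published percentages are rounded to whole numbers, the recovered counts are approximations, and the helper name is hypothetical.

```python
def split_cell(percent_high: float, cell_n: int) -> tuple[int, int]:
    """Recover approximate counts of High/High-Middle and Low/Low-Middle
    exposure from one Table 1.1 cell (a percentage and a cell frequency)."""
    high = round(percent_high / 100 * cell_n)  # percentages were rounded, so this is approximate
    return high, cell_n - high

# Organization membership, "Belongs to Two or More," by level of interest
row = {"Great Deal": (82, 103), "Quite a Lot": (68, 87), "Not Much at All": (39, 64)}
for interest, (pct, n) in row.items():
    high, low = split_cell(pct, n)
    print(f"{interest}: high={high}, low={low}")
# Great Deal: high=84, low=19
# Quite a Lot: high=59, low=28
# Not Much at All: high=25, low=39
```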
Readers familiar with significance testing may notice that Table 1.1 does not contain chi‐square analyses, commonly used to determine whether significant differences exist between observed and expected cell frequencies. Lazarsfeld, a research methodologist, did not consider it appropriate to test bivariate relationships for statistical significance, reasoning that additional variables could alter – or eliminate – significant relationships.1 As indicated, when Lazarsfeld and his colleagues conducted their election research, multivariate techniques for categorical data had not been developed. For example, log‐linear modeling, which examines associations among multiple categorical variables simultaneously, did not exist as such; had the technique been available, Lazarsfeld and other researchers might have used it in analyzing frequency data. In fact, Alwin and Campbell (1987, S147) described log‐linear models as, “in many ways, the culmination of the classic Lazarsfeldian tradition. They relate to it directly, rather than obliquely. They focus on tables, the basic building blocks of survey analysis, and they provide precise tests of simple and complex versions of partialling and elaboration hypotheses.” Indeed, where Pearson and Yule worked with 2 × 2 contingency tables (i.e., cross‐tabulations in which both variables contained two categories), statisticians who developed log‐linear models (see Goodman 1978) established approaches for the simultaneous analysis of more than two variables, each of which may have contained more than two categories.
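The chi‐square analysis mentioned above compares observed cell frequencies with the frequencies expected under independence. As a minimal sketch, assuming a two‐way table stored as a list of rows, Pearson's statistic can be computed as follows; the counts are hypothetical and the function name is illustrative only.

```python
def pearson_chi_square(table: list[list[float]]) -> float:
    """Pearson's X^2 for a two-way contingency table: the sum over cells of
    (observed - expected)^2 / expected, where
    expected = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2

# Hypothetical 2 x 2 table (rows: high/low media exposure; columns: voted/did not vote)
print(round(pearson_chi_square([[10, 20], [20, 10]]), 3))  # 6.667
```

The statistic would then be compared with a chi‐square distribution with (r − 1)(c − 1) degrees of freedom, here 1. Note that this is strictly a bivariate test, the kind Lazarsfeld hesitated to rely on.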
In addition to log‐linear modeling, the current text also addresses binary, multinomial, and ordinal logistic regression analyses. As with ordinary least squares (OLS) regression, logistic models examine the effects of one or more independent (explanatory, predictor) variables on a single dependent (response, outcome) measure.2 Like log‐linear models, logistic regression techniques belong to a special class of generalized linear models (GLMs), developed by Nelder and Wedderburn (1972). As explained in Chapter 4 of the current text, a GLM contains a systematic and a random component as well as a link function. Explanatory variables form the systematic component, while a dependent measure and the probability distribution assigned to it constitute the random component (Agresti 2007, 66–67; see also, McCullagh and Nelder 1989). Link functions connect the systematic and random components.
In the case of log‐linear and logistic regression techniques, the link function transforms a response measure, such that the dependent variable can be modeled as a linear function of explanatory measures. In OLS regression, a transformation is not necessary, as the procedure models the mean of a dependent variable directly, using an identity link. Log‐linear analysis, which models cell frequencies, uses a log link function, while logistic regression analysis, which models a response measure containing a value between 0 and 1 (e.g., a probability), uses the log of the odds. Statisticians who developed logistic regression models (e.g., Cox 1958, McCullagh 1980) built on the work of individuals such as Chester Bliss (1935), who popularized the probit model, and Joseph Berkson (1944), who applied the term logit to log odds.3
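The link functions named above can be written out directly. The sketch below assumes scalar means for simplicity: the identity link (OLS), the log link (log‐linear models of cell counts), and the logit link (logistic regression), plus the inverse logit that maps a linear predictor back to a probability. Because the paragraph also mentions the probit model, the sketch includes the standard normal CDF, which serves as the inverse of the probit link. Function names are illustrative, not from any particular package.

```python
import math

# Each link maps the mean of the response onto the scale of the
# linear predictor (the systematic component of the GLM).
def identity_link(mu: float) -> float:    # OLS regression
    return mu

def log_link(mu: float) -> float:         # log-linear models (expected counts, mu > 0)
    return math.log(mu)

def logit_link(p: float) -> float:        # logistic regression (0 < p < 1): log odds
    return math.log(p / (1 - p))

def inverse_logit(eta: float) -> float:   # linear predictor back to a probability
    return 1 / (1 + math.exp(-eta))

def probit_inverse(eta: float) -> float:  # standard normal CDF, inverse of the probit link
    return 0.5 * (1 + math.erf(eta / math.sqrt(2)))

p = 0.8
eta = logit_link(p)                                  # log(0.8 / 0.2) = log(4)
print(round(eta, 3), round(inverse_logit(eta), 3))   # 1.386 0.8
print(round(log_link(20.0), 3))                      # 2.996
print(round(probit_inverse(0.0), 3))                 # 0.5
```

A probability of .80 corresponds to odds of 4 and a log odds of about 1.386; the inverse logit recovers the original probability, which is how fitted logistic models are translated back into predicted probabilities.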
Because advanced modeling techniques for categorical data facilitate the simultaneous examination of multiple variables, they can help to lower the risk of Type I error, or a false rejection of the null hypothesis. When a researcher conducts multiple bivariate analyses using the same set of data, he or she increases the likelihood of identifying “significant” relationships that may be little more than chance occurrences. Yet, legitimate relationships can be rejected when analyses are too conservative; in such cases, Type II error – a failure to reject the null hypothesis when it should be rejected – can occur.4 As the current text observes, examining multiple variables simultaneously offers an appropriate balance for controlling the two types of error – provided statistical tests meet their assumptions.
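The inflation of Type I error across multiple bivariate tests can be made concrete with the familywise error rate, 1 − (1 − α)^k, the probability of at least one false rejection in k tests. The sketch below assumes the k tests are independent; bivariate tests run on the same dataset usually are not, so the figures are an illustration of the trend rather than exact rates for any particular study.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k independent tests,
    each conducted at significance level alpha: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

With 10 tests at α = .05, the chance of at least one spurious “significant” result is roughly 40 percent under these assumptions.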
Categorical statistics, in general, assume independence among observations, and when that assumption is violated, artificial inflation of a sample may occur, leaving statistical tests technically flawed. The chi‐square test statistic, in particular, is sensitive to sample size, and a lack of independence among observations will almost certainly compromise a study. As an example, while a researcher might content analyze 84 individual news reports, a statistician would not consider sentences or paragraphs within those reports independent units. Relatedly, categories within variables should be mutually exclusive and exhaustive, meaning that categories should not overlap and should contain options for all observations. When categories overlap or lack a complete set of response options (or content codes), observations may be classified into more than one category, or into no category at all. In either case, the analysis may not measure what it seeks to measure (i.e., the study may lack measurement validity), and attempts to replicate the research may prove futile given an absence of reliability. The following section offers an overview of distributional assumptions and parameter estimation in categorical statistics.
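Whether a coding scheme is mutually exclusive and exhaustive can be checked mechanically: every observed value should fall into exactly one category. The sketch below assumes categories defined as inclusive numeric ranges; both coding schemes and the range of observed counts are hypothetical.

```python
def matching_categories(value, categories):
    """Return the label of every category whose inclusive (low, high)
    range contains the value."""
    return [label for label, (low, high) in categories.items() if low <= value <= high]

def is_exclusive_and_exhaustive(values, categories):
    """True only if every observed value falls into exactly one category."""
    return all(len(matching_categories(v, categories)) == 1 for v in values)

# Hypothetical schemes for "number of spots seen in the past week"
good = {"0": (0, 0), "1-2": (1, 2), "3-5": (3, 5), "6-9": (6, 9),
        "10-19": (10, 19), "20+": (20, float("inf"))}
flawed = {"0-2": (0, 2), "2-5": (2, 5), "6-9": (6, 9)}  # overlaps at 2; nothing above 9

counts = range(0, 51)
print(is_exclusive_and_exhaustive(counts, good))    # True
print(is_exclusive_and_exhaustive(counts, flawed))  # False
print(matching_categories(2, flawed))               # ['0-2', '2-5']
print(matching_categories(12, flawed))              # []
```

The flawed scheme fails in both ways the text describes: a value of 2 could be coded into two categories, and a value of 12 fits none.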
A probability distribution links the quantitative outcome of a study with the probability the outcome will occur. In the social sciences, statistics texts focus heavily on outcomes obtained through models such as ordinary least squares regression. OLS regression assumes normally distributed errors and a dependent variable measured at the interval level. It also assumes a random sample and equality of variances, and when analyses meet these assumptions, OLS models yield reliable and parsimonious results. When assumptions are not met, parameters may be misestimated, affecting substantive interpretations (see Aldrich and Nelson 1984).
In contrast to OLS regression, techniques for analyzing categorical response measures vary in the distributions they assume. Models covered in the current text generally assume one of three distributions: binomial, multinomial, or Poisson (see Plackett 1981). The binomial distribution models the probability of observing a specific number of successes in a certain number of independent trials, and the multinomial distribution models the probability of observing a specific number of outcomes in each of several categories across a certain number of trials. The Poisson distribution models the probability of observing a specific number of events in a fixed time period (see also Agresti 2007, 4–16).
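The three distributions described above can be written as short probability functions. This is a minimal sketch using only Python's standard library; the function names and example values are illustrative assumptions, not procedures from the text.

```python
import math

def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y): y successes in n independent trials with success probability p."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """Probability of the observed counts across several categories
    in n = sum(counts) independent trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact integer multinomial coefficient
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y): y events in a fixed period when the expected count is mu."""
    return math.exp(-mu) * mu**y / math.factorial(y)

print(round(binomial_pmf(3, 10, 0.5), 3))               # 0.117
print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))    # 0.1875
print(round(poisson_pmf(2, 3.0), 3))                    # 0.224
```

The multinomial example might represent four survey respondents classified as optimistic, pessimistic, or neither; with only two categories the multinomial reduces to the binomial.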
In addition to differences in distributional assumptions, categorical procedures rely on a different type of parameter estimation. While OLS regression models contain parameter estimates based on least squares estimation (LSE), techniques addressed in this book draw on maximum likelihood estimation (MLE). Addressing parameterization, Nunnally and Bernstein (1994, 148) defined an estimator as “a decision rule that results in a particular value or estimate that is a function of the data.” Developed by R. A. Fisher (for historical discussion, see Aldrich 1997), MLE selects parameter estimates that have the greatest likelihood of resulting in the observed sample (Myung 2003). Nunnally and Bernstein noted that while LSE shows little bias in small samples, MLE tends to show greater efficiency and consistency with large datasets.5
Because MLE is central to procedures addressed in this text, it is important for readers to gain a sense of how maximum likelihood estimation works. One approach for demonstrating MLE is to use the binomial formula to first compute the probability that, in this case, a certain number of males (y) will appear in a sample of size n, with population parameter π indicating the probability of being male. With factorials denoted by !, the binomial formula is expressed as:
$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}$$
To find the probability that three men will appear in a sample of 10 with the probability of male being .50, one would construct the following equation:
$$P(3) = \frac{10!}{3!\,(10-3)!}\,(.50)^{3}(1-.50)^{10-3}$$
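One would then perform the necessary calculations to arrive at the probability of three men appearing in a sample of 10 individuals, given the .50 probability of being male:
$$P(3) = \frac{3{,}628{,}800}{6 \times 5{,}040}\,(.125)(.0078125) = 120 \times .0009765625 \approx 0.117$$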
In this case, the probability that three men will appear in the sample of 10, given the π value of .50, is 0.117. The formula for the probability distribution and the values of the parameters π and n were known, and the task was to find the probability of observing outcome y. But in the practice of quantitative research, parameter values are not known and must be estimated from sample data. A researcher therefore must substitute observed data into the formula for the probability function and then examine different values of π. Using data from the example above, the formula is thus:
$$L(\pi \mid y = 3, n = 10) = \frac{10!}{3!\,7!}\,\pi^{3}(1-\pi)^{7}$$
After examining the likelihood for multiple values of π, one arrives at the maximum likelihood estimate: the value of π at which the likelihood of the observed data is highest. Given observed data indicating three successes in 10 independent trials, the likelihood peaks at π = .3 (that is, y/n = 3/10), which is therefore the maximum likelihood estimate of π. Maximum likelihood is used in parameterization processes for advanced categorical statistics and will be referenced throughout the text. The preceding example was designed to familiarize readers with the process, as social scientists often have greater familiarity with least squares estimation (see, for additional discussion, Myung 2003).
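The search over candidate values of π described above can be sketched as a simple grid search. The grid spacing (.01) is an assumption made for illustration; statistical software maximizes the likelihood (usually its logarithm) numerically rather than by a fixed grid.

```python
import math

def binomial_likelihood(pi: float, y: int = 3, n: int = 10) -> float:
    """Likelihood of pi given y successes in n trials: the binomial formula
    with the observed data held fixed and pi allowed to vary."""
    return math.comb(n, y) * pi**y * (1 - pi)**(n - y)

# Examine candidate values of pi from .01 to .99 in steps of .01
grid = [i / 100 for i in range(1, 100)]
pi_hat = max(grid, key=binomial_likelihood)

print(pi_hat)                                  # 0.3
print(round(binomial_likelihood(pi_hat), 3))   # 0.267
print(round(binomial_likelihood(0.5), 3))      # 0.117
```

The grid search lands on .3, matching the closed‐form estimate y/n for the binomial case; the likelihood at π = .3 (about .267) exceeds the likelihood at π = .5 (about .117).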
To facilitate measurement, each chapter in this text contains a section addressing SPSS® techniques for categorical data analysis. Purchased by IBM® in 2009, SPSS is a popular software package in communication and other social science disciplines, and the current text uses SPSS for Windows version 19. Scholars have also used SAS, Stata, and R, each of which functions very well in studies requiring multivariate statistics (see Stokes, Davis, and Koch 2012, Long and Freese 2014). SAS and Stata, in particular, are more powerful than SPSS; however, given the disciplinary prevalence of SPSS, the text focuses on that software. To conserve space, SPSS output is condensed in certain places and presented in a compact, readable font.
This chapter began with examples of categorical variables, noting that statisticians such as Karl Pearson and George Udny Yule differed in their assumptions about measurement. The chapter included an example of cross‐classified frequency data from the election research of Berelson, Lazarsfeld, and McPhee (1954) and introduced the types of statistical procedures covered in subsequent chapters. Unlike OLS regression, which assumes a normal probability distribution, procedures covered in this text assume binomial, multinomial, and Poisson distributions. Additionally, instead of least squares estimation, categorical techniques use maximum likelihood in parameterization processes.
Define (or explain) each of the following terms as applicable to categorical statistics.
Binomial distribution
Dependent variable
Dichotomous measure
Discrete interval data
Exhaustiveness
Grouped interval data
Independent variable
Maximum likelihood estimation
Multinomial distribution
Mutually exclusive
Nominal data
Null hypothesis
Ordinal data
Poisson distribution
Polytomous measure
Statistical significance
Type I error
Type II error
Classify each measure below as nominal, ordinal, discrete interval, or grouped interval, briefly justifying each classification.
Position in news organization (advertising representative, editor, publisher, reporter).
Number of “tweets” counted in 60‐minute period.
Televised anti‐drug spots seen in past week (0, 1–2, 3–5, 6–9, 10–19, 20+).
Attitude toward establishment of federal shield law for journalists (strongly approve, approve, undecided, disapprove, strongly disapprove).
Political ideology (liberal, moderate, conservative).
Political party identification (democrat, republican, independent, other).
Number of violent acts in episode of police drama.
Attention to national television news (no attention, some attention, quite a bit of attention, a great deal of attention).
Empathy for speaker (none, a little, some, a great deal).
Length of public address (less than 60 minutes, 60–74 minutes, 75–89 minutes, 90–104 minutes, 105–119 minutes, 120 or more minutes).
Use the binomial formula to find the probability that 4 women will appear in a sample of 10 with the probability of female being .50. Then, calculate a maximum likelihood estimate. Be sure to show your work, indicating the steps taken to perform the calculations.
Agresti, Alan. 1990. Categorical Data Analysis. New York: John Wiley & Sons.
Agresti, Alan. 2007. An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.
Aldrich, John. 1997. “R. A. Fisher and the Making of Maximum Likelihood.” Statistical Science, 12(3): 162–176.
Aldrich, John H., and Forrest D. Nelson. 1984. Linear Probability, Logit, and Probit Models. Newbury Park, CA: Sage.
Alwin, Duane F., and Richard T. Campbell. 1987. “Continuity and Change in Methods of Survey Data Analysis.” Public Opinion Quarterly, 51: S139–S155.
Azen, Razia, and Cindy M. Walker. 2011. Categorical Data Analysis for the Behavioral and Social Sciences. New York: Routledge.
Berelson, Bernard R., Paul F. Lazarsfeld, and William N. McPhee. 1954. Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.
Berkson, Joseph. 1944. “Application of the Logistic Function to Bio‐Assay.” Journal of the American Statistical Association, 39: 357–365. DOI:10.1080/01621459.1944.10500699.
Bliss, C. I. 1935. “The Calculation of the Dosage‐Mortality Curve.” Annals of Applied Biology, 22: 134–167. DOI:10.1111/j.1744‐7348.1935.tb07713.x.
Cox, D. R. 1958. “The Regression Analysis of Binary Sequences.” Journal of the Royal Statistical Society B, 20: 215–242.
Fienberg, Stephen E. 2000. “Contingency Tables and Log‐Linear Models: Basic Results and New Developments.” Journal of the American Statistical Association, 95: 643–647. DOI:10.1080/01621459.2000.10474242.
Fienberg, Stephen E. 2007. The Analysis of Cross‐Classified Categorical Data, 2nd ed. New York: Springer.
Fienberg, Stephen E., and Alessandro Rinaldo. 2007. “Three Centuries of Categorical Data Analysis: Log‐linear Models and Maximum Likelihood Estimation.” Journal of Statistical Planning and Inference, 137: 3430–3445. DOI:10.1016/j.jspi.2007.03.022.
Fisher, R. A. 1922a. “On the Interpretation of χ² from Contingency Tables, and the Calculation of p.” Journal of the Royal Statistical Society, 85(1): 87–94.
Fisher, R. A. 1922b. “On the Mathematical Foundations of Theoretical Statistics.” Philosophical Transactions of the Royal Society of London Series A, 222: 309–368.
Goodman, Leo A. 1978. Analyzing Qualitative/Categorical Data. Cambridge, MA: Abt Books.
Koopmans, Lambert H. 1987. Introduction to Contemporary Statistical Methods, 2nd ed. Boston: Duxbury.
Lazarsfeld, Paul F., Bernard Berelson, and Hazel Gaudet. 1948. The People’s Choice: How the Voter Makes Up His Mind in a Presidential Election, 2nd ed. New York: Columbia University Press.
Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata, 3rd ed. College Station, TX: Stata Press.
Matsunaga, Masaki. 2007. “Familywise Error in Multiple Comparisons: Disentangling a Knot Through a Critique of O’Keefe’s Arguments Against Alpha Adjustment.” Communication Methods and Measures, 1: 243–265. DOI:10.1080/19312450701641409.
McCullagh, Peter. 1980. “Regression Models for Ordinal Data.”
Journal of the Royal Statistical Society B
