Categorical Statistics for Communication Research - Bryan E. Denham - E-Book

Description

Categorical Statistics for Communication Research presents scholars with a discipline-specific guide to categorical data analysis. The text blends necessary background information and formulas for statistical procedures with data analyses illustrating techniques such as log-linear modeling and logistic regression analysis.

* Provides techniques for analyzing categorical data from a communication studies perspective
* Provides an accessible presentation of techniques for analyzing categorical data for communication scholars and other social scientists working at the advanced undergraduate and graduate teaching levels
* Illustrated with examples from different types of communication research such as health, political and sports communication and entertainment
* Includes exercises at the end of each chapter and a companion website containing exercise answers and chapter-by-chapter PowerPoint slides


Page count: 460

Year of publication: 2016




Table of Contents

Cover

Title Page

Preface

References

Acknowledgments

About the Companion Website

1 Introduction to Categorical Statistics

Historical Overview

Probability Distributions and Parameter Estimation

Example of Maximum Likelihood Estimation

A Note on Statistical Software

Chapter Summary

Chapter Exercises

References

2 Univariate Goodness of Fit and Contingency Tables in Two Dimensions

Chi‐Square Test for Goodness of Fit

Chi‐Square Test of Independence in Contingency Tables

Likelihood Ratio Statistic

Exact Tests for Small Samples

McNemar’s Test for Correlated Samples

Measures of Association

Odds Ratio

Relative Risk

Phi Coefficient

Cramér’s V

Pearson’s Contingency Coefficient

Kendall’s Tau

Goodman and Kruskal’s Gamma

Somers’ d

Points of Concern in Bivariate Analyses

SPSS Analyses

Testing Goodness of Fit in SPSS

Testing Independence in SPSS

A Note on Style

Chapter Summary

Chapter Exercises

References

3 Contingency Tables in Three Dimensions

Moving from Two to Three Dimensions

Cochran‐Mantel‐Haenszel Test

Breslow‐Day Test

An Example in Public Health

An Example in Political Communication

Chapter Summary

Chapter Exercises

References

4 Log‐linear Analysis

Development of Log‐linear Models

Examples of Published Research

Log‐linear Analysis: Fundamentals

Two‐way Tables

Three‐way Models

Goodness of Fit and Model Selection

Descriptive Statistics and Residuals for the Fitted Model

Parameter Estimation

Ordinal Log‐linear Analysis

Three Ordinal Measures

More Complex Models

Visual Displays

Chapter Summary

Chapter Exercises

References

5 Logit Log‐linear Analysis

Examples of Published Research

Logit Log‐linear Analysis: Fundamental Components

Logit Model with One Response Measure

Logit Model with Two Response Measures

SPSS Example

Correspondence Analysis

Chapter Summary

Chapter Exercises

References

6 Binary Logistic Regression

Examples of Published Research

Binary Logistic Regression: Fundamentals

Simple Logistic Regression Analysis

Multiple Logistic Regression Analysis

Interactions

Model Assessment

Additional Statistics

Diagnostic Considerations

Binary Logistic Regression in SPSS

Chapter Summary

Chapter Exercises

References

7 Multinomial Logistic Regression

Examples of Published Research

Multinomial Logistic Regression: Fundamentals

Simple Multinomial Logistic Regression Analysis

Multiple Multinomial Logistic Regression Analysis

Conditional Logit Modeling

Multinomial Logistic Regression in SPSS

Chapter Summary

Chapter Exercises

References

8 Ordinal Logistic Regression

Examples of Published Research

Ordinal Logistic Regression: Fundamentals

Simple Ordinal Logistic Regression Analysis

Multiple Ordinal Logistic Regression Analysis

Interactions

Ordinal Logistic Regression in SPSS

Chapter Summary

Chapter Exercises

References

9 Probit Analysis

Examples of Published Research

Probit Analysis: Fundamentals

Binary Probit Analysis

Ordinal Probit Analysis

Multinomial Probit Analysis

Interactions

Chapter Summary

Chapter Exercises

References

10 Poisson and Negative Binomial Regression

Examples of Published Research

Poisson Regression: Fundamentals

Negative Binomial Regression: Fundamentals

Additional Techniques

SPSS Analyses

Chapter Summary

Chapter Exercises

References

11 Interrater Agreement Measures for Nominal and Ordinal Data

Analysis of Nominal Data with Two Raters

Analysis of Nominal Data with Multiple Raters

Analysis of Ordinal Data with Two Raters

Analysis of Ordinal Data with Multiple Raters

Kappa Coefficient in SPSS

Intraclass Correlation Coefficients in SPSS

Chapter Summary

Chapter Exercises

References

12 Concluding Communication

References

Appendix A: Chi‐Square Table

Appendix B: SPSS Code for Selected Procedures

Index

End User License Agreement

List of Tables

Chapter 01

Table 1.1 Example of cross‐classifications containing nominal and ordinal measures

Chapter 02

Table 2.1 Data for Title IX goodness‐of‐fit test

Table 2.2 Cross‐tabulation of time period by drug‐use mentions in horse‐racing reports

Table 2.3 Cross‐tabulation of time period by drug‐use mentions in horse‐racing reports on NPR

Table 2.4 Cross‐tabulation of paired attitudes

Table 2.5 Cross‐tabulation of communication with parents about drug dangers and experimenting with marijuana

Table 2.6 Cross‐tabulation of race and contacting a public official during previous year

Table 2.7 Cross‐tabulation of race and perceptions of party most capable of managing economy

Table 2.8 Cross‐tabulation of television and newspaper exposure during 2008 election campaigns

Table 2.9a Calculations of concordant and discordant pairs

Table 2.9b Calculation of corrections for ties

Table 2.10 Display of SPSS cross‐tabulation and chi‐square statistics

Chapter 03

Table 3.1 Cross‐tabulations of time period by horse injury or death mentions in two newspapers

Table 3.2 Cross‐tabulations of sex and marijuana experimentation with race as control measure

Table 3.3 Cross‐tabulations of sex and political discussion with race as control measure

Table 3.4 Odds ratios reported in Risk function in SPSS

Table 3.5 Select results of Breslow‐Day and Cochran‐Mantel‐Haenszel tests in SPSS

Chapter 04

Table 4.1 General log‐linear analyses of sex, race, and political leaning

Table 4.2 Descriptive statistics for log‐linear model containing sex, race, and political leaning

Table 4.3 Parameter estimates for log‐linear model containing sex, race, political leaning, and interaction of race and political leaning

Table 4.4 Cross‐tabulation of race and political leaning

Table 4.5 Parameter estimates for log‐linear model containing newspaper use and alcohol risk

Table 4.6 Parameter estimates for log‐linear model containing newspaper use, alcohol risk, and ordinal association parameter

Table 4.7 General log‐linear analyses of newspaper use, steroid risk perceptions, and participation in school‐sponsored athletics

Table 4.8 Parameter estimates for ordinal log‐linear model containing newspaper use, steroid risk, sports participation, and two interactions

Table 4.9 General log‐linear analyses of sex, race, personal optimism, and national optimism with frequency of political discussion as covariate

Table 4.10 SPSS goodness‐of‐fit display for log‐linear model containing sex, race, personal optimism, national optimism, and political discussion covariate

Table 4.11a Descriptive statistics for log‐linear model containing sex (males), race, personal optimism, national optimism, and political discussion covariate

Table 4.11b Descriptive statistics for log‐linear model containing sex (females), race, personal optimism, national optimism, and political discussion covariate

Table 4.12 SPSS parameter estimates for log‐linear model containing sex, race, personal optimism, national optimism, and political discussion covariate

Chapter 05

Table 5.1 Logit log‐linear models including sex, race, personal optimism, and frequency of political discussion as explanatory measures of national optimism

Table 5.2a Descriptive statistics for logit log‐linear models including sex (males), race, personal optimism, and frequency of political discussion as explanatory measures of national optimism

Table 5.2b Descriptive statistics for logit log‐linear models including sex (females), race, personal optimism, and frequency of political discussion as explanatory measures of national optimism

Table 5.3 Constant estimates for logit log‐linear model including sex, race, personal optimism, and frequency of political discussion as explanatory measures of national optimism

Table 5.4 Parameter estimates for logit log‐linear model including sex, race, personal optimism, and frequency of political discussion as explanatory measures of national optimism

Table 5.5 Logit log‐linear models including sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval

Table 5.6 Constant estimates for logit log‐linear model containing sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval

Table 5.7a Parameter estimates for logit log‐linear model including sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval

Table 5.7b Parameter estimates for logit log‐linear model including sex, race, school suspension, and exposure to antidrug advertising as explanatory measures of alcohol and marijuana disapproval

Table 5.8 SPSS goodness‐of‐fit display for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption

Table 5.9 Cell Counts and Residuals for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption

Table 5.10 Constant estimates for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption

Table 5.11 Parameter estimates for logit log‐linear model containing sex, parental communication, limited television viewing, and teacher encouragement as predictors of attitudes toward peer alcohol consumption

Chapter 06

Table 6.1 Cross‐tabulation of time period by drug‐use mentions in horse‐racing reports

Table 6.2 Logistic regression model testing time period as determinant of drug‐use mentions in Albuquerque Journal

Table 6.3 Logistic regression model testing sex, race, and age as determinants of economic attitudes

Table 6.4 Logistic regression model testing sex, race, and age, as well as political party identification, as determinants of economic attitudes

Table 6.5 Logistic regression model testing sex, race, and age, as well as political party identification and exposure to radio news, as determinants of economic attitudes

Table 6.6 Log‐likelihood estimates for three binary logistic regression models

Table 6.7 SPSS output for binary logistic regression model containing categorical predictors

Table 6.8 SPSS output for binary logistic regression model containing categorical and continuous predictors

Chapter 07

Table 7.1 Cross‐tabulation of race by political party affiliation

Table 7.2 Simple multinomial logistic regression model testing race as a determinant of political party affiliation

Table 7.3 Multiple multinomial logistic regression model testing sex, race, military service, and newspaper exposure as determinants of political party affiliation

Table 7.4 SPSS output for multinomial logistic regression model containing categorical predictors

Table 7.5 SPSS output for multinomial logistic regression model containing categorical predictors

Chapter 08

Table 8.1 Cross‐tabulation of sex by marijuana risk perceptions

Table 8.2 Observed data cross‐classification of sex by four levels of risk associated with regular marijuana use: frequency (f), proportion (p), cumulative proportion (cp), cumulative odds (co), and odds ratios (OR)

Table 8.3 Ordinal logistic regression model testing sex as a predictor of marijuana risk perceptions

Table 8.4 Multiple ordinal logistic regression model testing sex, age, and teacher communication about drugs as determinants of marijuana risk perceptions

Table 8.5 Multiple multinomial logistic regression model testing sex, age, and teacher communication about drugs as determinants of marijuana risk perceptions

Table 8.6 SPSS output for ordinal logistic regression model containing categorical predictors

Table 8.7a SPSS frequencies for ordinal logistic regression model containing categorical predictors

Table 8.7b SPSS frequencies for ordinal logistic regression model containing categorical predictors

Chapter 09

Table 9.1 SPSS output for binary probit model in Generalized Linear Models procedure

Table 9.2 SPSS output for binary probit model in PLUM procedure

Table 9.3 SPSS output for ordinal probit regression model

Table 9.4a SPSS frequency output for ordinal probit regression model

Table 9.4b SPSS frequency output for ordinal probit regression model

Chapter 10

Table 10.1 SPSS output for Explore analysis of television news exposure

Table 10.2 SPSS output for Poisson regression model in Generalized Linear Models procedure

Table 10.3 SPSS output for negative binomial model in Generalized Linear Models procedure

Chapter 11

Table 11.1 Cell frequencies for interrater reliability calculations

Table 11.2 Hypothetical data for Fleiss kappa calculation

Table 11.3 Hypothetical data for weighted kappa calculation

Table 11.4 Example of quadratic weights applied to cells

Table 11.5 Data for calculation of Kendall’s W

Table 11.6 SPSS output for kappa coefficient

Table 11.7 SPSS output for intraclass correlation coefficient

List of Illustrations

Chapter 02

Figure 2.1 Display of SPSS Goodness‐of‐Fit windows.

Figure 2.2 Display of SPSS Goodness‐of‐Fit results.

Figure 2.3 Display of SPSS Crosstabs and Cells windows.

Figure 2.4 Display of statistics available in SPSS Crosstabs procedure.

Chapter 03

Figure 3.1 SPSS screenshots for Breslow‐Day and Cochran‐Mantel‐Haenszel tests.

Chapter 04

Figure 4.1 SPSS screenshot of general log‐linear options.

Figure 4.2 SPSS screenshot of log‐linear model construction

Chapter 05

Figure 5.1 Screenshot of SPSS Logit Loglinear Analysis.

Figure 5.2 SPSS map of eleven ordinal variables.

Chapter 06

Figure 6.1 SPSS windows for binary logistic regression and variable definition.

Figure 6.2 SPSS windows for binary logistic regression and analysis options.

Chapter 07

Figure 7.1 SPSS screenshots for variables to be included in multinomial logistic regression model.

Figure 7.2 SPSS screenshots for output options in multinomial logistic regression model.

Chapter 08

Figure 8.1 SPSS screenshots for output options in ordinal logistic regression analysis (PLUM).

Figure 8.2 SPSS screenshots for location options in ordinal logistic regression analysis (PLUM).

Chapter 09

Figure 9.1 SPSS screenshot for Generalized Linear Models.

Figure 9.2 SPSS screenshot for Model design in Generalized Linear Models.

Chapter 11

Figure 11.1 Screenshots for SPSS kappa analysis.

Figure 11.2 Screenshots for SPSS intraclass correlation coefficient.


Categorical Statistics for Communication Research

Bryan E. Denham

This edition first published 2017
© 2017 John Wiley & Sons, Inc.

Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Offices
350 Main Street, Malden, MA 02148‐5020, USA
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, for customer services, and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley‐blackwell.

The right of Bryan E. Denham to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging‐in‐Publication Data

Names: Denham, Bryan E., 1967– author.
Title: Categorical statistics for communication research / Bryan E. Denham.
Description: Chichester, UK ; Malden, MA : John Wiley & Sons, 2016. | Includes index.
Identifiers: LCCN 2016019969 (print) | LCCN 2016024992 (ebook) | ISBN 9781118927106 (cloth : alk. paper) | ISBN 9781118927090 (pbk.) | ISBN 9781118927083 (pdf) | ISBN 9781118927076 (epub)
Subjects: LCSH: Communication–Research–Statistical methods. | Statistics.
Classification: LCC P93.7 .D46 2016 (print) | LCC P93.7 (ebook) | DDC 302.23/021–dc23
LC record available at https://lccn.loc.gov/2016019969

A catalogue record for this book is available from the British Library.

For all who aspire to theoretically informed and methodologically rigorous quantitative research.

Preface

In June 1946, recognizing an impasse among scientists debating measurement strategies in psychology, S. S. Stevens observed that measurement – the assignment of numerals to objects and events according to rules – depended on the scales to which data were assigned. Nominal scales involved the use of numerals as qualitative labels only, “and quite naturally,” Stevens (1946, 679) wrote, “there are many who will urge that it is absurd to attribute to this process of assigning numerals the dignity implied by the term measurement.” Indeed, simple frequency counts offered limited information, and because advanced analytic techniques for nominal data had not been developed, scholars typically examined relationships two variables at a time, in some cases controlling the level of a third measure.

At the next level of measurement, the ordinal level, observations appeared in a ranked sequence. Stevens (1946) cited hardness among minerals as an example, emphasizing that while order did exist, one could not assume equal intervals between observations; the interval between topaz and corundum, for instance, might not equal the interval between corundum and diamond. “In the strictest propriety,” Stevens (1946, 679) cautioned, “the ordinary statistics involving means and standard deviations ought not to be used with these scales, for these statistics imply a knowledge of something more than the relative rank‐order of the data.” In other words, summing a set of scores and dividing by the number of observations could yield a distorted average; the median, or exact middle score, served as a more appropriate measure. Nevertheless, like prominent statisticians who would follow, Stevens did not advocate the wholesale elimination of mean scores at the ordinal level, opting only to state that inaccuracies stood to increase as differences among intervals did the same.
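
Stevens's caution about means at the ordinal level can be illustrated with a small sketch. The two groups and both numeric codings below are hypothetical and are not taken from the book; the point is only that two codings preserving the same rank order can reverse a comparison of means while leaving the medians in the same order.

```python
from statistics import mean, median

# Hypothetical ordinal ranks (1 = lowest, 5 = highest) for two groups.
group_a = [3, 3, 3, 3, 3]
group_b = [2, 2, 2, 2, 5]

# Two codings that preserve the same rank order but assume different spacing.
equal_steps = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
unequal_steps = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def summarize(ranks, coding):
    """Return (mean, median) of the ranks after applying a numeric coding."""
    scores = [coding[r] for r in ranks]
    return mean(scores), median(scores)


for label, coding in (("equal", equal_steps), ("unequal", unequal_steps)):
    mean_a, median_a = summarize(group_a, coding)
    mean_b, median_b = summarize(group_b, coding)
    print(label, "means:", mean_a, mean_b, "medians:", median_a, median_b)
```

Under the equally spaced coding, group A has the higher mean; under the unequally spaced coding, group B does. The median of group A exceeds that of group B in both cases, because the median depends on rank order alone.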

For Stevens, data became “quantitative” at the interval level of measurement. Here, means and standard deviations could be computed without qualification, based on assumptions of equal intervals among observations. Centigrade and Fahrenheit temperature scales served as examples of interval measures, to be followed by a fourth and final level of measurement, the ratio scale, which contained a point of absolute zero in addition to equal intervals. Periods of time, Stevens wrote, could be measured on a ratio scale, as one could observe a period that was twice as long as another. In contrast, it made little sense to assert that a temperature of 70 degrees Fahrenheit was twice 35‐degree weather.

In the years since Stevens (1946) described the four levels of measurement, statisticians have generally referred to data measured at the nominal and ordinal levels as categorical while referring to data measured at the interval and ratio levels as continuous; the current text focuses on the former. While scholars in social‐science fields such as economics, political science, psychology, and sociology have written monographs and longer books addressing the analysis of nominal and ordinal data, communication scholars have lacked a text on which to draw in conducting studies and teaching quantitative research methods. Designed for graduate students in communication as well as faculty members and research professionals in the public and private sectors, Categorical Statistics for Communication Research seeks to fill a disciplinary void by presenting communication scholars with a discipline‐specific guide to categorical data analysis. In that sense the book seeks to complement statistics texts by Hayes (2005), Reinard (2006), and Weber and Fuller (2013). Their texts contain excellent instruction on techniques such as the analysis of variance (ANOVA) and ordinary least squares (OLS) regression, but the books do not address advanced approaches for analyzing categorical data. In covering advanced techniques in categorical statistics, the present text assumes the reader will have completed an undergraduate course addressing the fundamentals of quantitative research methods. Such a course may have followed one of the texts mentioned above, or perhaps one from Babbie (2015), Keyton (2014), or Wimmer and Dominick (2014).

At the graduate level, communication seminars on quantitative methods tend to focus on techniques that assume interval‐level response variables. Following discussions of descriptive statistics and measures of central tendency, instruction often focuses on the t‐test and one‐way analysis of variance before moving to correlation tests, factorial ANOVA and ordinary least squares regression. Advanced topics include techniques such as structural equations and hierarchical linear modeling (see Hayes, Slater, and Snyder 2008). In contrast, instruction on categorical statistics tends to begin and end with cross‐tabulation and chi‐square analysis; techniques for the simultaneous analysis of multiple categorical variables often receive little, if any, attention. In addressing such techniques, the current text aspires to the following objectives:

To provide an accessible guide to the use of categorical statistics, blending necessary background information and formulas for statistical procedures with data analyses illustrating the respective techniques;

To include examples from multiple areas of the communication discipline;

To demonstrate how techniques discussed in the book can be applied to data gathered through surveys, content analyses, and other methods;

To offer useful instructions for categorical data analyses in IBM SPSS®;

To demonstrate how procedural assumptions – and problems with meeting those assumptions – can offer substantive insight into communication processes;

To address points of methodological debate in an even‐handed manner, identifying approaches within and between areas of study;

To include a significant number of references for readers seeking additional background information about the techniques addressed.

To meet these objectives, the text begins with an introduction to categorical data analysis, reviewing statistical terminology and the assumptions statisticians have made in developing bivariate and multivariate tests. As the chapter explains, where techniques such as ANOVA and OLS regression assume a normal probability distribution, modeling procedures covered in the current text assume Poisson, binomial, and multinomial distributions, making the techniques comparably robust to non‐normal data. Additionally, modeling techniques covered in the text use maximum likelihood estimation (MLE), as opposed to least squares (LSE), in parameterization processes. Because MLE tends to be less biased with large samples (Nunnally and Bernstein 1994), procedures addressed in the book can prove valuable for studies that draw on large public datasets.
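
As a rough illustration of maximum likelihood estimation under a binomial distribution, the sketch below searches a grid of candidate proportions for the value that maximizes the log-likelihood. The counts are hypothetical, the function name is an assumption made for this example, and a grid search stands in for the iterative algorithms that statistical software actually uses.

```python
import math


def binomial_log_likelihood(p, successes, trials):
    """Log-likelihood of p for binomial data, omitting the constant binomial coefficient."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


# Hypothetical data: 12 of 40 respondents report the behavior of interest.
successes, trials = 12, 40

# Evaluate the log-likelihood on a grid of candidate values and keep the best one.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, successes, trials))

print(p_hat)               # grid value with the highest log-likelihood
print(successes / trials)  # closed-form ML estimate for comparison
```

For a single binomial proportion the maximum likelihood estimate has a closed form, the sample proportion, so the grid search simply recovers it. The regression models in later chapters generally have no closed-form solution and are fitted iteratively.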

Chapter 2 addresses univariate goodness of fit and bivariate tests of independence and association. The chapter focuses on the use of chi‐square to assess proportions in the categories of a single variable and independence in contingency tables containing two measures. In doing so, the chapter includes examples from recent content analyses and survey research initiatives, also reviewing measures of association and the likelihood ratio statistic. Regarding terminology, readers may recognize chi‐square analysis as a popular nonparametric, “distribution‐free” technique for comparing observed and expected frequencies in cross‐tabulations (see Conover 1999, Siegel 1956). Absent a point of reference, scholars sometimes regard categorical statistics, in general, as nonparametric; however, as indicated in the previous paragraph, most of the categorical models in this text assume an established distribution. As Anderson and Philips (1981) pointed out, such models focus on parameter estimation and travel beyond mere significance testing. In short, categorical statistics should not be confused with distribution‐free, nonparametric procedures such as the Kruskal‐Wallis nonparametric ANOVA test or Spearman correlation analysis.
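
The comparison of observed and expected frequencies can be sketched for a two-by-two table as follows. The counts are hypothetical and the function names are assumptions made for this example; the code computes the Pearson chi-square and the likelihood ratio statistic by hand rather than reproducing SPSS output.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio(table):
    """G^2 = 2 * sum of observed * ln(observed / expected), skipping empty cells."""
    expected = expected_counts(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


# Hypothetical cross-tabulation: rows are two time periods, columns are mention / no mention.
observed = [[30, 20], [10, 40]]
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(pearson_chi_square(observed), 3))
print(round(likelihood_ratio(observed), 3))
print(degrees_of_freedom)
```

With one degree of freedom, both statistics are compared against a critical value of 3.84 at the .05 level.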

Chapter 3 moves from two‐dimensional contingency tables to analyses containing three categorical variables. Analyses of three‐dimensional tables involve testing relationships between two measures at a fixed level of a third. As the chapter demonstrates, the Breslow‐Day (B‐D) and Cochran‐Mantel‐Haenszel (C‐M‐H) tests facilitate comparisons of odds ratios and allow researchers to gather information about three‐way tables in an efficient manner. The B‐D and C‐M‐H tests have been applied primarily in studies of health communication, but scholars working in other areas also may find the procedures useful.
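
To make the idea of comparing odds ratios across levels of a third variable concrete, the sketch below computes an odds ratio within each stratum and the Mantel-Haenszel pooled estimate of a common odds ratio. The counts are hypothetical and the function names are assumptions. The code does not compute the Breslow-Day or Cochran-Mantel-Haenszel test statistics themselves, only the odds ratios those tests concern.

```python
def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) across the 2x2 strata."""
    numerator = 0.0
    denominator = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical 2x2 tables, one per level of a control measure.
strata = [
    [[20, 10], [10, 20]],
    [[15, 10], [10, 15]],
]

for index, table in enumerate(strata, start=1):
    print("stratum", index, "odds ratio:", odds_ratio(table))

print("pooled odds ratio:", round(mantel_haenszel_odds_ratio(strata), 4))
```

When the stratum-specific odds ratios differ substantially, a single pooled value can mislead, which is the situation a test of homogeneity is meant to flag.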

Chapter 4 focuses on log‐linear modeling, a technique used to examine contingency tables in more than two dimensions. Unlike logit log‐linear analysis, addressed in Chapter 5, general log‐linear models do not recognize differences between explanatory (independent) and response (dependent) measures; rather, analyses treat all variables as outcomes, modeling the natural logs of cell frequencies. Researchers who use log‐linear analysis generally seek to remove parameters from a saturated model, which contains all effects but 0 degrees of freedom, toward a more parsimonious representation of the observed data. Scholars who use logit log‐linear models also seek to identify parsimonious relationships, but they do so with a “categorical variable analog” (Knoke and Burke 1980, 25) to ordinary least squares regression. As Chapter 5 explains, logit log‐linear models estimate the log odds of a response measure as a function of explanatory variables, and the model also allows more than one dependent variable to be included in a given analysis. In that sense, the logit procedure bears some similarity to the multivariate analysis of variance, which allows more than one response measure to be included in a model.
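
The distinction between the two approaches can be written out for the simplest case, a two-way table with factors X and Y. The notation below is a common convention assumed for this sketch and may differ from the notation the book adopts: m_{ij} is the expected frequency in cell (i, j), and the lambda terms are the model parameters.

```latex
% Saturated log-linear model: all effects included, 0 degrees of freedom
\ln m_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}

% Independence model: interaction removed, (I-1)(J-1) degrees of freedom
\ln m_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y}

% Logit form when Y is a dichotomous response: log odds of Y at level i of X
\ln\!\left(\frac{m_{i1}}{m_{i2}}\right)
  = \left(\lambda_1^{Y} - \lambda_2^{Y}\right)
  + \left(\lambda_{i1}^{XY} - \lambda_{i2}^{XY}\right)
  = \alpha + \beta_i^{X}
```

The third line follows by subtracting the saturated model for cell (i, 2) from the same model for cell (i, 1); the terms that do not involve Y cancel, which is why the logit form treats X purely as an explanatory measure.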

Chapter 6 addresses binary logistic regression, a technique used in analyses containing a dichotomous dependent variable (e.g., whether or not an individual communicated with an elected representative in the previous 12 months). Logistic regression accommodates categorical and continuous explanatory measures and produces parameter estimates that can be exponentiated to form odds ratios. Chapter 7 covers multinomial logistic regression, which researchers use when a categorical dependent variable contains more than two levels. As an example, scholars of political communication might study predictors of national optimism, with a response measure indicating that survey respondents (a) appeared optimistic about the future of the nation, (b) appeared pessimistic, or (c) appeared neither optimistic nor pessimistic. Although the multinomial procedure treats a response measure as nominal, the technique often proves useful when ordinal logistic regression models, addressed in Chapter 8, do not meet assumptions. As its name implies, the ordinal model analyzes predictors of ordered response measures, which often appear in the form of Likert attitude statements. Researchers may ask study participants to indicate whether they Strongly Agree, Agree, are Undecided, Disagree, or Strongly Disagree that a social protest received fair treatment in the press. While many researchers would treat such a variable as quasi‐interval, Likert statements are technically ordinal measures.
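
The statement that logistic regression estimates can be exponentiated to form odds ratios can be checked by hand in the simplest case. With a single dichotomous predictor, the maximum likelihood intercept equals the log odds in the reference group and the slope equals the difference in log odds between groups. The counts and function name below are hypothetical and chosen only for illustration.

```python
import math


def log_odds(yes, no):
    """Natural log of the odds yes/no."""
    return math.log(yes / no)


# Hypothetical counts: contacted an elected representative (yes, no) by group.
reference_yes, reference_no = 15, 85
comparison_yes, comparison_no = 30, 70

intercept = log_odds(reference_yes, reference_no)       # log odds in the reference group
slope = log_odds(comparison_yes, comparison_no) - intercept  # change in log odds

odds_ratio = math.exp(slope)  # exponentiated slope

print(round(intercept, 4), round(slope, 4), round(odds_ratio, 4))
```

The exponentiated slope matches the cross-product ratio of the underlying two-by-two table, (30 × 85) / (70 × 15). With continuous or multiple predictors the estimates no longer reduce to simple table arithmetic, but the exponentiation step is the same.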

Chapter 9 focuses on probit analysis, a technique similar to logistic regression. As the text explains, the binary probit model assumes an underlying, normally distributed, latent continuous measure. This assumption makes the probit model useful in studies involving issues such as gun control, attitudes toward which are more complex than simple for‐or‐against binaries. Probit analyses contain multinomial and ordinal approaches as well.
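
A brief sketch may help show how the probit and logit approaches differ in mapping a linear predictor to a probability. The probit model uses the standard normal cumulative distribution function, consistent with a normally distributed latent measure, whereas the logit model uses the logistic function. The index values below are arbitrary illustrations rather than fitted results, and the function names are assumptions.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit_probability(index):
    """P(y = 1) under a probit link: standard normal CDF of the linear predictor."""
    return STANDARD_NORMAL.cdf(index)


def logit_probability(index):
    """P(y = 1) under a logit link: logistic function of the linear predictor."""
    return 1 / (1 + math.exp(-index))


# Arbitrary values of the linear predictor, for illustration only.
for index in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(index, round(probit_probability(index), 3), round(logit_probability(index), 3))
```

Both links return .5 when the linear predictor is 0, and both are symmetric around that point. The normal curve approaches 0 and 1 more quickly, which is one reason coefficients from the two models are on different scales and are not directly comparable.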

Chapter 10 addresses Poisson and negative binomial regression, two techniques used in analyses of count data (i.e., discrete units observed in a given period of time). A communication scholar might use the procedures in studying whether the number of “tweets” posted about a certain topic varies by region of the country and the gender of social media users. If the scholar coded tweets for a subjective measure, such as tone, he or she would need to measure interrater reliability, which the current text covers in Chapter 11. This chapter contains reliability formulas and examples for both nominal and ordinal content variables, explaining how reliability testing advances a study from personal belief to social science, facilitating replication in the process.

In discussing statistical procedures, the text draws on a content analysis published in Journalism & Mass Communication Quarterly (Denham 2014) as well as three datasets made available by the Inter‐university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The datasets include the 2008 American National Election Study (The American National Election Studies 2008), the 2011 National Survey on Drug Use and Health (United States Department of Health and Human Services 2011), and the 2012 Monitoring the Future study of American youth (Johnston, Bachman, O’Malley, and Schulenberg 2012). Examples illustrate procedures through topics in political and health communication as well as other areas in the communication discipline.

Regardless of the topics communication scholars engage, quantitative research studies invariably contain nominal and ordinal variables. Categorical Statistics for Communication Research seeks to enhance the measurement of these variables in statistical systems, contributing both theoretically and methodologically to disciplinary research.

Bryan Denham

July 2016

Clemson, SC

References

Anderson, J. A., and P. R. Philips. 1981. “Regression, Discrimination and Measurement Models for Ordered Categorical Variables.” Applied Statistics, 30: 22–31.

Babbie, Earl. 2015. The Practice of Social Research, 14th ed. Boston, MA: Cengage Learning.

Conover, W. J. 1999. Practical Nonparametric Statistics, 3rd ed. New York: John Wiley & Sons, Inc.

Denham, Bryan E. 2014. “Intermedia Attribute Agenda‐Setting in the New York Times: The Case of Animal Abuse in U.S. Horse Racing.” Journalism & Mass Communication Quarterly, 91: 17–37. DOI:10.1177/1077699013514415.

Hayes, Andrew F. 2005. Statistical Methods for Communication Science. Mahwah, NJ: Erlbaum.

Hayes, Andrew F., Michael D. Slater, and Leslie B. Snyder, eds. 2008. The Sage Sourcebook of Advanced Data Analysis Methods for Communication Research. Thousand Oaks, CA: Sage.

Johnston, Lloyd D., Jerald G. Bachman, Patrick M. O’Malley, and John Schulenberg. 2012. Monitoring the Future: A Continuing Study of American Youth. Funded by National Institute on Drug Abuse. Institute for Social Research, University of Michigan.

Keyton, Joann. 2014. Communication Research: Asking Questions, Finding Answers, 4th ed. New York: McGraw‐Hill.

Knoke, David, and Peter J. Burke. 1980. Log‐linear Models. Beverly Hills, CA: Sage.

Nunnally, Jum C., and Ira H. Bernstein. 1994. Psychometric Theory, 3rd ed. New York: McGraw‐Hill.

Reinard, John C. 2006. Communication Research Statistics. Thousand Oaks, CA: Sage.

Siegel, S. 1956. Non‐parametric Statistics for the Behavioural Sciences. New York: McGraw‐Hill.

Stevens, S. S. 1946. “On the Theory of Scales of Measurement.” Science, 103: 677–680.

The American National Election Studies. 2008. American National Election Study: ANES Pre‐ and Post‐Election Survey. ICPSR25383‐v2. Ann Arbor, MI: Inter‐university Consortium for Political and Social Research [distributor], 2012‐08‐30. DOI:10.3886/ICPSR25383.v2.

United States Department of Health and Human Services. 2011. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. National Survey on Drug Use and Health, 2011. ICPSR34481‐v2. Ann Arbor, MI: Inter‐university Consortium for Political and Social Research [distributor], 2013‐06‐20.

Weber, Rene, and Ryan Fuller. 2013. Statistical Methods for Communication Researchers and Professionals. Dubuque, IA: Kendall Hunt.

Wimmer, Roger D., and Joseph R. Dominick. 2014. Mass Media Research: An Introduction, 10th ed. Boston, MA: Cengage Learning.

Acknowledgments

Eight reviewers evaluated the proposal for this text, and I thank each of them for their comments and suggestions regarding structure and content. I also recognize three reviewers who examined the initial draft of Categorical Statistics for Communication Research; their feedback helped me to clarify and improve chapter contents, and I very much appreciate their attention to detail.

Elizabeth Swayze provided initial guidance on this project, and I thank her for the support and encouragement. I also thank executive editor Haze Humbert as well as Patrick Wright, who has overseen marketing efforts. I recognize Mary Hall, Julia Kirk, Aneetta Antony, Joanna Pyke, and Roshna Mohan, each of whom contributed to the production of the book. I feel fortunate to have secured a contract with Wiley Blackwell, for in addition to publishing the scholarly journals of the International Communication Association, the company has published leading texts in applied statistics.

I thank the Inter‐university Consortium for Political and Social Research (ICPSR) at the University of Michigan for allowing me to demonstrate statistical techniques with data gathered in survey projects such as the American National Election Studies, the National Survey on Drug Use and Health, and the Monitoring the Future study of American youth. I also thank IBM for granting me permission to include screenshots of SPSS for Windows procedures.

Clemson University granted me a sabbatical to pursue this project, and I appreciate that investment in my work. I also acknowledge Clemson Libraries for maintaining electronic subscriptions to key scholarly journals and for retaining classic texts in categorical statistics. I acknowledge my doctoral adviser, M. Mark Miller, PhD, who encouraged me to pursue a minor in applied statistics. In graduate seminars and in dissertation meetings, Dr Miller stressed the importance of theory in quantitative research, and I certainly share his sentiments regarding theoretically informed social science. I also thank Tony Rimmer, PhD, my MA adviser, for introducing me to quantitative research methods at the graduate level.

Finally, I thank students, colleagues, and family members for expressing an interest in the project, offering encouragement, and sending along news items and other materials addressing scientific studies and quantitative research methods.

About the Companion Website

This book is accompanied by a companion website:

www.wiley.com/go/denham/categorical_statistics

The website includes:

Data files for chapter exercises

Answers to chapter exercises

Chapter PowerPoint slides

1 Introduction to Categorical Statistics

This text focuses principally on the analysis of nominal and ordinal data. Nominal measures contain unordered categories while ordinal variables contain categories in a sequence; both types of measures appear frequently in communication research. At the nominal level, news texts may or may not mention specific issue attributes, and during election years, individuals may or may not view a debate, campaign for a candidate, or vote in a primary. Individuals may be male or female, and they may or may not have served in the military. In addition to these dichotomous measures, unordered polytomous variables include items such as race, religion, and marital status, each of which contains more than two categories. At the ordinal level, attitude statements frequently include five response options: Strongly Agree, Agree, Undecided, Disagree, and Strongly Disagree. Estimations of risk may range from No Risk to Great Risk, and individuals responding to policy decisions may range from Strongly Approve to Strongly Disapprove in their reactions.

Statistician Alan Agresti (1990) mentioned two additional types of categorical data: discrete interval and grouped interval. Discrete interval measures often contain a limited number of values, and because they take the form of integers – and integers only – they are not treated as continuous quantitative measures, which can take on any real value. As an example of discrete interval data, a college dean might record the number of people who earn a graduate degree in communication each year, with recipients constituting discrete units. Regarding grouped interval data, researchers sometimes combine continuous interval measures into ordered brackets, as in the case of income, where asking a survey respondent for a specific figure might be considered both invasive and unnecessary. As a second example, while news reports about a given subject might average 731 words, a researcher might be interested in the number of articles that appear in ordered increments of 250 words.
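To make the idea of grouped interval data concrete, the following sketch assigns article word counts to ordered brackets of 250 words. The helper name `word_bracket`, the bracket labels, and the sample article lengths are illustrative assumptions; only the 250‐word increment and the 731‐word figure come from the passage above.

```python
from collections import Counter


def word_bracket(word_count: int, width: int = 250) -> str:
    """Return the ordered bracket containing word_count, e.g. 731 -> '501-750'."""
    if word_count < 1:
        raise ValueError("word_count must be a positive integer")
    upper = ((word_count - 1) // width + 1) * width  # top of the bracket
    lower = upper - width + 1                        # bottom of the bracket
    return f"{lower}-{upper}"


# Hypothetical article lengths in words (not data from the text).
lengths = [120, 250, 251, 731, 980, 1004]

# Frequency of articles per bracket: the grouped interval measure.
brackets = Counter(word_bracket(n) for n in lengths)
for label, frequency in brackets.items():
    print(label, frequency)
```

Once grouped this way, the variable carries order but no longer records exact lengths, which is why it is analyzed with categorical rather than continuous techniques.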

Historical Overview

In covering techniques for analyzing both ordered and unordered categorical variables, the current text recognizes that statisticians have differed in their assumptions and approaches to categorical data analysis. As Powers and Xie (2000) explained, one school of thought considers categorical data part of an underlying continuous distribution, while a second perspective considers categorical data inherently categorical. In historical terms, Agresti (1990) explained that Karl Pearson (1900), who developed the chi‐square goodness‐of‐fit test, assumed continuous distributions underlying categorical variables, while one of Pearson’s contemporaries, George Udny Yule (1900), believed that certain types of variables were inherently categorical and did not require assumptions of underlying distributions. Fienberg (2007) observed merit in both perspectives, noting that Pearson and Yule, along with R. A. Fisher (1922a, 1922b), played significant roles in building a foundation for the development of more advanced analytic techniques (see, for additional history, Fienberg and Rinaldo 2007, Plackett 1983). Interestingly, several decades would pass before statisticians developed advanced procedures for categorical data analysis. Most of the modeling techniques covered in the current text emerged after 1960, whereas statisticians had developed multivariate tests for continuous data decades earlier.

Seminal research in communication (e.g., Lazarsfeld, Berelson, and Gaudet 1948) demonstrates how social scientists analyzed and displayed categorical data. Lacking advanced statistical procedures, researchers typically presented data in the form of frequency charts and cross‐tabulations. As an example, Table 1.1 contains data gathered in the 1948 election year and published in Voting: A Study of Opinion Formation in a Presidential Campaign (Berelson, Lazarsfeld, and McPhee 1954, 243). The table contains both nominal and ordinal frequency measures and offers descriptive information in a limited but effective manner. Recognizing a pattern between exposure to mass media and level of interest in the presidential election, the authors reported demographic and psychographic information about 814 individuals in Elmira, New York. In the table, numbers appearing in parentheses indicate cell frequencies while figures outside the parentheses indicate the percentage of individuals in each cell who were exposed to media at “High and High‐Middle” levels (N = 432). This approach allowed readers, if so inclined, to calculate the number of respondents in each cell who scored “Low and Low‐Middle” on exposure indices (N = 382), all the while inspecting results across three levels of campaign interest. The use of percentages for “High and High‐Middle” media users allowed the authors to show statistical patterns that raw cell frequencies would have obscured. Examining the table, one observes that individuals exposed the most to mass media and interested the most in the election belonged to more organizations, had higher levels of education, and appeared in higher socioeconomic classes.

Table 1.1 Example of cross‐classifications containing nominal and ordinal measures

Percentage with High or High‐Middle Exposure (on Index), by Level of Interest

Characteristics                  Great Deal   Quite a Lot   Not Much at All
(a) Organization Membership:
    Belongs to Two or More       82 (103)     68 (87)       39 (64)
    Belongs to One               72 (71)      57 (74)       34 (68)
    Belongs to None              62 (100)     47 (112)      24 (126)
(b) Education:
    College                      88 (58)      62 (37)       48 (25)
    High School                  71 (166)     60 (171)      30 (152)
    Grammar School or Less       56 (48)      45 (62)       25 (81)
(c) Socioeconomic Status:
    Higher                       79 (167)     63 (120)      39 (105)
    Lower                        60 (108)     52 (153)      25 (154)
(d) Sex:
    Men                          72 (122)     60 (124)      38 (110)
    Women                        71 (153)     54 (149)      25 (149)
(e) Neuroticism:
    Low                          77 (112)     64 (106)      30 (100)
    High                         67 (149)     50 (147)      30 (138)

Note: Table appeared originally in Berelson, Lazarsfeld, and McPhee (1954), Voting: A Study of Opinion Formation in a Presidential Campaign. © 1954 by The University of Chicago. Reprinted with permission, University of Chicago Press.
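The calculation that Table 1.1 leaves to the reader can be sketched as follows: each cell's percentage and frequency are combined to recover approximate counts of high‐exposure and low‐exposure respondents. Only the organization membership rows are shown. The function name `split_cell` is hypothetical, and because the published percentages are rounded, the recovered counts are approximations rather than the authors' original tallies.

```python
def split_cell(percent_high: float, n: int) -> tuple[int, int]:
    """Return approximate (high exposure, low exposure) counts for one cell."""
    high = round(percent_high / 100 * n)
    return high, n - high


# (percentage high exposure, cell frequency) from panel (a) of Table 1.1,
# ordered by level of interest: Great Deal, Quite a Lot, Not Much at All.
organization_membership = {
    "Belongs to Two or More": [(82, 103), (68, 87), (39, 64)],
    "Belongs to One": [(72, 71), (57, 74), (34, 68)],
    "Belongs to None": [(62, 100), (47, 112), (24, 126)],
}

for row_label, cells in organization_membership.items():
    counts = [split_cell(pct, n) for pct, n in cells]
    print(row_label, counts)
```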

Readers familiar with significance testing may notice that Table 1.1 does not contain chi‐square analyses, commonly used to determine whether significant differences exist between observed and expected cell frequencies. Lazarsfeld, a research methodologist, did not consider it appropriate to test bivariate relationships for statistical significance, reasoning that additional variables could alter – or eliminate – significant relationships.1 As indicated, when Lazarsfeld and his colleagues conducted their election research, multivariate techniques for categorical data had not been developed. For example, log‐linear modeling, which examines associations among multiple categorical variables simultaneously, did not exist as such; had the technique been available, Lazarsfeld and other researchers may have used it in analyzing frequency data. In fact, Alwin and Campbell (1987, S147) described log‐linear models as, “in many ways, the culmination of the classic Lazarsfeldian tradition. They relate to it directly, rather than obliquely. They focus on tables, the basic building blocks of survey analysis, and they provide precise tests of simple and complex versions of partialling and elaboration hypotheses.” Indeed, where Pearson and Yule worked with 2 x 2 contingency tables (i.e., cross‐tabulations in which both variables contained two categories), statisticians who developed log‐linear models (see Goodman 1978) established approaches for the simultaneous analysis of more than two variables, each of which may have contained more than two categories.

In addition to log‐linear modeling, the current text also addresses binary, multinomial, and ordinal logistic regression analyses. As with ordinary least squares (OLS) regression, logistic models examine the effects of one or more independent (explanatory, predictor) variables on a single dependent (response, outcome) measure.2 Like log‐linear models, logistic regression techniques belong to a special class of generalized linear models (GLMs), developed by Nelder and Wedderburn (1972). As explained in Chapter 4 of the current text, a GLM contains a systematic and a random component as well as a link function. Explanatory variables form the systematic component, while a dependent measure and the probability distribution assigned to it constitute the random component (Agresti 2007, 66–67; see also, McCullagh and Nelder 1989). Link functions connect the systematic and random components.

In the case of log‐linear and logistic regression techniques, the link function transforms a response measure, such that the dependent variable can be modeled as a linear function of explanatory measures. In OLS regression, a transformation is not necessary, as the procedure models the mean of a dependent variable directly, using an identity link. Log‐linear analysis, which models cell frequencies, uses a log link function, while logistic regression analysis, which models a response measure containing a value between 0 and 1 (e.g., a probability), uses the log of the odds. Statisticians who developed logistic regression models (e.g., Cox 1958, McCullagh 1980) built on the work of individuals such as Chester Bliss (1935), who popularized the probit model, and Joseph Berkson (1944), who applied the term logit to log odds.3
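The three link functions named above can be written out directly. This is a minimal sketch with hypothetical function names, not the internals of any statistical package: the identity link leaves a mean unchanged, the log link maps a positive expected frequency onto the real line, and the logit link maps a probability between 0 and 1 onto the real line.

```python
import math


def identity_link(mean: float) -> float:
    """OLS regression: the mean is modeled directly."""
    return mean


def log_link(expected_frequency: float) -> float:
    """Log-linear models: natural log of an expected cell frequency (> 0)."""
    return math.log(expected_frequency)


def logit_link(probability: float) -> float:
    """Logistic regression: log of the odds for a probability in (0, 1)."""
    return math.log(probability / (1 - probability))


def inverse_logit(linear_predictor: float) -> float:
    """Map a value on the logit scale back to a probability."""
    return 1 / (1 + math.exp(-linear_predictor))


# A probability of .50 corresponds to odds of 1 and therefore a logit of 0.
print(logit_link(0.5))
# Applying the inverse recovers the original probability.
print(round(inverse_logit(logit_link(0.2)), 3))
```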

Because advanced modeling techniques for categorical data facilitate the simultaneous examination of multiple variables, they can help to lower the risk of Type I error, or a false rejection of the null hypothesis. When a researcher conducts multiple bivariate analyses using the same set of data, he or she increases the likelihood of identifying “significant” relationships that may be little more than chance occurrences. Yet, legitimate relationships can be rejected when analyses are too conservative; in such cases, Type II error – a failure to reject the null hypothesis when it should be rejected – can occur.4 As the current text observes, examining multiple variables simultaneously offers an appropriate balance for controlling the two types of error – provided statistical tests meet their assumptions.
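One way to see why repeated bivariate tests raise the risk of Type I error is to compute the familywise error rate, 1 − (1 − α)^k, for k tests. The sketch below assumes the tests are independent and each uses the same α, a simplification that rarely holds exactly for tests run on one dataset; the function name is hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# With alpha = .05, the chance of at least one spurious "significant"
# result grows quickly as the number of tests increases.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```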

Categorical statistics, in general, assume independence among observations, and when that assumption is violated, artificial inflation of a sample may occur, leaving statistical tests technically flawed. The chi‐square test statistic, in particular, is sensitive to sample size, and a lack of independence among observations will almost certainly compromise a study. As an example, while a researcher might content analyze 84 individual news reports, a statistician would not consider sentences or paragraphs within those reports independent units. Relatedly, categories within variables should be mutually exclusive and exhaustive, meaning that categories should be independent of one another and contain options for all observations. When categories lack independence and a complete set of response options (or content codes), observations may be classified into more than one category, or no categories at all. In either case, the analysis may not measure what it seeks to measure (i.e., the study may lack internal validity) and attempts to replicate the research may prove futile given an absence of reliability. The following section offers an overview of distributional assumptions and parameter estimation in categorical statistics.
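The sensitivity of chi‐square to sample size can be illustrated with a hypothetical 2 × 2 table. If a researcher mistakenly counted, say, three sentences per news report as separate observations, every cell frequency would triple while the underlying pattern stayed the same. The sketch below computes the Pearson chi‐square statistic for both versions; the table values and function name are assumptions made for illustration.

```python
def pearson_chi_square(table: list[list[float]]) -> float:
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical cross-tabulation of 100 independent news reports.
reports = [[30, 20], [20, 30]]
# Same proportions, but each report counted three times (non-independent units).
inflated = [[3 * cell for cell in row] for row in reports]

print(pearson_chi_square(reports))
print(pearson_chi_square(inflated))
```

The statistic scales with the inflation factor even though no new information has been added, which is the sense in which a lack of independence compromises the test.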

Probability Distributions and Parameter Estimation

A probability distribution links the quantitative outcome of a study with the probability the outcome will occur. In the social sciences, statistics texts focus heavily on outcomes obtained through models such as ordinary least squares regression. OLS regression assumes a normal probability distribution with a dependent variable measured at the interval level. It also assumes a random sample and equality of variances, and when analyses meet these assumptions, OLS models yield reliable and parsimonious results. When assumptions are not met, parameters may be misestimated, affecting substantive interpretations (see Aldrich and Nelson 1984).

In contrast to OLS regression, techniques for analyzing categorical response measures vary in the distributions they assume. Models covered in the current text generally assume one of three distributions: binomial, multinomial, or Poisson (see Plackett 1981). The binomial distribution models the probability of observing a specific number of successes in a certain number of independent trials, and the multinomial distribution models the probability of observing a specific number of outcomes in each of several categories in a certain number of independent trials. The Poisson distribution models the probability of observing a specific number of events in a fixed time period (see also Agresti 2007, 4–16).
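The three distributions can be compared side by side using their standard probability mass functions. The sketch below relies only on the Python standard library; the function names and example values are illustrative assumptions rather than material from the text.

```python
import math


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """P(y successes in n independent trials with success probability pi)."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)


def multinomial_pmf(counts: list[int], probabilities: list[float]) -> float:
    """P(observing the given count in each category across sum(counts) trials)."""
    coefficient = math.factorial(sum(counts))
    for count in counts:
        coefficient //= math.factorial(count)
    probability = 1.0
    for count, p in zip(counts, probabilities):
        probability *= p**count
    return coefficient * probability


def poisson_pmf(y: int, mu: float) -> float:
    """P(y events in a fixed period when the expected count is mu)."""
    return math.exp(-mu) * mu**y / math.factorial(y)


# With two categories, the multinomial reduces to the binomial.
print(round(binomial_pmf(3, 10, 0.5), 4))
print(round(multinomial_pmf([3, 7], [0.5, 0.5]), 4))
# Probability of zero events when two are expected on average.
print(round(poisson_pmf(0, 2.0), 4))
```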

In addition to differences in distributional assumptions, categorical procedures rely on a different type of parameter estimation. While OLS regression models contain parameter estimates based on least squares (LSE), techniques addressed in this book draw on maximum likelihood estimation (MLE). Addressing parameterization, Nunnally and Bernstein (1994, 148) defined an estimator as “a decision rule that results in a particular value or estimate that is a function of the data.” Developed by R. A. Fisher (for historical discussion, see Aldrich 1997), MLE selects parameter estimates that have the greatest likelihood of resulting in the observed sample (Myung 2003). Nunnally and Bernstein noted that while LSE shows little bias in small samples, MLE tends to show greater efficiency and consistency with large datasets.5

Example of Maximum Likelihood Estimation

Because MLE is central to procedures addressed in this text, it is important for readers to gain a sense of how maximum likelihood estimates parameters. One approach for demonstrating MLE is first to use the binomial formula to compute the probability that a certain number of males (y) will appear in a sample of size n, with population parameter π indicating the probability of being male. With factorials denoted by !, the binomial formula is expressed as:

$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}$$

To find the probability that three men will appear in a sample of 10 with the probability of male being .50, one would construct the following equation:

$$P(3) = \frac{10!}{3!\,(10-3)!}\,(.50)^{3}(1-.50)^{10-3}$$

One would then perform the necessary calculations to arrive at the probability of three men appearing in a sample of 10 individuals, given the .50 probability of being male:

$$P(3) = \frac{10!}{3!\,7!}\,(.50)^{3}(.50)^{7} = 120 \times .125 \times .0078125 \approx 0.117$$

In this case, the probability that three men will appear in the sample of 10, given the π value of .50, is 0.117. The formula for the probability distribution and the values of the parameters π and n were known, and the task was to find the probability of observing outcome y. But in the practice of quantitative research, parameter values are not known and must be estimated from sample data. A researcher therefore must substitute observed data into the formula for the probability function and then examine different values of π. Using data from the example above, the formula is thus:

$$L(\pi) = \frac{10!}{3!\,7!}\,\pi^{3}(1-\pi)^{7}$$

After examining the probability for multiple values of π, one arrives at a value for the maximum likelihood estimate; that is, the value of π at which the likelihood of the observed data is highest. Given observed data indicating three successes in 10 independent trials, the likelihood peaks at π = .3 (the sample proportion, 3/10), which is therefore the maximum likelihood estimate of π. Maximum likelihood is used in parameterization processes for advanced categorical statistics and will be referenced throughout the text. The preceding example was designed to familiarize readers with the process, as social scientists often have greater familiarity with least squares estimation (see, for additional discussion, Myung 2003).
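The process of "examining the probability for multiple values of π" can be sketched as a simple grid search. The code below evaluates the binomial likelihood of three successes in 10 trials at candidate values from .01 to .99 and keeps the value with the highest likelihood. The grid spacing and function name are assumptions chosen for illustration; statistical software typically uses iterative numerical methods rather than a fixed grid.

```python
import math


def binomial_likelihood(pi: float, y: int = 3, n: int = 10) -> float:
    """Likelihood of observing y successes in n trials for a candidate pi."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)


# Probability of three men in a sample of 10 when pi is known to be .50.
print(round(binomial_likelihood(0.5), 3))

# Candidate values of pi: .01, .02, ..., .99.
candidates = [step / 100 for step in range(1, 100)]

# The maximum likelihood estimate is the candidate with the highest likelihood.
mle = max(candidates, key=binomial_likelihood)
print(mle, round(binomial_likelihood(mle), 4))
```

For the binomial case the search is unnecessary in practice, because the estimate equals the sample proportion y/n, but the grid makes visible how the likelihood rises toward .3 and falls away on either side.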

A Note on Statistical Software

To facilitate measurement, each chapter in this text contains a section addressing SPSS® techniques for categorical data analysis. Purchased by IBM® in 2009, SPSS is a popular software package in communication and other social science disciplines, and the current text uses SPSS for Windows version 19. Scholars have also used SAS, Stata, and R, each of which functions very well in studies requiring multivariate statistics (see Stokes, Davis, and Koch 2012, Long and Freese 2014). SAS and Stata, in particular, are more powerful than SPSS; however, given the disciplinary prevalence of SPSS, the text focuses on that software. To conserve space in the text, SPSS output is condensed in certain places, with a font suited to the condensed layout.

Chapter Summary

This chapter began with examples of categorical variables, noting that statisticians such as Karl Pearson and George Udny Yule differed in their assumptions about measurement. The chapter included an example of cross‐classified frequency data from the election research of Berelson, Lazarsfeld, and McPhee (1954) and introduced the types of statistical procedures covered in subsequent chapters. Unlike OLS regression, which assumes a normal probability distribution, procedures covered in this text assume binomial, multinomial, and Poisson distributions. Additionally, instead of least squares estimation, categorical techniques use maximum likelihood in parameterization processes.

Chapter Exercises

Define (or explain) each of the following terms as applicable to categorical statistics.

Binomial distribution

Dependent variable

Dichotomous measure

Discrete interval data

Exhaustiveness

Grouped interval data

Independent variable

Maximum likelihood estimation

Multinomial distribution

Mutually exclusive

Nominal data

Null hypothesis

Ordinal data

Poisson distribution

Polytomous measure

Statistical significance

Type I error

Type II error

Classify each measure below as nominal, ordinal, discrete interval, or grouped interval, briefly justifying each classification.

Position in news organization (advertising representative, editor, publisher, reporter).

Number of “tweets” counted in 60‐minute period.

Televised anti‐drug spots seen in past week (0, 1–2, 3–5, 6–9, 10–19, 20+).

Attitude toward establishment of federal shield law for journalists (strongly approve, approve, undecided, disapprove, strongly disapprove).

Political ideology (liberal, moderate, conservative).

Political party identification (democrat, republican, independent, other).

Number of violent acts in episode of police drama.

Attention to national television news (no attention, some attention, quite a bit of attention, a great deal of attention).

Empathy for speaker (none, a little, some, a great deal).

Length of public address (less than 60 minutes, 60–74 minutes, 75–89 minutes, 90–104 minutes, 105–119 minutes, 120 or more minutes).

Use the binomial formula to find the probability that 4 women will appear in a sample of 10 with the probability of female being .50. Then, calculate a maximum likelihood estimate. Be sure to show your work, indicating the steps taken to perform the calculations.

References

Agresti, Alan. 1990. Categorical Data Analysis. New York: John Wiley & Sons.

Agresti, Alan. 2007. An Introduction to Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons.

Aldrich, John. 1997. “R. A. Fisher and the Making of Maximum Likelihood.” Statistical Science, 12(3): 162–176.

Aldrich, John H., and Forrest D. Nelson. 1984. Linear Probability, Logit, and Probit Models. Newbury Park, CA: Sage.

Alwin, Duane F., and Richard T. Campbell. 1987. “Continuity and Change in Methods of Survey Data Analysis.” Public Opinion Quarterly, 51: S139–S155.

Azen, Razia, and Cindy M. Walker. 2011. Categorical Data Analysis for the Behavioral and Social Sciences. New York: Routledge.

Berelson, Bernard R., Paul F. Lazarsfeld, and William N. McPhee. 1954. Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Berkson, Joseph. 1944. “Application of the Logistic Function to Bio‐Assay.” Journal of the American Statistical Association, 39: 357–365. DOI:10.1080/01621459.1944.10500699.

Bliss, C. I. 1935. “The Calculation of the Dosage‐Mortality Curve.” Annals of Applied Biology, 22: 134–167. DOI:10.1111/j.1744‐7348.1935.tb07713.x.

Cox, D. R. 1958. “The Regression Analysis of Binary Sequences.” Journal of the Royal Statistical Society B, 34: 215–242.

Fienberg, Stephen E. 2000. “Contingency Tables and Log‐Linear Models: Basic Results and New Developments.” Journal of the American Statistical Association, 95: 643–647. DOI:10.1080/01621459.2000.10474242.

Fienberg, Stephen E. 2007. The Analysis of Cross‐Classified Categorical Data, 2nd ed. New York: Springer.

Fienberg, Stephen E., and Alessandro Rinaldo. 2007. “Three Centuries of Categorical Data Analysis: Log‐linear Models and Maximum Likelihood Estimation.” Journal of Statistical Planning and Inference, 137: 3430–3445. DOI:10.1016/j.jspi.2007.03.022.

Fisher, R. A. 1922a. “On the Interpretation of χ² from Contingency Tables, and the Calculation of p.” Journal of the Royal Statistical Society, 85(1): 87–94.

Fisher, R. A. 1922b. “On the Mathematical Foundations of Theoretical Statistics.” Philosophical Transactions of the Royal Society of London Series A, 222: 309–368.

Goodman, Leo A. 1978. Analyzing Qualitative/Categorical Data. Cambridge, MA: Abt Books.

Koopmans, Lambert H. 1987. Introduction to Contemporary Statistical Methods, 2nd ed. Boston: Duxbury.

Lazarsfeld, Paul F., Bernard Berelson, and Hazel Gaudet. 1948. The People’s Choice: How the Voter Makes Up His Mind in a Presidential Election, 2nd ed. New York: Columbia University Press.

Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata, 3rd ed. College Station, TX: Stata Press.

Matsunaga, Masaki. 2007. “Familywise Error in Multiple Comparisons: Disentangling a Knot Through a Critique of O’Keefe’s Arguments Against Alpha Adjustment.” Communication Methods and Measures, 1: 243–265. DOI:10.1080/19312450701641409.

McCullagh, Peter. 1980. “Regression Models for Ordinal Data.”

Journal of the Royal Statistical Society B