A clear and efficient balance between theory and application of statistical modeling techniques in the social and behavioral sciences
Written as a general and accessible introduction, Applied Univariate, Bivariate, and Multivariate Statistics provides an overview of statistical modeling techniques used in fields in the social and behavioral sciences. Blending statistical theory and methodology, the book surveys both the technical and theoretical aspects of good data analysis.
Featuring applied resources at various levels, the book includes statistical techniques such as t-tests and correlation as well as more advanced procedures such as MANOVA, factor analysis, and structural equation modeling. To promote a more in-depth interpretation of statistical techniques across the sciences, the book surveys some of the technical arguments underlying formulas and equations. Applied Univariate, Bivariate, and Multivariate Statistics also features
An ideal textbook for courses in statistics and methodology at the upper‐undergraduate and graduate levels in psychology, political science, biology, sociology, education, economics, communications, law, and survey research, Applied Univariate, Bivariate, and Multivariate Statistics is also a useful reference for practitioners and researchers in their field of application.
DANIEL J. DENIS, PhD, is Associate Professor of Quantitative Psychology at the University of Montana where he teaches courses in univariate and multivariate statistics. He has published a number of articles in peer-reviewed journals and has served as consultant to researchers and practitioners in a variety of fields.
Year of publication: 2015
DANIEL J. DENIS
Copyright © 2016 John Wiley & Sons, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging‐in‐Publication Data:

Denis, Daniel J., 1974–
Applied univariate, bivariate, and multivariate statistics / Daniel J. Denis.
pages cm
Includes bibliographical references and index.
ISBN 978‐1‐118‐63233‐8 (cloth)
1. Analysis of variance–Textbooks. 2. Multivariate analysis–Textbooks. I. Title.
QA279.D4575 2016
519.5′3–dc23
2015016660
“Beyond essential truths is but a following.”
This book provides a general introduction and overview of univariate and multivariate statistical modeling techniques typically used in the social and behavioral sciences. Students reading this book will come from a variety of fields, including psychology, sociology, education, and political science, and possibly biology and economics. Spanning several statistical methods, the focus of the book is naturally one of breadth rather than of depth in any one particular technique. These are topics usually encountered by upper‐division undergraduate or beginning graduate students in the aforementioned fields.
A wide selection of applied statistics and methodology texts exist, from books that are relatively deep theoretically to texts that are essentially computer software manuals with a modest attempt to include at least some of the elements of statistical theory. All of these texts serve their intended purpose so long as the user has an appreciation of their strengths and limitations. Theoretical texts usually cover topics in sufficient depth, but often do not provide enough guidance on how to actually run these models using software. Software manuals, on the other hand, typically instruct one on how to obtain output, but too often assume the reader comes to these manuals already armed with a basic understanding of statistical theory and research methodology.
The author of this book did not intend to write a software manual, yet at the same time was not inclined to write something wholly abstract, theoretical, and of little pragmatic utility. The book you hold in your hands attempts a more or less “middle of the road” approach between these two extremes. Good data analysis only happens when one has at least some grounding in both the technical and philosophical aspects of quantitative science. For example, it is well known that the “machinery” of multivariate methodology is grounded primarily in relatively elementary linear and matrix algebra. However, the use of these procedures is not. The how to do something can always be dug up. The why to do something is where teaching and instruction are needed. Indeed, one can obtain a solution to an equation, but if one does not know how to use or interpret that solution, it is of little use in the applied sense.
Hence, a balance of sorts was attempted between theory and application. Whether the optimum balance has been achieved will, of course, be left to the reader (or instructor) to ultimately decide. Undoubtedly, the theoretician will find the coverage somewhat trivial, while the application‐focused researcher will yearn for more illustrations and data examples. It is hoped, however, that the student new to these methods will find the mix to his or her liking, and will find the book relevant as a relatively gentle introduction to these techniques.
As merely a survey and overview of statistical methodologies, the book is devoid of proofs or other technical justification as one would find in a more theoretical book. This does not imply, however, that the book is one of recipes. Attention was given to explaining how formulas work and what they mean, as I see this as the first step to facilitating an understanding of the more technical arguments required for proofs and the like. The emphasis is on communicating what the equations and formulas are actually telling you, instead of focusing on how they are rigorously and timelessly justified. Readers interested in a more advanced and theoretical treatment should consult any of the excellent books on mathematical and theoretical statistics, such as that by Casella and Berger (2002).
In my view, the current textbook trend to provide “readable” data analysis texts to students outside of the mathematically dense sciences has reached its limit. Books now exist on statistical topics that attempt to use virtually no formulae or symbols whatsoever. I find this to be unfortunate, if not somewhat ridiculous, just as I equally find the abuse of mathematical complexity for its own sake rather distasteful. Indeed, being technical and complex for its own sake does little for the student attempting to grasp difficult concepts rather than simply memorizing equations. As Kline (1977) noted with regard to teaching calculus, rigor, while ultimately required, can too often obscure that thing we call understanding. Stewart (1995) said the same thing: “The psychological is more important than the logical … Intuition should take precedence; it can be backed up by formal proof later” (p. 5).
What has always intrigued me, however, is how little social science students, aside from those perhaps in economics, are exposed to even elementary mathematics in their coursework. Even courses in statistics for social scientists generally de‐emphasize the use of mathematics. I believe there are two reasons for this trend. First, mathematical representation in these disciplines has a reputation for being either “mysterious” or otherwise “beyond the grasp” of students. Students shy away from equations, and for good reason: they can be difficult to understand and difficult to manipulate. Except for the gifted few, we all struggle. But to think of them as mysterious or beyond anyone’s grasp is simply wrong. Second, the communication and writing of mathematics generally lack clarity and that philosophical “touch” when the teaching of it is attempted. Nobody likes to see one equation followed by another without understanding what “happened” in between, and even more importantly, why it happened. The proverbial sigh of outward and seemingly innate and unforgiving disappointment displayed toward any student who should ask why is, of course, no service to the student either.
I do not believe most students dislike mathematics. I do believe, however, that most students dislike mathematics that is unclear, poorly communicated, or otherwise purposely cryptic. In this book, I go to somewhat painstaking efforts to explain technical information in as clear and expository a fashion as possible. In this spirit, I was largely inspired by A.E. Labarre’s text Elementary Mathematical Analysis, published in 1961. It is as exceptionally clear an elementary‐to‐moderate‐level mathematics text as you will ever find, and is a perfect demonstration of how technical information can be communicated in a clear, yet still technically efficient, manner. Once more, the reader will be the final judge of whether or not this clarity of exposition has been achieved.
I have always found learning new statistical techniques without consulting the earliest historical sources on those techniques a rather shallow and hollow experience. Yes, one could read a book on the how and why of factor analysis, for instance, but it is only through consulting the earliest papers and derivations that one begins to experience a deeper understanding. Nothing compares to studying the starting points, the original manuscripts. It has also always intrigued me that one can claim to understand regression, for instance, yet never have heard of Francis Galton. How can one understand regression without even a cursory study of its historical roots? Of course, one can, but I believe a study of its history contributes more of an impression of the technique than is possible otherwise. A study of early plots featuring the technique (see Figure 1), and of the context in which the tool came about, can, I believe, only promote a deeper understanding of concepts in students.
FIGURE 1 One of Galton’s early graphical illustrations of regression. Circles in the plot are average heights for subgroups of children as a function of mid‐parent height. The lines AB and CD are regression lines.
(Galton, 1886)
A priority of the book is to introduce students to these methods by often providing a glimpse into their historical beginnings, or at minimum, providing some discussion of their origination. Historically relevant data are also used in places, whereas other parts of the book feature hypothetical and very “easy” data. And though demonstrating techniques by referring to substantive applications is always a good idea, it is equally useful to demonstrate methods using “generic” variables (e.g., x1, x2) to encourage an understanding of what the technique is actually doing as distinct from the substantive goals of the investigation. Researchers in applied fields can sometimes get so “immersed” in their theories that, next to their significant other, their theory is their greatest love. Over‐focus on applications can prevent the student from realizing, for instance, that factor analysis does not “discover” anything. It merely models correlation. It is often useful in this regard to retreat from substantive considerations and simply focus on the mechanics, lest we conclude more from the software output than is warranted by the quantitative analysis. A course in statistical methods should be just as much about what statistics cannot do as it is about what they can do. Many students significantly overestimate the power of the tool.
There is another reason for the focus, in part, on historical papers. Though the history of statistics by no means constitutes “easy reading,” I have found that papers written by the inventors of these techniques are often as clear and readable as anything written since. As a mentor of mine once told me, if you want to know what the inventor meant, read what they wrote, not what others wrote about what they wrote. For instance, papers by George Udny Yule on multiple regression, by Fisher (Figure 2) on the analysis of variance or discriminant analysis, or, more recently, by Karl Jöreskog on covariance modeling are among the best‐written (if still quite difficult to read and understand) papers on these topics. Many of these authors had a clarity about their writing that surpasses much that has been written since. It is always a pleasure to dig into the historical papers and uncover exactly what the originators wrote about their methodologies rather than rely only on interpretations of their original works.
FIGURE 2 R.A. Fisher in 1912, usually heralded as the undisputed father of modern statistics.
For histories of mathematical statistics, Stigler (1986) and Hald (1998) provide exceptional coverage. For a history of probability before 1750, see Hald (1990). Desrosières (1998) provides an excellent social history. Indeed, statistics as a tool of socio‐political persuasion is a field in its own right (MacKenzie, 1981). For a history of statistics in psychology, see Cowles (2005). Gigerenzer et al. (1990) give an excellent historical account of the influence of statistical science on modern culture. Salsburg (2002) provides a light, engaging, and enjoyable narrative on the general history of statistics.
Most textbooks on applied statistics essentially assume that a researcher’s top priority in his or her career, other than getting tenure, is to reject a null hypothesis at a significance level of 0.05. Results attaining p‐values of 0.06, 0.07, and higher are usually deemed “insignificant” and hence not grounds for rejecting a null. This constitutes a huge problem with how the logic of statistics, or we shall say its philosophy, is taught to students of science. It completely ignores the big picture. Setting habitual significance levels such as 0.01 and 0.05 is fine statistically, but methodologically (read: scientifically) it makes no sense to adopt such rigid and fixed decision criteria across all experimental or non‐experimental contexts. This is especially the case if one cares at all about the costs of making a bad decision on the “other” end of the curve, the too often ignored type II error rate.
As a reader of statistics texts, I have harbored a methodological annoyance at continually seeing null hypotheses either rejected or retained without seemingly any consideration of the costs of committing a type II error. Surely, if the world population were being eliminated day and night by a superbug, tests on potential vaccines would make their way to the public even at significance levels of 0.10. My point is that researchers and scientists should not interpret a significance level as would a strict mathematical or theoretical statistician. Setting a significance level is setting a decision rule, and, in line with good statistical theory, is the way to proceed. However, when confronted with real data, with real decisions, with real life, adhering day and night to an arbitrary significance level while simultaneously pretending to make intelligent decisions about scientific phenomena or courses of action is, well, foolish. A p‐value is to be used as an input to the decision‐making process, and the level of significance for a particular test should depend on a consideration, even if informal, of both error rates. Further, good decisions require more than the consideration of p‐value magnitudes. Measures of effect size are just as, if not more, important. I try my best to emphasize these issues as they arise throughout the book.
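The tradeoff between the two error rates can be made concrete with a small numerical sketch. The following is not from the book itself; it is a minimal illustration using a one‐sided z‐test, with a hypothetical true effect and standard error, showing how the type II error rate β shrinks as the significance level α is loosened:

```python
from statistics import NormalDist

def type_ii_error(effect, se, alpha=0.05):
    """Beta for a one-sided z-test of H0: mu = 0 against H1: mu = effect > 0.

    effect and se are hypothetical inputs for illustration only.
    """
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)  # critical value of the z statistic under H0
    # Under H1 the z statistic is centered at effect/se; beta is the
    # probability it still falls below the cutoff (i.e., we fail to reject H0).
    return z.cdf(cutoff - effect / se)

# Hypothetical study: true effect 0.5, standard error 0.25
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f} -> beta = {type_ii_error(0.5, 0.25, alpha):.3f}")
```

Running this shows β falling substantially as α rises from 0.01 to 0.10, which is precisely the tradeoff the vaccine example above asks the analyst to weigh rather than ignore.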
Though this text is not even remotely one about multilevel or hierarchical modeling, a unique feature of this book is that it introduces more complex mixed modeling in a gentle manner and as an extension of already‐learned principles. I have found it bewildering to see introductions to multilevel and hierarchical modeling that make “mention” of mixed models only in passing, as though a mixed model is simply another name to describe such methods. Of course, mixed modeling is not simply “another name” that can be given to multilevel or hierarchical models. Multilevel and hierarchical models are, in most contexts, ideally considered as special cases of the more general mixed model.
One strength of this book compared to competing books, I believe and hope, is readability, though not at the expense of complexity. Technical complexity is a necessary evil of mathematical and statistical writing. It cannot be avoided, and must be embraced to some extent if one is to deepen one’s understanding of statistics and methodology. It is my opinion that attempts to avoid symbolism when writing on statistical topics do more harm than good. Students come away having learned, for instance, that correlation is a measure of linear relationship, and nothing more. But what is correlation? It is much more than a phrase. To really understand it, one must work with its symbolic representation. Analogously, statements such as “Alcoholism predicts suicide” are only as meaningful as one’s awareness of what prediction means, not in the colloquial sense of the word, but as it is defined and used mathematically in the analysis on which the research report is based. The students and researchers who make such claims have a responsibility to understand them at a somewhat technical level; otherwise, the statement is hollow. Mathematics (and its associated symbols) is the language of science, and the sooner the student of the social sciences accepts this and plunges into the battle with both feet, the sooner life gets easier, not harder.
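To illustrate the point about symbolic representation, the sample Pearson correlation is

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,
                \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

and it is only in working through such an expression, seeing the numerator as joint variation about the means and the denominator as a normalizing product of the separate variations, that the phrase “measure of linear relationship” acquires real content.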
Most definitions are given at a level that is conceptually clear, with at least some respect for formality. For example, there is typically only one concept of a “random variable,” and yet it can be defined in many ways. That it is a real‐valued function defined on a sample space (DeGroot and Schervish, 2002), precise as this may be, can nonetheless be “translated” into that of a variable the values of which occur according to some specified probability distribution (Everitt, 2002). Both are correct, although a reader with a background in measure theory would still likely find both definitions “incomplete.” The irk of mathematical and theoretical statisticians, of course, is that definitions get “translated” so much that they lose their formal and precise‐to‐the‐nth‐degree meaning. Every effort in this book has been made to communicate concepts and theory conceptually, though without “insulting” the more formal, deeper (and more correct) definitions. The discipline of theoretical and mathematical statistics is a treasured area of investigation, and definitions in this book, even if given at a relatively informal level, are nonetheless hoped to show due respect to the aforementioned field. Indeed, in many places, we cite such definitions and then proceed to “unpack” their meaning at a more conceptual level.
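The two definitions of a random variable just discussed can be written side by side. The formal one names a function on the sample space; the conceptual “translation” points at the distribution governing its values:

```latex
% Formal: a real-valued function on the sample space Omega
X : \Omega \to \mathbb{R}

% Conceptual: values of X occur according to a specified
% probability distribution, summarized by its distribution function
F_X(x) = P(X \le x), \qquad x \in \mathbb{R}
```

Both lines describe the same object; the second simply suppresses the sample space that the first makes explicit, which is exactly the kind of “translation” the measure theorist would call incomplete.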
The book does not pretend to cover every available methodology or survey every type of statistical technique. For example, there are no chapters on time series or survival analysis. The book was built by standing on the shoulders of other books that I consider to be the very best in the fields of statistics, social science, and applied data analysis more generally. Indeed, some of the more technical arguments follow rather closely in the footsteps of such “giants.” Proper credit and citation, of course, is given where appropriate. What I hope makes this book unique, however, is how the material is explained and communicated to the student, with an emphasis on conceptual development without a complete disregard for technical accuracy. As Kirk (1995) remarked, both can be achieved.
Modern applied social statistics has virtually exploded in complexity with the advent of scientific computing. The so‐called soft sciences are statistically quite hard, and the number of specialized techniques and their offshoots far eclipses my ability (or willingness, drive, or emotional desire) to cover them all in a single text. For instance, advances in techniques for ordinal data alone could easily fill an entire book, yet in this book they are not discussed. Instead of trying to cover too much ground too fast, I focus instead on the fundamentals, the logic, the “gateways” to understanding the “harder” material that lies ahead for the student in future courses, seminars, and books. Understanding foundations, I believe, is the key. Kindergarten, after all, is the most important grade.
For instance, it is my belief that if a student has a solid understanding of what analysis of variance (ANOVA) and regression actually are, beyond memorization and formula manipulation, this puts the student in an ideal position to extend that knowledge to virtually any statistical model he or she chooses to master in the future. If I had attempted to include every statistical methodology under the sun, the book would read like a cookbook, and though it would be of use to the experienced researcher, it would be of minimal use to the newcomer wanting to grasp the essential logic of these methodologies. The unfortunate reality is that too often students who “do statistics” have too little of an idea of what they are doing. This is partly the fault of the student for not committing themselves to a better understanding of the tools they use, but also the fault of instructors who too often teach “computer statistics” rather than