Guides readers through the quantitative data analysis process, including contextualizing data within a research situation, connecting data to the appropriate statistical tests, and drawing valid conclusions.

Introduction to Quantitative Data Analysis in the Behavioral and Social Sciences presents a clear and accessible introduction to the basics of quantitative data analysis and focuses on how to use statistical tests as a key tool for analyzing research data. The book presents the entire data analysis process as a cyclical, multiphase process and addresses the processes of exploratory analysis, decision-making for performing parametric or nonparametric analysis, and practical significance determination. In addition, the author details how data analysis is used to reveal the underlying patterns and relationships between the variables and connects those trends to the data's contextual situation.

Filling a gap in the quantitative data analysis literature, this book teaches the methods and thought processes behind data analysis, rather than how to perform the study itself or how to perform individual statistical tests. With a clear and conversational style, readers are provided with a better understanding of the overall structure and methodology behind performing a data analysis as well as the techniques needed to make informed, meaningful decisions during data analysis. The book features numerous data analysis examples in order to emphasize the decision and thought processes that are best followed, and self-contained sections throughout separate the statistical data analysis from the detailed discussion of the concepts, allowing readers to reference a specific section of the book for immediate solutions to problems and/or applications.
Introduction to Quantitative Data Analysis in the Behavioral and Social Sciences also features coverage of the following:

* The overall methodology and research mind-set for how to approach quantitative data analysis and how to use statistical tests as part of research data analysis
* A comprehensive understanding of the data, its connection to a research situation, and the most appropriate statistical tests for the data
* Numerous data analysis problems and worked-out examples to illustrate the decision and thought processes that reveal underlying patterns and trends
* Detailed examples of the main concepts to aid readers in gaining the needed skills to perform a full analysis of research problems
* A conversational tone to effectively introduce readers to the basics of how to perform data analysis as well as make meaningful decisions during data analysis

Introduction to Quantitative Data Analysis in the Behavioral and Social Sciences is an ideal textbook for upper-undergraduate and graduate-level research method courses in the behavioral and social sciences, statistics, and engineering. This book is also an appropriate reference for practitioners who require a review of quantitative research methods.

Michael J. Albers, Ph.D., is Professor in the Department of English at East Carolina University. His research interests include information design with a focus on answering real-world questions, the presentation of complex information, and human-information interaction. Dr. Albers received his Ph.D. in Technical Communication and Rhetoric from Texas Tech University.
Page count: 347
Publication year: 2017
Cover
Title Page
Copyright
Preface
About the Companion Website
Chapter 1: Introduction
Basis of How All Quantitative Statistical Based Research
Data Analysis, Not Statistical Analysis
Quantitative Versus Qualitative Research
What the Book Covers and What It Does Not Cover
Book Structure
References
Part I: Data Analysis Approaches
Chapter 2: Statistics Terminology
Statistically Testing a Hypothesis
Statistical Significance and p-Value
Confidence Intervals
Effect Size
Statistical Power of a Test
Practical Significance Versus Statistical Significance
Statistical Independence
Degrees of Freedom
Measures of Central Tendency
Percentile and Percentile Rank
Central Limit Theorem
Law of Large Numbers
References
Chapter 3: Analysis Issues and Potential Pitfalls
Effects of Variables
Outliers in the Dataset
Relationships Between Variables
A Single Contradictory Example Does Not Invalidate a Statistical Relationship
References
Chapter 4: Graphically Representing Data
Data Distributions
Bell Curves
Skewed Curves
Bimodal Distributions
Poisson Distributions
Binomial Distribution
Histograms
Scatter Plots
Box Plots
Ranges of Values and Error Bars
References
Chapter 5: Statistical Tests
Inter-Rater Reliability
Regression Models
Parametric Tests
Nonparametric Tests
One-Tailed or Two-Tailed Tests
Tests Must Make Sense
References
Part II: Data Analysis Examples
Chapter 6: Overview of Data Analysis Process
Know How to Analyze It Before Starting the Study
Perform an Exploratory Data Analysis
Perform the Statistical Analysis
Analyze the Results and Draw Conclusions
Writing Up the Study
References
Chapter 7: Analysis of a Study on Reading and Lighting Levels
Lighting and Reading Comprehension
Know How the Data Will Be Analyzed Before Starting the Study
Perform an Exploratory Data Analysis
Perform an Inferential Statistical Analysis
Exercises
Chapter 8: Analysis of Usability of an E-Commerce Site
Usability of an E-Commerce Site
Study Overview
Know How You Will Analyze the Data Before Starting the Study
Perform an Exploratory Data Analysis
Perform an Inferential Statistical Analysis
Follow-Up Tests
Performing Follow-Up Tests
Exercises
Reference
Chapter 9: Analysis of Essay Grading
Analysis of Essay Grading
Exploratory Data Analysis
Inferential Statistical Data Analysis
Exercises
Reference
Chapter 10: Specific Analysis Examples
Handling Outliers in the Data
Floor/Ceiling Effects
Order Effects
Data from Stratified Sampling
Missing Data
Noisy Data
Transform the Data
References
Chapter 11: Other Types of Data Analysis
Time-Series Experiment
Analysis for Data Clusters
Low-Probability Events
Metadata Analysis
Reference
Appendix A: Research Terminology
Independent, Dependent, and Controlled Variables
Between Subjects and Within Subjects
Validity and Reliability
Variable Types
Type of Data
Independent Measures and Repeated Measures
Variation in Data Collection
Probability—What 30% Chance Means
References
Index
End User License Agreement
Michael J. Albers
East Carolina University
This edition first published 2017
© 2017 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Michael J. Albers to be identified as the author(s) of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloguing-in-Publication Data applied for.
Hardback: 9781119290186
Cover image: Magnilion/gettyimages
Cover design by Wiley
This book strives to be an introduction to quantitative data analysis for students who have little or no previous training either in statistics or in data analysis. It does not attempt to cover all types of data analysis situations, but works to impart the proper mindset in performing a data analysis. Too often the problem with poorly analyzed studies is not the number crunching itself, but a lack of the critical thinking process required to make sense of the statistical results. This book works to provide some of that training.
Statistics is a tool. Knowing how to perform a t-test or an ANOVA is similar to knowing how to use styles and page layout in Word. Just because you know how to use styles does not make you a writer. It will not even make you a good layout person if you do not know when and why to apply those styles. Likewise, statistics is not data analysis. Learning how to use a software package to perform a t-test is relatively easy and quick for a student. But knowing when and why to perform a t-test is a different, and more complex, learning outcome. I had a student, who had taken two graduate-level business statistics courses, remark when she turned in a statistics-heavy report in a writing class: “In the stat classes, I only learned enough to get me through the test problems. I have no idea how to analyze this data.” She had learned how to crunch numbers, but not how to analyze data. Bluntly, she wasted her time and money in those two classes.
The issue for researchers in the social sciences is not to learn statistics, but to learn to analyze data. The goal is not to learn how to use the statistical tests to crunch numbers, but to be able to use those tests to interpret the data and draw valid conclusions from it. There is a wide range of statistical tests relevant to data analysis; some that every researcher should be able to perform and some that require the advice/help of a statistical expert. Good quantitative data analysis does not require a comprehensive knowledge of statistics, but, rather, knowing enough to know when it is time to ask for help and what questions to ask.
Every quantitative research study (essentially by definition) collects some type of data that must then be analyzed to help draw the study's conclusions. A great study design is useless unless the data is properly analyzed. But teaching that data analysis to students is a difficult task. What I have found is that most textbooks fall into one of these categories.
Research method textbooks that explain how to create and execute a study, but typically are very light on how to analyze the data. They are excellent on explaining methods of setting up the study and collecting the data, but not on the methods to analyze it after it has been collected.
Statistics textbooks that explain how to perform statistical tests. The tests are explained in an acontextual manner and in rigorous statistical terms. Students learn how to perform a test, but, from a research standpoint, the equally important questions of when and why to perform it get short shrift. As do the questions of how to interpret the results and how to connect those results to the research situation.
This book differs from textbooks in these two categories because it focuses on teaching how to analyze data from a study, rather than how to perform a study or how to perform individual statistical tests. Notice that in the first sentence of a previous paragraph I said “data that must then be analyzed to help draw the study's conclusions.” The key word in the sentence is help versus give the study's conclusions. The results of statistical tests are not the final conclusions for research data analysis. The researcher must study the test results, apply them to the situational context, and then draw conclusions that make sense (see Figure 1.1 in Chapter 1). To support that process, this book works to place statistical tests within the context of a data analysis problem and provide the background to connect a specific type of data with the appropriate test. The work is placed within long examples and the entire process of data analysis is covered in a contextualized manner. It looks at the data analysis from different viewpoints, using different tests, to enable a student to learn how and when to apply different analysis methods.
Two major goals are to teach what questions to ask during all phases of a data analysis and how to judge the relevance of potential questions. It is easy to run statistical tests on all combinations of the data, but most of those tests have no relevance or validity regardless of the actual research question.
This book strives to explain the when, why, and what for, rather than the button-pushing how-to. The data analysis chapters of many research textbooks are little more than an explanation of various statistical tests. As a result, students come away thinking the important questions are procedural, such as: “How do I run a chi-squared test?” “What is the best procedure, a Kruskal–Wallis test or a standard ANOVA?” and “Let me tell you about my data, and you can tell me what procedure to run.” (Rogers, 2010, p. 8). These are the wrong questions to be asking at the beginning of a data analysis. Rather, students need to think along the lines of “what relationships do I need to understand?” and “what are the important practical issues I need to worry about?” Unfortunately, most data analysis texts leave them lost in the trees of individual tests and never explain where they are within the data analysis forest.
Besides knowing when and why to perform a statistical test, there is a need for a researcher to get at the data's deep structure and not be content with the superficial structure that appears at first glance. And certainly not to be content with poor/inadequate data analysis in which the student sees the process as “run a few statistics tests, report the p-value, and call the analysis complete.”
Statistics is a tool to get where you want to go, but far too many view it either as an end for itself and the rest view it as a way of manipulating raw data in order to get a justification for what they want to do to begin with. Further, being able to start to quantify relationships and being able to quantify results does not mean that you are beginning to understand these, let alone being able to quantify anything like the risk involved.
(Briggs, 2008)
I recently had to review a set of undergraduate honors research project proposals; they consistently had several weeks scheduled for data collection and one week for data analysis. Unfortunately, with only one week, these students will never get more than a superficial level of understanding of their data. In many of the cases of superficial analysis, I am more than willing to place a substantial part of the blame on the instructor. There is a substantial difference between a student who chooses not to do a good data analysis and a student who does not know how to do a good data analysis. Unless students are taught how to perform an in-depth analysis, they will never perform one because they lack the knowledge. More importantly, they will lack the understanding to realize their analysis was superficial. If someone was taught the task as “do a t-test and report a p-value,” then who is to blame for the lack of data analysis knowledge?
A goal of this book is to teach that data analysis is not just crunching numbers, but a way of thinking that works to reveal the underlying patterns and trends that allow a researcher to gain an understanding of the data and its connection to the research situation. I am content with students knowing when and why to use statistical tests, even if the test's internal logic is little more than a black box.
I expect many research methods instructors will be appalled at this book's contents. The heavy statistics-based researcher or a statistics instructor will be appalled at the statistical tests I left out or at the lack of rigorous discussion of many concepts. The instructor who touches on statistics in a research methods course will be appalled at the number of tests I include and the depth of the analysis. (Yes, I fully appreciate the inherent contradiction in these two sentences.) But I sincerely hope both groups appreciate my attempt at defining statistical tests as a part of data analysis—NOT as either its totality or its end—and my goal of teaching students to approach a data analysis with a mind-set that they must analyze the data and not simply run a bunch of statistical tests.
With that said, here are some research issues this book will not address:
This book assumes the research methodology and data collection methods are valid. For instance, some examples discuss how to analyze the results of survey questions using Likert scales. Neither the design of the survey questions nor the development of Likert items will be discussed; both are assumed to be valid.
This book assumes the data's reliability and validity. The reliability and validity of the data are research design questions that a well-designed study must consider up front, but they do not affect the data analysis per se. Obviously, with poor quality data, the conclusions are questionable, but the analysis process does not change.
There are no step-by-step software instructions. There are several major statistical software packages and a researcher might use any one of them. With multiple packages, detailed-level software instructions would result in an overly long book with many pages irrelevant to any single reader. All the major software packages provide all of the basic tests covered in this book and there are essentially an infinite number of help sites and YouTube videos that explain the button-pushing aspects. Plus, the how-to is much more effectively taught one-on-one with an instructor than from a book.
The basic terminology used in research study design is used with minimal definition. For example, if the analysis differs for within-subjects and between-subjects designs, the discussion assumes the student already understands the concepts of within subjects and between subjects, since those must be understood before collecting data. Terminology relevant to a quantitative analysis will, of course, be fully defined and explained. Also, there are extensive references to definitions and concepts.
There is no attempt to cover statistical proofs or deal with the edge cases of when a test does or does not apply. Readers desiring that level of understanding need a full statistics course. There are many places where I refer the researcher to a statistician. The complexities of advanced statistics and more specialized tests may be relevant to the research, but are out of place here. This book is an introduction to data analysis, not an exhaustive data analysis tome.
This book focuses on the overall methodology and research mind-set for how to approach quantitative data analysis and how to use statistical tests as part of analyzing research data. It works to show that the goal of data analysis is to reveal the underlying patterns, trends, and relationships between the variables, and to connect those patterns, trends, and relationships to the data's contextual situation.
Briggs, W. (2008) The limits of statistics: black swans and randomness [Web log comment]. Retrieved from http://wmbriggs.com/blog/?p=204.
Rogers, J.L. (2010) The epistemology of mathematical and statistical modelling: a quiet methodological revolution. American Psychologist, 65(1), 1–12.
This book is accompanied by a companion website:
http://www.wiley.com/go/albers/quantitativedataanalysis
The website includes:
Excel data sets for the chapter problems
Any research study should have a solid design, properly collected data, and draw its conclusions on effectively analyzed data. All of which are nontrivial problems.
This is a book about performing quantitative data analysis. Unlike most research methods texts, which focus on creating a good design, the focus is on analyzing the data. It is not on how to design the study or collect the data; there are many good sources that cover those aspects of research. Of course, poor design or data collection leads to poor data, which makes the results of the analysis useless. Instead, this book focuses on how to analyze the data.
The stereotypical linear view of a research study is shown in Figure 1.1a. Figure 1.1b expands on what is contained within the “analyze data” element. This book only works within that expansion; it focuses on how to analyze data from a study, rather than either how to perform the study or how to perform individual statistical tests.
Figure 1.1 View of data analysis as situated within the overall study.
The last two boxes of the expansion in Figure 1.1 “Make sense of the results” and “Determine the implications” are where performing a high-quality data analysis differs from someone simply crunching numbers.
A quantitative study is run to collect data and draw a numerical-based conclusion about that data; a conclusion that must reflect both the numerical analysis and the study context. Thus, data must be analyzed to help draw a study's conclusions. Unfortunately, even great data collected using a great design will be worthless unless the analysis is performed properly. The key word in that sentence is help versus give the study's conclusions. The results of statistical tests are not the final conclusions for research data analysis. The researcher must study the test results, apply them to the situational context, and then draw conclusions that make sense. To support that process, this book works to place the tests within the context of a problem and provide the background to connect a specific type of data with the appropriate test.
The outcome of any statistical analysis needs to be evaluated in terms of the research context and any conclusions drawn based on that context.
Consider this example of how this book approaches data analysis.
You are interested in which books are being checked out of a library. So, you gather data using many titles that fit within study-defined categories. For example, topical nonfiction or categories for fiction of a particular genre (historical, romance, etc.).
At the end of the study's data collection, the analysis looks at the following:
Graphs of checkouts by month of the various categories. Do the types of categories vary by day/week through the month? How do the numbers compare? Do the trends of checkouts for each category look the same or different?
Run statistics on the daily/monthly checkouts of the book categories versus demographics of the people who checked them out (age, gender, frequency of library use, etc.). Does age or gender matter for who checks out a romance versus a thriller? From this we can find whether there is a statistically significant difference (e.g., that older readers read more romance than younger readers).
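The second analysis step above could be sketched as a chi-squared test of independence on a checkout-count contingency table. This is only an illustrative sketch: the counts, genres, and age split below are invented for the example and come from no actual study.

```python
# Hypothetical checkout counts: rows = age group, columns = genre.
# All numbers are invented for illustration only.
from scipy.stats import chi2_contingency

#           romance  thriller  historical
counts = [
    [120,    45,      30],   # readers under 40
    [200,    60,      55],   # readers 40 and over
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
```

A small p-value would suggest genre preference and age group are not independent, but, as this chapter stresses, the practical meaning of any such difference still has to be judged against the research context.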
Too many people believe if they can figure out how to run statistical software, then they know how to perform a quantitative data analysis. No! Statistics is only a single tool among many that are required for a data analysis. Likewise, the software is only a tool that provides an easy way to perform a statistical test. Knowing how to perform a t-test or an ANOVA is similar to knowing how to use styles and page layout in Word. Just because you know how to use styles does not make you a writer. It will not make you a good layout person if you do not know when and why to apply those styles. Neither the software nor the specific tests themselves are sufficient; necessary, yes, but sufficient, no! Run the wrong test, and the results are wrong. Fail to think through what the statistical test means to the situation and the overall study fails to have relevance.
It is important to understand that statistics is not data analysis. Learning how to use a software package to perform a t-test is relatively easy and quick. But good data analysis requires knowing when and why to perform a t-test, a different, and more complex, task. Especially for researchers in the social sciences, the goal is not to be a statistical expert, but to know how to analyze data. The goal is to be able to use statistical tests as part of the input required to interpret the study's data and draw valid conclusions from it. There is a wide range of statistical tests relevant to data analysis; some that every researcher should be able to perform and some that require the advice/help of a statistical expert. Good quantitative data analysis does not require a comprehensive knowledge of statistics, but, rather, knowing enough to know when it is time to ask for help and what questions to ask. Many times throughout the book, the comment to consult a statistician appears.
Figure 1.1 shows data analysis as one of five parts of a study; a part that deserves and often requires 20% of the full study's time. I recently had to review a set of undergraduate honors research project proposals; they consistently had several weeks scheduled for data collection, a couple of weeks for data clean-up, and data analysis was done on Tuesday. This type of time allocation is not uncommon for young researchers, probably based on a view that the analysis is just running a few t-tests and/or ANOVAs on the data and copying the test output into the study report. Unfortunately, with that sort of analysis, the researchers will never reach more than a superficial level of understanding of the data or be able to draw more than superficial conclusions from it.
The purpose of a quantitative research study is to gain an understanding of the research situation. Thus, the data analysis is the study; the study results come directly out of the analysis. It is not the collection and not the reporting; without the data analysis there is no reason to collect data and there is nothing of value to report.
There are many dedicated statistical software programs (JMP, SPSS, R, Minitab, and many others). When you are doing data analysis, it is important to take the time to learn how to use one of these packages. All of them can perform the standard statistical tests; the nonstandard tests, while important in their niche cases, are not needed for most data analyses.
The one statistical source missing from the list is Microsoft Excel. This book uses Excel output in many examples, but Excel lacks the horsepower to really support data analysis. It is great for data entry of the collected data and for creating the graphs of the exploratory analysis. But, then, move on to a higher-powered statistical analysis program.
In statistics, the word “significance” is often used to mean “statistical significance,” which is the likelihood that the difference between the two groups is just an accident of sampling. A study's data analysis works to determine if the data points for two different groups are from the same population (a finding of no statistical significance) or if they are from different populations (a finding of statistical significance).
Every population has a mean and standard deviation. However, those values are typically not known by the researcher; part of the study's goals is to determine them. If a study randomly selected members from the population in Table 1.1, any of those four groups could be picked.
Table 1.1 Random numbers generated with a normal distribution of mean = 10 and SD = 2.

              Trial 1   Trial 2   Trial 3   Trial 4
Data points     8.043     7.726    10.585     7.679
                7.284     7.374     9.743    12.432
               11.584    11.510     9.287    13.695
                9.735    11.842     9.102     8.922
                8.319     9.651     4.238     6.525
                8.326    11.849    11.193    11.959
Mean            8.882     9.992     9.025    10.202
SD              1.544     2.063     2.475     2.891
Because of the nature of random numbers and small sample sizes, each trial has a different mean and standard deviation, although they all come from the same population.
If you take multiple samples from the same population, there will always be a difference between them. Table 1.1 shows the results of Excel calculating six random numbers that fit a normal distribution with a mean = 10 and standard deviation = 2. The numbers were generated randomly, but they could reflect the data from any number of studies: time to perform a task, interactions during an action, or, generally, anything that can be measured that gives a normal distribution. The important point here is that although they all come from the same population, each sample's mean and standard deviation is different.
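The sampling variation behind Table 1.1 is easy to reproduce. The sketch below uses Python and NumPy (the table itself was generated in Excel; no particular package is prescribed) to draw four samples of six values from the same normal population with mean = 10 and SD = 2:

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seeded only so the run is repeatable

population_mean, population_sd, n = 10, 2, 6
means = []

for trial in range(1, 5):
    sample = rng.normal(population_mean, population_sd, size=n)
    means.append(sample.mean())
    # Each trial's mean and SD come out different, even though every
    # sample is drawn from the same population -- the point of Table 1.1.
    print(f"Trial {trial}: mean = {sample.mean():.3f}, SD = {sample.std(ddof=1):.3f}")
```

Rerunning with a different seed (or no seed) produces a different set of sample means every time, which is exactly the behavior the table illustrates.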
Now put those numbers into a study. We have a study with two groups looking at the time to perform a task (faster is good). We have an old way (performed by people that resulted in the times of trial 1: mean = 8.882, SD = 1.544) and a new way (performed by people that resulted in the times of trial 2: mean = 9.992, SD = 2.063). Because Table 1.1 presents data without context, a simple t-test (or ANOVA) to determine significance is all we can perform. If we simply looked at the mean and standard deviation numbers, it would seem trial 2 was worse, assuming we wanted fast task times. Yet, they are both random sets of numbers from mean = 10 and SD = 2. A proper statistical analysis would return a result of no statistically significant difference.
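As a sketch of that analysis (using Python's scipy, one package among many; the book does not prescribe it), here is an independent two-sample t-test run on the Trial 1 and Trial 2 columns of Table 1.1:

```python
from scipy.stats import ttest_ind

# Trial 1 and Trial 2 columns of Table 1.1
trial1 = [8.043, 7.284, 11.584, 9.735, 8.319, 8.326]    # mean 8.882, SD 1.544
trial2 = [7.726, 7.374, 11.510, 11.842, 9.651, 11.849]  # mean 9.992, SD 2.063

t, p = ttest_ind(trial1, trial2)  # independent two-sample t-test
print(f"t = {t:.3f}, p = {p:.3f}")
# p comes out well above 0.05: no statistically significant difference,
# as expected for two samples drawn from the same population.
```

With p far above the conventional 0.05 cutoff, the analysis correctly declines to call the "new way" slower, despite the difference in sample means.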
Within the context of a study, the data analysis requires moving beyond calculating a statistical value (whether a p-value or one of the results available with more complex statistical methods) and interpreting that statistical result with respect to the overall context. The researcher also has to determine if the difference is large enough to have practical significance. From a practical viewpoint, there would be no reason to spend money on the new way, since it is not any faster than the old way. If the two had been significantly different, the conclusions would still have to consider other factors (such as the time and expense of the change) to determine whether it is worthwhile. Quantitative data analysis is more than just finding statistical significance; it is connecting the results of the statistical analysis with the study's context and drawing practical conclusions.
Statistical significance is usually calculated as a “p-value,” the probability that a difference of at least the same size would have arisen by chance, even if there really were no difference between the two populations. By social science convention, if p < 0.05 (i.e., below 5%), the difference is taken to be large enough to be “significant”; if not, then it is “not significant.” In other words, if p < 0.05, then the two sets of data are declared to be from different populations. In the hard sciences, the p-value must be smaller.
Research reports often contain sentences such as "The p-value of 0.073 shows the results are trending toward significance." Researchers have a long-running debate about this type of wording. The basis of the argument is that a result is yes/no: it is either significant or it is not.
However, the definition of significance as p = 0.05 is itself arbitrary and is only a long-standing convention in the social sciences.
More importantly, thinking in terms of a statistical yes/no ignores the study context. The effect size (loosely defined as the practical significance) matters more to the final result than the statistical significance. If a study fails to show statistical significance—in this case, p = 0.073—but has a large effect size, the results are much stronger than those of a study with the same p-value (or even a significant p = 0.04) but a small effect size.
Both the p-value and the effect size should be reported in a study's report.
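One common effect-size measure for two groups is Cohen's d, the mean difference divided by the pooled standard deviation. This sketch (the function name is mine; the book does not prescribe a particular measure) computes it for trials 1 and 2 of Table 1.1:

```python
import math
import statistics

trial1 = [8.043, 7.284, 11.584, 9.735, 8.319, 8.326]
trial2 = [7.726, 7.374, 11.510, 11.842, 9.651, 11.849]

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * statistics.variance(a) +
                           (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd

d = cohens_d(trial1, trial2)
print(f"Cohen's d = {d:.2f}")
```

Here d comes out around 0.61 even though the t-test is nonsignificant: with only six data points per group, a sample-size-free measure like d can look sizable purely by chance, which is exactly why both numbers need to be read together and in context.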
Some researchers try to skip the statistical analysis and instead observe and draw conclusions directly from the data. Looking at the data in Table 1.1 without a statistical analysis, a researcher might conclude that trials 1 and 4 came from different populations, but this is not the case. If this were a study to determine whether a current practice should change, a lot of money could be spent on a change that will have no real effect. Without a proper analysis, the researcher can fall prey to confirmation bias, which occurs when a researcher specifically looks for data that supports a desired claim and ignores data that refutes it. A simple example is politicians from opposing parties using the same report to claim their agenda is right and the opposing agenda is wrong; both are cherry-picking data from the report to make that claim.
Although statistics can tell you the data are from different populations (there is a statistically significant difference), they do not tell you why. Yet the purpose of a research study is to uncover the why, not just the existence of a difference. Thus, a statistical analysis by itself is not the answer to a hypothesis. A research study needs to move beyond the statistics to answer questions of why and what really happened to give these results, and then move past those questions to figure out what the answers mean in the study's context. That is the outcome of a good data analysis.
Early in a study's design, a set of hypotheses is created. Then the data needed to test those hypotheses are defined and eventually collected. At the same time the needed data are defined, the analysis tests to be performed on those data should be defined as well.
In good data analysis, the researcher knows from the start how the first round of analysis will be performed; that was defined early in the study design. The later rounds of analysis each drill deeper and explore interesting relationships found in the previous rounds. Obviously, these cannot be defined until you are engaged in the analysis, but understanding how to pursue them distinguishes a good researcher from a poor one (Figure 1.2).
Figure 1.2 Cyclical nature of data analysis.
Poor data analysis often just collects data and then runs all possible combinations of variables against each other, most of which make no sense to even test. The goal of the data analysis has shifted from understanding the data within the study to "finding significance…any significance." Unfortunately, with significance defined as p < 0.05, about 5% of those combinations may show significance when none actually exists.
The best, most well-designed study is worthless if the data analysis is inadequate. Focus the analysis on a small number of well-conceived hypotheses rather than blindly performing different statistical tests on all variable pairs and ending up with 5% of your results being significant at the 0.05 level.
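The arithmetic behind that warning is worth seeing. Assuming the tests are independent (a simplification I am adding for illustration; real variable pairs are usually correlated), the chance of at least one spurious "significant" result at alpha = 0.05 grows quickly with the number of tests run:

```python
# Familywise false-positive rate for k independent tests at alpha = 0.05:
# P(at least one false positive) = 1 - (1 - alpha)**k.
alpha = 0.05
for k in (1, 5, 20, 45):  # 45 = number of pairs among 10 variables
    rate = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one false positive) = {rate:.2f}")
```

At 20 tests the chance of a spurious hit is already about 64%, and testing all 45 pairs of a modest 10-variable data set pushes it near 90%, which is why a fishing expedition almost always "finds" something.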
Research methodology textbooks call for using qualitative research to get a view of the big picture and then quantitative studies to examine the details. Both quantitative and qualitative methods have their strong and weak points, and a good research agenda requires both, which implies that a researcher needs to understand both. A problem occurs—similar to the "when you have a hammer, everything looks like a nail" problem—when researchers lack training in one, typically quantitative, and try to use a single approach for all research problems.
Quantitative research is a methodical process. Too many people with qualitative research experience—or who lack quantitative research experience—look at a situation, point out all of the interacting factors, and despair of (or claim the impossibility of) figuring out the situation.
The social sciences work with large, complex systems with numerous variables interacting in subtle ways. The reality is that understanding them requires many studies, each looking at the situation in a slightly different way and each exposing new questions and new relationships. Qualitative research can define the variables of interest, provide preliminary insight into which variables interact, and help define the hypotheses, but it takes quantitative research to clearly understand how those variables interact.
It is one thing to declare confidently that causal chains exist in the world out there. However, it is quite another thing to find out what they are. Causal processes are not obvious. They hide in situations of complexity, in which effects may have been produced by several different causes acting together. When investigated, they will reluctantly shed one layer of explanation at a time, but only to reveal another deeper level of complexity beneath.
(Marsh and Elliott, 2008, p. 239)
At its strongest, quantitative data analysis gets at the deep structure of the data. High-quality quantitative data analysis exposes that deep structure, and researchers should never be content with the superficial structure that appears at first glance. They certainly should not be content with poor or inadequate data analysis, where the analysis process is seen as running a few statistical tests, reporting the p-value, and calling that a data analysis.
It is true that no social science study will be able to obtain the clear results, with hard numbers, of the physical sciences. The cause-and-effect relationships of physical laws do not exist; instead, when people enter into the research equations, there are at best probabilistic relationships. Nothing can be clearly predicted. But simply refusing to undertake a study because of too many interactions is poor research, as is deciding to undertake only qualitative research. Qualitative research can build up the big picture and show the existence of relationships. As a result, it reveals the areas where we can best apply quantitative approaches. It is with quantitative approaches that we can fully get at the underlying relationships within the data and, in the end, it is those relationships that contain the deep understanding of the overall complexities of the situation (Albers, 2010). Developing that deep understanding is the fundamental goal of a research agenda.
Every quantitative research study collects some type of data that gets reduced to numbers and must be analyzed to help draw the study's conclusions. A great study design is useless unless the data are properly analyzed. What I have found is that most textbooks fall into one of these categories.
Research method textbooks explain how to create and execute a study, but typically are very light on how to analyze the data. They are excellent on explaining methods of setting up the study and collecting the data, but not on the methods to analyze data after it has been collected.
