A concise, easily accessible introduction to descriptive and inferential techniques

Statistical Inference: A Short Course offers a concise presentation of the essentials of basic statistics for readers seeking to acquire a working knowledge of statistical concepts, measures, and procedures. The author conducts tests of the assumptions of randomness and normality and provides nonparametric methods for use when parametric approaches might not work. The book also explores how to determine a confidence interval for a population median while providing coverage of ratio estimation, randomness, and causality. To ensure a thorough understanding of all key concepts, Statistical Inference provides numerous examples and solutions along with complete and precise answers to many fundamental questions, including:

* How do we determine that a given dataset is actually a random sample?
* With what level of precision and reliability can a population sample be estimated?
* How are probabilities determined, and are they the same thing as odds?
* How can we predict the level of one variable from that of another?
* What is the strength of the relationship between two variables?

The book is organized to present fundamental statistical concepts first, with later chapters exploring more advanced topics and additional statistical tests such as distributional hypotheses, the multinomial chi-square statistic, and the chi-square distribution. Each chapter includes appendices and exercises, allowing readers to test their comprehension of the presented material.

Statistical Inference: A Short Course is an excellent book for courses on probability, mathematical statistics, and statistical inference at the upper-undergraduate and graduate levels. The book also serves as a valuable reference for researchers and practitioners who would like to develop further insights into essential statistical tools.
Page count: 507
Year of publication: 2012
Contents
Cover
Title Page
Copyright
Dedication
Preface
Chapter 1: The Nature of Statistics
1.1 Statistics Defined
1.2 The Population and the Sample
1.3 Selecting a Sample from a Population
1.4 Measurement Scales
1.5 Let Us Add
Exercises
Chapter 2: Analyzing Quantitative Data
2.1 Imposing Order
2.2 Tabular and Graphical Techniques: Ungrouped Data
2.3 Tabular and Graphical Techniques: Grouped Data
Exercises
Appendix 2.A Histograms with Classes of Different Lengths
Chapter 3: Descriptive Characteristics of Quantitative Data
3.1 The Search for Summary Characteristics
3.2 The Arithmetic Mean
3.3 The Median
3.4 The Mode
3.5 The Range
3.6 The Standard Deviation
3.7 Relative Variation
3.8 Skewness
3.9 Quantiles
3.10 Kurtosis
3.11 Detection of Outliers
3.12 So What Do We Do with All This Stuff?
Exercises
Appendix 3.A Descriptive Characteristics of Grouped Data
Chapter 4: Essentials of Probability
4.1 Set Notation
4.2 Events within the Sample Space
4.3 Basic Probability Calculations
4.4 Joint, Marginal, and Conditional Probability
4.5 Sources of Probabilities
Exercises
Chapter 5: Discrete Probability Distributions and Their Properties
5.1 The Discrete Probability Distribution
5.2 The Mean, Variance, and Standard Deviation of a Discrete Random Variable
5.3 The Binomial Probability Distribution
Exercises
Chapter 6: The Normal Distribution
6.1 The Continuous Probability Distribution
6.2 The Normal Distribution
6.3 Probability as an Area under the Normal Curve
6.4 Percentiles of the Standard Normal Distribution and Percentiles of the Random Variable X
Exercises
Appendix 6.A The Normal Approximation to Binomial Probabilities
Chapter 7: Simple Random Sampling and the Sampling Distribution of the Mean
7.1 Simple Random Sampling
7.2 The Sampling Distribution of The Mean
7.3 Comments on the Sampling Distribution of the Mean
7.4 A Central Limit Theorem
Exercises
Appendix 7.A Using a Table of Random Numbers
Appendix 7.B Assessing Normality via the Normal Probability Plot
Appendix 7.C Randomness, Risk, and Uncertainty
Chapter 8: Confidence Interval Estimation of μ
8.1 The Error Bound on X̄ as an Estimator of μ
8.2 A Confidence Interval for the Population Mean μ (σ Known)
8.3 A Sample Size Requirements Formula
8.4 A Confidence Interval for the Population Mean μ (σ Unknown)
Exercises
Appendix 8.A A Confidence Interval for the Population Median MED
Chapter 9: The Sampling Distribution of a Proportion and Its Confidence Interval Estimation
9.1 The Sampling Distribution of a Proportion
9.2 The Error Bound on p̂ as an Estimator for p
9.3 A Confidence Interval for the Population Proportion (of Successes) p
9.4 A Sample Size Requirements Formula
Exercises
Appendix 9.A Ratio Estimation
Chapter 10: Testing Statistical Hypotheses
10.1 What is a Statistical Hypothesis?
10.2 Errors in Testing
10.3 The Contextual Framework of Hypothesis Testing
10.4 Selecting a Test Statistic
10.5 The Classical Approach to Hypothesis Testing
10.6 Types of Hypothesis Tests
10.7 Hypothesis Tests for μ (σ Known)
10.8 Hypothesis Tests for μ (σ Unknown and n Small)
10.9 Reporting the Results of Statistical Hypothesis Tests
10.10 Hypothesis Tests for the Population Proportion (of Successes) p
Exercises
Appendix 10.A Assessing the Randomness of a Sample
Appendix 10.B Wilcoxon Signed Rank Test (of a Median)
Appendix 10.C Lilliefors Goodness-of-Fit Test for Normality
Chapter 11: Comparing Two Population Means and Two Population Proportions
11.1 Confidence Intervals for the Difference of Means when Sampling from Two Independent Normal Populations
11.2 Confidence Intervals for the Difference of Means When Sampling from Two Dependent Populations: Paired Comparisons
11.3 Confidence Intervals for the Difference of Proportions When Sampling from Two Independent Binomial Populations
11.4 Statistical Hypothesis Tests for the Difference of Means When Sampling from Two Independent Normal Populations
11.5 Hypothesis Tests for the Difference of Means When Sampling From Two Dependent Populations: Paired Comparisons
11.6 Hypothesis Tests for the Difference of Proportions when Sampling from Two Independent Binomial Populations
Exercises
Appendix 11.A Runs Test for Two Independent Samples
Appendix 11.B Mann–Whitney (Rank Sum) Test for Two Independent Populations
Appendix 11.C Wilcoxon Signed Rank Test When Sampling from Two Dependent Populations: Paired Comparisons
Chapter 12: Bivariate Regression and Correlation
12.1 Introducing an Additional Dimension to our Statistical Analysis
12.2 Linear Relationships
12.3 Estimating the Slope and Intercept of the Population Regression Line
12.4 Decomposition of the Sample Variation in Y
12.5 Mean, Variance, and Sampling Distribution of the Least Squares Estimators of the Intercept and Slope
12.6 Confidence Intervals for the Intercept and Slope
12.7 Testing Hypotheses about the Intercept and Slope
12.8 Predicting the Average Value of Y given X
12.9 The Prediction of a Particular Value of Y given X
12.10 Correlation Analysis
Exercises
Appendix 12.A Assessing Normality (Appendix 7.B Continued)
Appendix 12.B On Making Causal Inferences
Chapter 13: An Assortment of Additional Statistical Tests
13.1 Distributional Hypotheses
13.2 The Multinomial Chi-Square Statistic
13.3 The Chi-Square Distribution
13.4 Testing Goodness of Fit
13.5 Testing Independence
13.6 Testing k Proportions
13.7 A Measure of Strength of Association in a Contingency Table
13.8 A Confidence Interval for σ2 Under Random Sampling from a Normal Population
13.9 The F Distribution
13.10 Applications of the F Statistic to Regression Analysis
Exercises
Appendix A
Solutions to Exercises
References
Index
Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Panik, Michael J.
Statistical inference : a short course / Michael J. Panik.
p. cm.
Includes index.
ISBN 978-1-118-22940-8 (cloth)
1. Mathematical statistics--Textbooks. I. Title.
QA276.12.P36 2011
519.5--dc23
2011047632
ISBN: 9781118229408
To the memory of Richard S. Martin
Preface
Statistical Inference: A Short Course is a condensed and to-the-point presentation of the essentials of basic statistics for those seeking to acquire a working knowledge of statistical concepts, measures, and procedures. While most individuals will not be performing high-powered statistical analyses in their work or professional environments, they will be, on numerous occasions, reading technical reports, reviewing a consultant's findings, perusing through academic, trade, and professional publications in their field, and digesting the contents of diverse magazine/newspaper articles (online or otherwise) wherein facts and figures are offered for appraisal. Let us face it—there is no escape. We are a society that generates a virtual avalanche of information on a daily basis.
That said, correctly understanding notions such as a research hypothesis, statistical significance, randomness, central tendency, variability, reliability, and cause and effect is of paramount importance when it comes to being an informed consumer of statistical results. Answers to questions about these and related notions will be offered and explained.
Statistical Inference: A Short Course is general in nature and is appropriate for undergraduates majoring in the natural sciences, the social sciences, or in business. It can also be used in first-year graduate courses in these areas. This text offers what can be considered “just enough” material for a one-semester course without overwhelming the student with “too fast a pace” or “too many” topics. The essentials of the course appear in the main body of the chapters and interesting “extras” (some might call them “essentials”) are found in the chapter appendices and chapter exercises. While Chapters 1 through 10 are fundamental to any basic statistics course, the instructor can “pick and choose” items from Chapters 11 through 13. This latter set of chapters is optional and the topics therein can be selected with an eye toward student interest and need.
This text is highly readable, presumes only a knowledge of high school algebra, and maintains a high degree of rigor and statistical as well as mathematical integrity in the presentation. Precise and complete definitions of key concepts are offered throughout and numerous example problems appear in each chapter. Solutions to all the exercises are provided, with the exercises themselves designed to test the student's mastery of the material rather than to entertain the instructor.
While all beginning statistics texts discuss the concepts of simple random sampling and normality, this book takes such discussions a bit further. Specifically, a couple of the key assumptions typically made in the areas of estimation and testing are that we have a “random sample” of observations drawn from a “normal population.” However, given a particular data set, how can we determine if it actually constitutes a random sample and, secondly, how can we determine if the parent population can be taken to be normal? That is, can we proceed “as if” the sample is random? And can we operate “as if” the population is normal? Answers to these questions will be provided by a couple of formal test procedures for randomness and for the assessment of normality. Other topics not usually found in introductory texts include determining a confidence interval for a population median, ratio estimation (a technique akin to estimating a population proportion), general discussions of randomness and causality, and some nonparametric methods that serve as an alternative to parametric routines when the latter are not strictly applicable. As stated earlier, the instructor can pick and choose from among them or decide to bypass them altogether.
Looking to specifics:
While the bulk of this text was developed from class notes used in courses offered at the University of Hartford, West Hartford, CT, the final draft of the manuscript was written while the author was Visiting Professor of Mathematics at Trinity College, Hartford, CT. Sincere thanks go to my colleagues Bharat Kolluri, Rao Singamsetti, Frank DelloIacono, and Jim Peta at the University of Hartford for their support and encouragement and to David Cruz-Uribe and Mary Sandoval of Trinity College for the opportunity to teach and to participate in the activities of the Mathematics Department.
A special note of thanks goes to Alice Schoenrock for her steadfast typing of the various iterations of the manuscript and for monitoring the activities involved in obtaining a complete draft of the same. I am also grateful to Mustafa Atalay for drawing most of the illustrations and for sharing his technical expertise in graphical design.
An additional offering of appreciation goes to Susanne Steitz-Filler, Editor, Mathematics and Statistics, at John Wiley & Sons for her professionalism and vision concerning this project.
Michael J. Panik
Windsor, CT
Chapter 1
The Nature of Statistics
Broadly defined, statistics involves the theory and methods of collecting, organizing, presenting, analyzing, and interpreting data so as to determine their essential characteristics. While some discussion will be devoted to the collection, organization, and presentation of data, we shall, for the most part, concentrate on the analysis of data and the interpretation of the results of our analysis.
How should the notion of data be viewed? It can be thought of as simply consisting of “information” that can take a variety of forms. For example, data can be numerical (test scores, weights, lengths, elapsed time in minutes, etc.) or non-numerical (such as an attribute involving color or texture, or a category depicting the sex of an individual or their political affiliation, if any). (See Section 1.4 of this chapter for a more detailed discussion of data forms or varieties.)
Two major types of statistics will be recognized: (1) descriptive; and (2) inductive1 or inferential.
In sum, if we want only to summarize or present data, or just catalog facts, then descriptive techniques are called for. But if we want to make inferences about the entire data set on the basis of sample information or, more generally, to make decisions in the face of uncertainty, then the use of inductive or inferential techniques is warranted.
The concept of the “entire data set” alluded to above will be called the population; it is the group to be studied. (Remember that “population” does not refer exclusively to “people;” it can be a group of states, countries, cities, registered democrats, cars in a parking lot, students at a particular academic institution, and so on.) We shall let N denote the population size or the number of elements in the population.
Each separate characteristic of an element in the population will be represented by a variable (usually denoted as X). We may think of a variable as describing any qualitative or quantitative aspect of a member of the population. A qualitative variable has values that are only “observed.” Here a characteristic pertains to some attribute (such as color) or category (male or female). A quantitative variable will be classified as either discrete (it takes on a finite or countable number of values) or continuous (it assumes an infinite or uncountable number of values). Hence, discrete values are “counted;” continuous values are “measured.” For instance, a discrete variable might be the number of blue cars in a parking lot, the number of shoppers passing through a supermarket check-out counter over a 15 min time interval, or the number of sophomores in a college-level statistics course. A continuous variable can describe weight, length, the amount of water passing through a culvert during a thunderstorm, elapsed time in a race, and so on.
While a population can consist of all conceivable observations on some variable X, we may view a sample as a subset of the population. The sample size will be denoted as n, with n < N. It was mentioned above that, in order to make a legitimate inference about a population, a representative sample was needed. Think of a representative sample as a “typical” sample—it should adequately reflect the attributes or characteristics of the population.
While there are many different ways of constructing a sampling plan, our attention will be focused on the notion of simple random sampling. Specifically, a sample of size n drawn from a population of size N is obtained via simple random sampling if every possible sample of size n has an equal chance of being selected. A sample obtained in this fashion is then termed a simple random sample; each element in the population has the same chance of being included in a simple random sample.
Before any sampling is actually undertaken, a list of items (called the sampling frame) in the population is formed and thus serves as the formal source of the sample, with the individual items listed on the frame termed elementary sampling units. So, given the sampling frame, the actual process of random sample selection will be accomplished without replacement, that is, once an item from the population has been selected for inclusion in the sample, it is not eligible for selection again—it is not returned to the population pool (it is, so to speak, “crossed off” the frame) and consequently cannot be chosen, say, a second time as the simple random sampling process commences. (Under sampling with replacement, the item chosen is returned to the population before the next selection is made.)
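Selection without replacement from a sampling frame can be sketched in a few lines of Python. This is only an illustration; the frame of unit labels is invented, and the standard library's random.sample is used as a stand-in for a formal random selection procedure:

```python
import random

# Hypothetical sampling frame: a numbered list of N = 10 elementary
# sampling units. The labels are invented purely for illustration.
frame = [f"unit_{i}" for i in range(1, 11)]

random.seed(42)  # fixed seed so the example is reproducible

# random.sample draws without replacement: once a unit is chosen it is
# "crossed off" the frame and cannot be selected again, and every
# possible sample of size n has the same chance of selection.
n = 4
sample = random.sample(frame, k=n)

print(sample)                 # four distinct units from the frame
print(len(set(sample)) == n)  # True: no unit appears twice
```

Note that random.sample leaves the frame itself unchanged; sampling with replacement would instead be modeled by random.choices, which can return the same unit more than once.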
Will the process of random sampling guarantee that a representative sample will be acquired? The answer is, “probably.” That is, while randomization does not absolutely guarantee representativeness (since random sampling gives the same chance of selection to every sample—representative ones as well as nonrepresentative ones), we are highly likely but not certain to get a representative sample. Then why all the fuss about random sampling? The answer to this question hinges upon the fact that it is possible to make erroneous inferences from sample data. (After all, we are not examining the entire population.) Under simple random sampling, we can validly apply the rules of probability theory to calculate the chances or magnitudes of such errors; and their rates enable us to assess the reliability of, or form a degree of confidence in, our inferences about the population.
Let us recognize two basic types of errors that can creep into our data analysis. The first is sampling error, which is reflective of the inherent natural variation between samples (since different samples possess different sample values); it arises because sampling gives incomplete information about a population. This type of error is inescapable—it is always present. If one engages in sampling, then sampling error is a fact of life. The other variety of error is nonsampling error—human or mechanical factors tend to distort the observed values. Nonsampling error can be controlled since it arises essentially from unsound experimental techniques or from errors in obtaining and recording information. Examples of nonsampling error can range from using poorly calibrated or inadequate measuring devices to inaccurate responses (or nonresponses) to questions on a survey form. In fact, even poorly worded questions can lead to such errors. And if preference is given to selecting some observations over others so that, for example, the underrepresentation of some group of individuals or items occurs, then a biased sample results.
We previously referred to data2 as “information,” that is, as a collection of facts, values, or observations. Suppose then that our data set consists of observations that can be “measured” (e.g., classified, ordered, or quantified). At what level does the measurement take place? In particular, what are the “forms” in which data are found or the “scales” on which data are measured? These scales, offered in terms of increasing information content, are classified as nominal, ordinal, interval, and ratio.
Both interval and ratio scales are said to be metric scales since differences between values measured on these scales are meaningful; and variables measured on these scales are said to be quantitative variables.
It should be evident from the preceding discussion that any variable measured on one scale automatically satisfies all the properties of a less informative scale.
Example 1.1
Suppose our objective is to study the residential housing stock of a particular city. Suppose further that our inquiry is to be limited to one- and two-family dwellings. These categories of dwellings make up the target population—the population about which information is desired. How do we obtain data on these housing units? Should we simply stroll around the city looking for one- and two-family housing units? Obviously this would be grossly inefficient. Instead, we will consult the City Directory. This directory is the sampled population (or sampling frame)—the population from which the sample is actually obtained. Now, if the City Directory is kept up to date then we have a valid sample—the target and sampled populations have similar characteristics.
Let the individual residences constitute the elementary sampling units, with an observation taken to be a particular data point or value of some characteristic of interest. We shall let each such characteristic be represented by a separate variable, and the value of the variable is an observation of the characteristic.
Assuming that we have settled on a way of actually extracting a random sample from the directory, suppose that one of the elementary sampling units is the residence located at 401 Elm Street. Let us consider some of its important characteristics (Table 1.1).
Note that X1 and X2 are qualitative or nonmetric variables measured on a nominal scale while variables X3, . . ., X8 are quantitative or metric variables measured on a ratio scale.
Table 1.1 Characteristics of the Residence at 401 Elm Street.
Characteristic                     Variable   Observation Values
Number of families                 X1         One family (1) or two family (2)
Attached garage                    X2         Yes (1) or no (0)
Number of rooms                    X3         6
Number of bathrooms                X4         2
Square feet of living space        X5         2100
Assessed value                     X6         $230,500
Year constructed                   X7         1987
Lot size (square feet)             X8         2400

Quite often throughout this text the reader will be asked to total or form the sum of the values appearing in a variety of data sets. We have a special notation for the operation of addition. We will let the Greek capital sigma, Σ, serve as our “summation sign.” Specifically, for a variable X with values X1, X2, . . ., Xn,
X1 + X2 + · · · + Xn = Σ Xi (i = 1, . . ., n).   (1.1)
Here the right-hand side of this expression reads “the sum of all observations Xi as i goes from 1 to n.” In this regard, Σ is termed an operator—it operates only on those items having an i index, and the operation is addition. When it is to be understood that we are to add over all i values, then Equation (1.1) can be rewritten simply as ΣXi.
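The summation operator translates directly into code. A minimal sketch, with hypothetical observations:

```python
# Hypothetical observations X1, . . ., Xn.
X = [3, 7, 2, 9, 4]
n = len(X)

# Sum Xi as i goes from 1 to n, written out index by index ...
total = 0
for i in range(n):
    total += X[i]

# ... which is exactly the shorthand sum-of-all-Xi:
print(total)            # 25
print(total == sum(X))  # True
```

The built-in sum plays the role of Σ once it is understood that we add over all i values.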
Notes
1. Induction is a process of reasoning from the specific to the general.
2. “Data” is a plural noun; “datum” is the singular of data.
Continue reading in the full edition!
