Introducing Statistics - Eileen Magnello - E-Book

Description

From the medicine we take, the treatments we receive, the aptitude and psychometric tests given by employers, the cars we drive, the clothes we wear to even the beer we drink, statistics have given shape to the world we inhabit. For the media, statistics are routinely 'damning', 'horrifying', or, occasionally, 'encouraging'. Yet, for all their ubiquity, most of us really don't know what to make of statistics. Exploring the history, mathematics, philosophy and practical use of statistics, Eileen Magnello - accompanied by Bill Mayblin's intelligent graphic illustration - traces the rise of statistics from the ancient Babylonians, Egyptians and Chinese, to the censuses of the Romans and the Greeks, and the modern emergence of the term itself in Europe. She explores the 'vital statistics' of, in particular, William Farr, and the mathematical statistics of Karl Pearson and R.A. Fisher. She even tells how knowledge of statistics can prolong one's life, as it did for evolutionary biologist Stephen Jay Gould, who was given eight months to live after a cancer diagnosis in 1982 and lived until 2002. This title offers an enjoyable, surprise-filled tour through a subject that is both fascinating and crucial to understanding our world.

Page count: 116

Published by Icon Books Ltd., Omnibus Business Centre, 39–41 North Road, London N7 9DP

ISBN: 978-184831-773-4

Text copyright © 2009 Eileen Magnello
Illustrations copyright © 2009 Icon Books Ltd

The author and artist have asserted their moral rights.

Originating editor: Duncan Heath

No part of this book may be reproduced in any form, or by any means, without prior permission in writing from the publisher.

Contents

Cover

Title Page

Copyright

Drowning by Numbers

Averages or Variation?

Why Study Statistics?

What are Statistics?

What Does Statistics Mean?

Vital Statistics vs. Mathematical Statistics

The Philosophy of Statistics

Darwin and Statistical Populations

Victorian Values

Where Did it All Begin?

Parish Registers

The London Bills of Mortality

Halley’s Mortality Tables

Malthusian Populations

Demography – the Science of Populations

The Statistical Society of London

Edwin Chadwick and Sanitary Reforms

William Farr and Vital Statistics

Florence Nightingale: the Passionate Statistician

The Statistics of the Crimean War

Mortality Statistics in the Crimea

Polar Area Graphs

Probability

Variables

Games of Chance

De Moivre and Gambling in Soho

The Mathematical Theory of Probability

Relative Frequency

The Bayesian Approach

Probability Distributions

The Poisson Distribution

The Normal Distribution

Astronomical Observations

The Central Limit Theorem

The Gaussian Curve and the Principle of Least Squares

What’s Normal?

The Naming of the Normal

So What is the Normal Distribution?

Quetelismus

Galton’s Pantograph

How to Summarise the Data?

Quetelet and the Arithmetical Mean

The Mean

The Median

How to Locate or Calculate the Median

Does it Matter Which Statistical Average is Used?

Misleading With Statistics

Data Management Procedures

Standardized Frequency Distributions

Samples vs. Populations

The Histogram

Frequency Distributions

The Method of Moments

Natural Selection: the Changing Shapes of Darwinian Distributions

The Peppered Moth

The Pearsonian Family of Curves

The Interquartile Range

The Standard Deviation

Coefficient of Variation

Comparing Variation of Variables

Practical Applications

Pearson’s Scales of Measurement

Nominal and Ordinal Variables

Ratio and Interval

Early Uses of Correlation

Causation and Spurious Correlation

Path Analysis and Causation

Scatter Diagrams

Weldon and Negative Correlation

Curvilinear Relationships

Galton and Biological Regression

Regression to the Mean

Galton’s Two Regression Lines

George Udny Yule and the Method of Least Squares

Correlation vs. Regression

Galton’s Dilemma

Pearson’s Product-Moment Correlation

Simple Correlation and Multiple Correlation

Statistical Control

Discrete 2 × 2 Relationships

Biserial Correlations

Egon Pearson and Polychoric Correlations

Factor Analysis

Maurice Kendall’s Tau Coefficient

Correlation vs. Association

Curve-Fitting for Asymmetrical Distributions

Interpreting Results with Degrees of Freedom

The Chi-Square Probability Table

A Statistical Test for the Guinness Brewery

Quantifying Brewery Material

Agricultural Variation

Small Samples vs. Large Samples

Testing Statistical Differences Between Two Means

Statistical Results for Guinness

Student’s t-test

A New Statistical Era: Rothamsted’s Broadbalk Agricultural Data

Fisher’s Statistical Analysis of Variance

The Analysis of Agricultural Variation

The Analysis of Variance and Small Samples

Inferential Statistics

The Sampling Distribution

Conclusion

Bibliography

About the Author

Index

Drowning by Numbers

We are drowning in statistics. And they are not just numbers. For the media, statistics are routinely “damning”, “horrifying”, “deadly”, “troublesome” – or, on occasion, “encouraging”. The press constantly suggest that statistical information about crime, disease, poverty and transport delays is not only the source of the problem, but that it represents real entities or real people instead of one point on a graph.

This tendency to assign meaning to a single essence or example by looking at one point on a statistical distribution creates unnecessary confusion and fear.

Averages or Variation?

Much of the shock-horror statistical information used by the media is based on statistical averages. Despite the often misleading preoccupation with averages, the most important statistical concept neglected by journalists and news reporters is variation. This concept is essential to modern mathematical statistics and plays a pivotal role in biological, medical, educational and industrial statistics.

So why is variation important?

Variation measures individual differences, while averages are concerned with summarising this information into one exemplar.
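The contrast between an average and the individual differences behind it can be sketched in a few lines of Python. The two groups below are hypothetical heights invented for illustration, not data from the book, and the helper name `deviations` is likewise an assumption. Both groups share one average, yet the individuals in the second group differ far more from one another.

```python
from statistics import mean

# Hypothetical heights in cm for two made-up groups (illustrative only).
group_a = [168, 169, 170, 171, 172]
group_b = [150, 160, 170, 180, 190]


def deviations(values):
    """Return how far each individual value lies from the group mean."""
    centre = mean(values)
    return [v - centre for v in values]


# The average collapses each group into one exemplar.
mean_a = mean(group_a)
mean_b = mean(group_b)

# The deviations keep the individual differences that the average hides.
dev_a = deviations(group_a)
dev_b = deviations(group_b)

print("Group A mean:", mean_a, "deviations:", dev_a)
print("Group B mean:", mean_b, "deviations:", dev_b)
```

Reporting only the two means would suggest the groups are alike; the lists of deviations show that they are not.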

Variation can be quite easily seen in multicultural Britain, and especially London, which now consists of more than 300 sub-cultures with as many languages spoken (from Acholi to Zulu) and thirteen different faiths. For some, multiculturalism is about valuing everybody and not making everyone the same (or not reducing this ethnically diverse group of individuals to one representative person).

There are so many individual differences across the British population that it is now practically meaningless to talk about the ‘average’ British person, as one might have done before 1950.

These multifarious individual differences embody the statistical variation that is the crux of modern mathematical statistics.

Why Study Statistics?

Statistics are used by scientists, economists, government officials, industry and manufacturers. Statistical decisions are made constantly and affect our daily lives – from the medicine we take, the treatments we receive, the aptitude and psychometric tests employers give routinely, the cars we drive, the clothes we wear (wool manufacturers use statistical tests to determine the thread weave for our comfort) to the food we eat and even the beer we drink.

Statistics are an inescapable part of our lives.

Knowledge of some basic statistics can even save or extend lives – as it did for Stephen Jay Gould, whom we will hear more about later.

What are Statistics?

Yet for all their ubiquity, we don’t really know what to make of statistics. As one columnist put it, “cigarettes are the biggest single cause of statistics”. People express a wish to avoid bad things by saying, “I don’t want to be another statistic”. Do statisticians really think that all of humanity is reducible to a few numbers?

Although some people think that statistical results are irrefutable, others believe that all statistical information is deceptive.

Mark Twain’s famous dictum, “Lies, Damned Lies and Statistics”, is often invoked to “prove” that statistics are quite often deliberately misleading.

Though Twain mistakenly attributed this aphorism to Prime Minister Benjamin Disraeli in 1904, Leonard Henry Courtney had first used the phrase in a speech in Saratoga Springs, New York in 1895, concerning proportional representation of the 44 American states.

Some government officials even blame statistics for causing economic problems. When White House press secretary Scott McClellan tried to explain in February 2004 why the Bush administration reneged on a forecast that should have led to more jobs in America, his defence was simple.

“The President is not a statistician.” As though a statistician would have been able to provide jobs for the unemployed in the United States.

In Britain, the Statistics Commission called for “Cabinet Ministers to be banned from examining statistical information before it is made public, as this would avoid political influence or exploitation”. Nevertheless, the statistics that are available for public consumption can shape public opinions, influence government policies and inform (or misinform) citizens of medical and scientific discoveries and breakthroughs.

What Does Statistics Mean?

The word “statistics” is derived from the Latin status, which led to the Italian word statista, first used in the 16th century, referring to a statist or statesman – someone concerned with matters of the state. The Germans used Statistik around 1750, the French introduced statistique in 1785 and the Dutch adopted statistiek in 1807.

Early statistics was a quantitative system for describing matters of state – a form of “political arithmetic”.

The system was first used in 17th-century England by the London merchant John Graunt (1620–74) and the Irish natural philosopher William Petty (1623–87).

In the 18th century, many statists were jurists; their background was often in public law (the branch of law concerned with the state itself).

It was the Scottish landowner and first president of the Board of Agriculture, Sir John Sinclair (1754–1834), who introduced the word “statistics” into the English language in 1798 in his Statistical Account of Scotland.

Sinclair wanted to measure the “quantum of happiness” of the Scots.

Sinclair used statistics for social phenomena rather than for political matters. This led eventually to the development of vital statistics in the mid-19th century.

Vital Statistics vs. Mathematical Statistics

Not all statistics are the same. There are two types: vital statistics and mathematical statistics.

Vital statistics is what most people understand by statistics. It is used as a plural noun and refers to an aggregate set of data.

It refers to the description and enumeration used in census counts or in the tabulation of official statistics such as marriage, divorce and crime statistics. We also have insurance statistics and even cricket and baseball statistics.

This process is primarily concerned with average values, and uses life tables, percentages, proportions and ratios: probability is most commonly used for actuarial (i.e. life-insurance) purposes. It was not until the 20th century that the singular form “statistic”, signifying an individual fact, came into use.

Mathematical statistics is used as a singular noun, and it arose out of the mathematical theory of probability in the late 18th century from the work of such continental mathematicians as Jacob Bernoulli, Abraham de Moivre, Pierre-Simon Laplace and Carl Friedrich Gauss.

In the late 19th century, mathematical statistics began to take shape as a fully-fledged academic discipline in the work of Francis Ysidro Edgeworth (1845–1926), John Venn (1834–1923), Francis Galton (1822–1911), W.F.R. Weldon (1860–1906) and Karl Pearson (1857–1936).

Galton, Weldon and Pearson began to apply Charles Darwin’s ideas to the measurement of biological variation, which required a new statistical methodology.

Mathematical statistics encompasses a scientific discipline that analyses variation, and is often underpinned by matrix algebra. It deals with the collection, classification, description and interpretation of data from social surveys, scientific experiments and clinical trials. Probability is used for statistical tests of significance.

Mathematical statistics is analytical and can be used to make statistical predictions or inferences about the population. Furthermore, it capitalizes on all the individual differences in a group by examining the spread of this statistical variation through such methods as the range and standard deviation, which we’ll look at later. Vital statistics is concerned with averages, whereas mathematical statistics deals with variation.
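As a rough illustration of the two measures of spread named above, the following Python sketch computes the range and the standard deviation for two small sets of scores. The scores are hypothetical and the function names `value_range` and `summarise` are assumptions made for this example. It uses the sample standard deviation (dividing by n − 1) from Python's standard `statistics` module; that is one common convention, not necessarily the one the book adopts later.

```python
from statistics import mean, stdev

# Hypothetical test scores for two made-up classes (illustrative only).
class_x = [48, 49, 50, 51, 52]
class_y = [30, 40, 50, 60, 70]


def value_range(values):
    """Range: the distance between the largest and smallest value."""
    return max(values) - min(values)


def summarise(values):
    """Collect one average and two measures of variation for a sample."""
    return {
        "mean": mean(values),
        "range": value_range(values),
        # Sample standard deviation (n - 1 in the denominator).
        "sd": stdev(values),
    }


summary_x = summarise(class_x)
summary_y = summarise(class_y)

for name, summary in (("Class X", summary_x), ("Class Y", summary_y)):
    print(name, summary)
```

Both classes have the same mean score, but the range and standard deviation of the second class are ten times larger, which is the kind of difference an average alone cannot show.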

Used in this sense, statistics is a technical discipline, and while it is mathematical, it is essential to understand the statistical concepts underlying the mathematical procedures.

The Philosophy of Statistics

The decision to examine averages or to measure variation is rooted in philosophical ideologies that governed the thinking of statisticians, natural philosophers and scientists throughout the 19th century. The emphasis on statistical averages was underpinned by the philosophical tenets of determinism and typological ideas of biological species, which helped to perpetuate the idea of an idealized mean.

Determinism implies that there is order and perfection in the universe …

Thus, variation is flawed, a source of error that should be eradicated, since it interferes with God’s plan and purpose for His world.

The typological concept of species, which was the dominant thinking of taxonomists,* typologists and morphologists until the end of the 19th century, gave rise to the morphological concept of species. Species were thought to have represented an ideal type.

The presence of an ideal type was inferred from some sort of morphological similarity, which became the species criterion for typologists. This could have had the effect of creating a proliferation of species since any deviation from the type would have led to the classification of a new species.

Genuine change, according to the morphological concept of species, was possible only through the saltational origins of new species, meaning that new species should have occurred by leaps or jumps in a single generation. Because Darwin’s theory of evolution depended upon “gradual” changes, it was incompatible with essentialism.

*Taxonomists classify organisms into groups. Typologists classify organisms according to general types. Morphologists study the forms of organisms.

Darwin and Statistical Populations

The transition to measuring statistical variation represented an ideological shift that occurred during the middle of the 19th century, when Charles Darwin (1809–82) began to study minute biological variation in plants and animals.

When Darwin suggested in 1859 that evolution proceeded by the accumulation of minute differences between individuals, he introduced the idea of continuous variation into biological thinking.

Every one of Darwin’s ideas, from variation, natural selection and inheritance to reversion, seemed to demand statistical analysis.

Darwin had not only shown that variation was measurable and meaningful by emphasizing statistical populations rather than focusing on one type or essence, but he also discussed various types of correlation that could be used to explain natural selection. As the evolutionary biologist Sewall Wright (1889–1988) remarked in 1931:

Darwin was the first person to effectively view evolution as primarily a statistical process.

Victorian Values

Although some developments in vital and mathematical statistics took place on the Continent, we owe the rapid growth and application of vital statistics in the mid-19th century and mathematical statistics in the late 19th and early 20th centuries to the Victorians.

The development of both types of statistics took place in the wider context of the Victorian culture of measurement. The Victorians valued the precision and accuracy that instruments provided because these gave them more reliable information. In the expanding industrial economy, it was essential to establish that the results were reproducible for an international market.

Engineers and physicists spent long hours in laboratories recording and measuring electrical, mechanical and physical constants for machines, apparatus and other objects. Biologists and geologists collected as much information as possible on expeditions to create geographical maps, measure longitude and latitude, and classify new species of plants and animals.

Statistics offered one way to quantify human measurements, especially for matters dealing with public health and hygiene, epidemics, heredity and medicine.

Where Did it All Begin?