Practical Statistics for Field Biology - Jim Fowler - E-Book

Practical Statistics for Field Biology E-Book

Jim Fowler

Description

Practical Statistics for Field Biology, 2nd Edition

Provides an excellent introductory text for students on the principles and methods of statistical analysis in the life sciences, helping them choose and analyse statistical tests for their own problems and present their findings.

An understanding of statistical principles and methods is essential for any scientist but is particularly important for those in the life sciences. The field biologist faces very particular problems and challenges with statistics as "real-life" situations such as collecting insects with a sweep net or counting seagulls on a cliff face can hardly be expected to be as reliable or controllable as a laboratory-based experiment. Acknowledging the peculiarities of field-based data and its interpretation, this book provides a superb introduction to statistical analysis, helping students relate to their particular and often diverse data with confidence and ease.

To enhance the usefulness of this book, the new edition incorporates the more advanced method of multivariate analysis, introducing the nature of multivariate problems and describing the techniques of principal components analysis, cluster analysis and discriminant analysis, which are all applied to biological examples. An appendix detailing the statistical computing packages available has also been included.

It will be extremely useful to undergraduates studying ecology, biology, and earth and environmental sciences, and of interest to postgraduates who are not familiar with the application of multivariate techniques and to practising field biologists working in these areas.


Number of pages: 376

Year of publication: 2013




Contents

PREFACE

1 INTRODUCTION

1.1 What do we mean by statistics?

1.2 Why is statistics necessary?

1.3 Statistics in field biology

1.4 The limitations of statistics

1.5 The purpose of this text

2 MEASUREMENT AND SAMPLING CONCEPTS

2.1 Populations, samples and observations

2.2 Counting things – the sampling unit

2.3 Random sampling

2.4 Random numbers

2.5 Independence

2.6 Statistics and parameters

2.7 Descriptive and inferential statistics

2.8 Parametric and non-parametric statistics

3 PROCESSING DATA

3.1 Scales of measurement

3.2 The nominal scale

3.3 The ordinal scale

3.4 The interval scale

3.5 The ratio scale

3.6 Conversion of interval observations to an ordinal scale

3.7 Derived variables

3.8 The precision of observations

3.9 How precise should we be?

3.10 The frequency table

3.11 Aggregating frequency classes

3.12 Frequency distribution of count observations

3.13 Dispersion

3.14 Bivariate data

4 PRESENTING DATA

4.1 Introduction

4.2 Dot plot or line plot

4.3 Bar graph

4.4 Histogram

4.5 Frequency polygon and frequency curve

4.6 Scattergram (scatter plot)

4.7 Circle or pie graph

5 MEASURING THE AVERAGE

5.1 What is an average?

5.2 The mean

5.3 The median – a resistant statistic

5.4 The mode

5.5 Relationship between the mean, median and mode

6 MEASURING VARIABILITY

6.1 Variability

6.2 The range

6.3 The standard deviation

6.4 Calculating the standard deviation

6.5 Calculating the standard deviation from grouped data

6.6 Variance

6.7 An alternative formula for calculating the variance and standard deviation

6.8 Obtaining the standard deviation, variance and the sum of squares from a calculator

6.9 Degrees of freedom

6.10 The coefficient of variation (CV)

7 PROBABILITY

7.1 The meaning of probability

7.2 Compound probabilities

7.3 Probability distribution

7.4 Models of probability distribution

7.5 The binomial probability distribution

7.6 The Poisson probability distribution

7.7 The negative binomial probability distribution

7.8 Critical probability

8 PROBABILITY DISTRIBUTIONS AS MODELS OF DISPERSION

8.1 Dispersion

8.2 An Index of Dispersion

8.3 Choosing a model of dispersion

8.4 The binomial model

8.5 Poisson model

8.6 The negative binomial model

8.7 Deciding the goodness of fit

9 THE NORMAL DISTRIBUTION

9.1 The normal curve

9.2 Some mathematical properties of the normal curve

9.3 Standardizing the normal curve

9.4 Two-tailed or one-tailed?

9.5 Small samples: the t-distribution

9.6 Are our data ‘normal’?

10 DATA TRANSFORMATION

10.1 The need for transformation

10.2 The logarithmic transformation

10.3 When there are zero counts – the arcsinh transformation

10.4 The square root transformation

10.5 The arcsine transformation

10.6 Back-transforming transformed numbers

10.7 Is data transformation really necessary?

11 HOW GOOD ARE OUR ESTIMATES?

11.1 Sampling error

11.2 The distribution of a sample mean

11.3 The confidence interval of the mean of a large sample

11.4 The confidence interval of the mean of a small sample

11.5 The confidence interval of the mean of a sample of count data

11.6 The difference between the means of two large samples

11.7 The difference between the means of two small samples

11.8 Estimating a proportion

11.9 Estimating a Lincoln Index

11.10 Estimating a diversity index

11.11 The distribution of a variance – chi-square distribution

12 THE BASIS OF STATISTICAL TESTING

12.1 Introduction

12.2 The experimental hypothesis

12.3 The statistical hypothesis

12.4 Test statistics

12.5 One-tailed tests and two-tailed tests

12.6 Hypothesis testing and the normal curve

12.7 Type 1 and type 2 errors

12.8 Parametric and non-parametric statistics: some further observations

12.9 The power of a test

13 ANALYSING FREQUENCIES

13.1 The chi-square test

13.2 Calculating the χ² test statistic

13.3 A practical example of a test for homogeneous frequencies

13.4 The problem of independence

13.5 One degree of freedom – Yates’ correction

13.6 Goodness of fit tests

13.7 Tests for association – the contingency table

13.8 The r × c contingency table

13.9 The G-test

13.10 Applying the G-test to a one-way classification of frequencies

13.11 Applying the G-test to a 2 × 2 contingency table

13.12 Applying the G-test to an r × c contingency table

13.13 Advice on analysing frequencies

14 MEASURING CORRELATIONS

14.1 The meaning of correlation

14.2 Investigating correlation

14.3 The strength and significance of a correlation

14.4 Covariance

14.5 The Product Moment Correlation Coefficient

14.6 The coefficient of determination r²

14.7 The Spearman Rank Correlation Coefficient rₛ

14.8 Advice on measuring correlations

15 REGRESSION ANALYSIS

15.1 Introduction

15.2 Gradients and triangles

15.3 Dependent and independent variables

15.4 A perfect rectilinear relationship

15.5 The line of least squares

15.6 Simple linear regression

15.7 Fitting the regression line to the scattergram

15.8 The error of a regression line

15.9 Confidence limits of an individual estimate

15.10 The significance of the regression line

15.11 The difference between two regression lines

15.12 Dealing with curved relationships

15.13 Transformation of both axes

15.14 Regression through the origin

15.15 An alternative line of best fit

15.16 Advice on using regression analysis

16 COMPARING AVERAGES

16.1 Introduction

16.2 Matched and unmatched observations

16.3 The Mann–Whitney U-test for unmatched samples

16.4 Advice on using the Mann–Whitney U-test

16.5 More than two samples – the Kruskal–Wallis test

16.6 Advice on using the Kruskal–Wallis test

16.7 The Wilcoxon test for matched pairs

16.8 Advice on using the Wilcoxon test for matched pairs

16.9 Comparing means – parametric tests

16.10 The F-test (two-tailed)

16.11 The z-test for comparing the means of two large samples

16.12 The t-test for comparing the means of two small samples

16.13 The t-test for matched pairs

16.14 Advice on comparing means

17 ANALYSIS OF VARIANCE – ANOVA

17.1 Why do we need ANOVA?

17.2 How ANOVA works

17.3 Procedure for computing one-way ANOVA

17.4 Procedure for computing the Tukey test

17.5 Two-way ANOVA

17.6 Procedure for computing two-way ANOVA

17.7 Procedure for computing the Tukey Test in two-way ANOVA

17.8 Two-way ANOVA with single observations

17.9 The randomized block design

17.10 The Latin square

17.11 Analysis of variance in regression

17.12 Advice on using ANOVA

18 MULTIVARIATE ANALYSIS

18.1 Introduction

18.2 What is information?

18.3 Making large problems manageable

18.4 Are there three groups or four?

18.5 Learning from experience?

18.6 Variations on a theme

18.7 Summary

APPENDICES

BIBLIOGRAPHY AND FURTHER READING

INDEX

Copyright © 1998 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected] Visit our Home Page on www.wileyeurope.com or www.wiley.com

1st edition 1990 by Open University Press. Reprinted by John Wiley & Sons Ltd in 1992. Reprinted 1993, 1994, 1995, 1996, 1997.

2nd edition reprinted September 1998, February 1999, August and November 2000, August 2001, August 2002, January 2003, January 2004, January 2005, January 2006, November 2006, February 2008, October 2008, September 2009

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770571.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 978-0-471-98295-1 (HB)

ISBN 978-0-471-98296-8 (PB)

PREFACE

It is eight years since Practical Statistics for Field Biology was first published and we are indebted to John Wiley & Sons for the opportunity of updating the text with a second edition.

Phil Jarvis joins Jim Fowler and Lou Cohen as co-author to broaden the scope of the book by including a new chapter on multivariate analysis and strengthening several sections including probability and data transformations.

The fundamental purpose of Practical Statistics for Field Biology remains the same as when it was first conceived, that is, to help students of field biology and cognate disciplines relate to their particular and often diverse data with confidence and ease. Our conviction remains that the surest way to learn statistics is to apply them and this is still the best advice we can offer to readers. The inclusion of the new chapter on multivariate analysis is intended to encourage the use of more powerful statistical analyses that respect the multiplicity of field data, thereby giving both undergraduate and postgraduate students greater scope in dealing with the richness and complexity of the variables they choose to explore.

We would like to think that the extended coverage in the second edition of Practical Statistics for Field Biology will ensure that it continues to enjoy wide success as a user-friendly, introductory text to students of life sciences and associated fields.

We thank our wives without whose forbearance we could not have accumulated the research data upon which this book so heavily depends.

1

INTRODUCTION

1.1 What do we mean by statistics?

Statistics are a familiar and accepted part of the modern world, and already intrude into the life of every serious biologist. We have statistics in the form of annual reports, various censuses, distribution surveys, museum records – to name just a few. It is impossible to imagine life without some form of statistical information being readily at hand.

The word statistics is used in two senses. It refers to collections of quantitative information, and methods of handling that sort of data. A society’s annual report, listing the number or whereabouts of interesting animal or plant sightings, is an example of the first sense in which the word is used. Statistics also refers to the drawing of inferences about large groups on the basis of observations made on smaller ones. Estimating the size of a population from a capture–recapture experiment illustrates the second sense in which the word is used.

Statistics, then, is to do with ways of organizing, summarizing and describing quantifiable data, and methods of drawing inferences and generalizing upon them.

1.2 Why is statistics necessary?

A second reason why statistical literacy is important to biologists is that if they are going to undertake an investigation on their own account and present their results in a form that will be authoritative, then a grasp of statistical principles and methods is essential. Indeed, a programme of work should be planned anticipating the statistical methods that are appropriate to the eventual analysis of the data. Attaching some statistical treatment as an afterthought to make the study seem more ‘respectable’ is unlikely to be convincing.

1.3 Statistics in field biology

‘Laboratory’ biologists may have high levels of confidence in the precision and accuracy of the measurements they make. To them, collecting meadow dwelling insects with a sweep net might appear a hilarious exercise with a ludicrously low level of reliability. Field biologists therefore require special sampling procedures and analytical methods if their assertions are to be regarded with credibility. Often data accumulated do not conform to the sort of symmetrical patterns taken for granted in the common statistical techniques; data may be ‘messy’, irregular or asymmetrical. Special treatments may be necessary before they can be properly evaluated.

1.4 The limitations of statistics

Statistics can help an investigator describe data, design experiments, and test hunches about relationships among things or events of personal interest. Statistics is a tool which helps the investigator to accept or reject those hunches within recognized degrees of confidence. It helps to answer questions like, ‘If my assertion is challenged, can I offer a reasonable defence?’, ‘Am I justified in spending more time or resources in pursuing my hunch?’, or ‘Can my observations be attributable to chance variation?’.

It should be noted that statistics never prove anything. Rather they will indicate the likelihood of the results of an investigation being the product of chance.

1.5 The purpose of this text

The objectives of this text stem from the points made in Sections 1.2 and 1.3 above. First, the text aims to provide field biologists with sufficient grounding in statistical principles and methods to enable them to read and understand research reports in the journals they read. Second, the text aims to present biologists with a variety of the most appropriate statistical tests for their problems. Third, guidance is offered on ways of presenting the statistical analyses, once completed.

2

MEASUREMENT AND SAMPLING CONCEPTS

2.1 Populations, samples and observations

Biologists are familiar with the term population as meaning all the individuals of a species that interact with one another to maintain a homogeneous gene pool. In statistics, the term population is extended to mean any collection of individual items or units which are the subject of investigation. Characteristics of a population which differ from individual to individual are called variables. Length, mass, age, temperature, proximity to a neighbour, number of parasites, number of petals, to name but a few, are examples of biological variables to which numbers or values can be assigned. Once numbers or values have been assigned to the variables they can be measured.

Because it is rarely practicable to obtain measures of a particular variable from all the units in a population, the investigator has to collect information from a smaller group or sub-set which represents the group as a whole. This sub-set is called a sample. Each unit in the sample provides a record, such as a measurement, which is called an observation. The relationship between the terms we have introduced in this section is summarized below:

Observation: 132 mm

Variable: wing length

Sample unit: a starling from a communal roost

Sample: those starlings which are captured in the roost and are measured

Statistical population: all starlings in the roost which are available for capture and measurement

Biological population: the biological population may well include birds that are not available for capture (e.g. mates that are roosting elsewhere) and are therefore not part of the statistical population. Alternatively, if the roost comprises a mixture of resident birds and winter immigrants, the statistical population might include components of more than one biological population.

2.2 Counting things – the sampling unit

Field biologists often count the number of objects in a group or collection. If the number is to be meaningful, the dimensions of the collection have to be specified. A collection with specified dimensions is called a sampling unit; a set of sampling units comprises a sample. An observation is, of course, the number of objects in a particular sampling unit. Examples of sampling units are:

Observation | Sampling unit

Number of orchids | A quadrat of stated area

Number of crickets in a sweep net | Volume of vegetation swept (diameter of net × distance moved)

Number of nematodes in a soil core | Soil core of stated dimensions

Number of visits by bees to a flower | A specified interval of time

Number of wading birds on a shore | A specified length of coastline

Number of ectoparasites | A single host

Number of beetles in a pitfall trap | A trap of stated size

When observations are counts, the statistical population has nothing to do with the objects we are counting, even when they are organisms. The following example illustrates the point.

Observation: 23

Variable: number of cockles

Sampling unit: a square quadrat of stated area from which sand is sieved and cockles counted

Sample: the number of quadrats (sampling units) examined

Statistical population: the total number of quadrats it is possible to mark out in the whole of the study area. The potential number of units in the population depends on the chosen dimensions of the sampling unit.

The main difference between ‘measuring’ and ‘counting’ is that we have no control over the dimensions of a unit in a sample when we are measuring; when counting, we are able to choose the dimensions of the sampling unit. Remember that the content of a trap, net or quadrat is a sample if we are measuring the objects in it, but only a unit in a sample if we are counting them.
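The dependence of the statistical population on the chosen quadrat size can be illustrated with a little arithmetic. The sketch below is only an illustration: the rectangular study area, its dimensions and the function name `possible_quadrats` are assumptions made for the example, not part of the cockle survey described above.

```python
def possible_quadrats(study_length_m, study_width_m, quadrat_side_m):
    """Count the non-overlapping square quadrats that fit in a rectangular study area."""
    per_row = int(study_length_m // quadrat_side_m)
    per_column = int(study_width_m // quadrat_side_m)
    return per_row * per_column


# Hypothetical 100 m x 50 m stretch of shore, three candidate quadrat sizes
for side in (0.5, 1.0, 2.0):
    print(f"quadrat side {side} m: {possible_quadrats(100, 50, side)} units in the population")
```

Halving the side of the quadrat quadruples the number of units in the statistical population, even though the shore and the cockles on it are unchanged.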

It is always worthwhile to ask the question, ‘from which population are my sampling units drawn?’. The answer may not always be as obvious as in the example of the cockles. The contents of 10 pit-fall traps set into the ground overnight constitute a sample – but from which population are these sampling units drawn? It is regarded as being the total number of traps that could have been set out, covering the whole of the study area. Because it is axiomatic that field biologists try not to destroy the habitat they are studying, a statistical population is sometimes notional, or hypothetical.

2.3 Random sampling

We said in Section 2.1 that a sample represents the population from which it is drawn. If the sample is to be truly representative, the units in the sample must be drawn randomly from the population; that is to say, in a manner that is free from bias. In other words, each unit in a population must have an equal chance of being drawn.

As an example of a possible source of bias, consider a biologist who wishes to measure the average mass of bank voles Clethrionomys glareolus inhabiting a study site. Attempts are made to catch them by setting Longworth mammal traps baited with grain. Before capture, an animal has to overcome trap shyness. It is plausible that the threshold of shyness is lower in hungry animals than in well-fed ones and that the former may have a greater chance of being drawn from the population. If hungry voles are lighter than well-fed ones, our biologist’s sample may not be a fair representation of the whole population.
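The effect of trap shyness can be explored with a small simulation. Everything numerical here is invented for illustration: the population size, the distribution of masses and the two capture probabilities are assumptions, chosen only to show how unequal chances of capture can shift a sample mean.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1000 voles with masses scattered around 20 g
population = [rng.gauss(20.0, 3.0) for _ in range(1000)]


def capture_probability(mass_g):
    # Assumption: lighter (hungrier) voles overcome trap shyness more readily
    return 0.6 if mass_g < 20.0 else 0.2


def mean(values):
    return sum(values) / len(values)


# Biased sample: each vole is caught with a probability that depends on its mass
biased_sample = [m for m in population if rng.random() < capture_probability(m)]

# Random sample of the same size: every vole has an equal chance of being drawn
random_sample = rng.sample(population, len(biased_sample))

print("population mean:", round(mean(population), 2))
print("biased sample mean:", round(mean(biased_sample), 2))
print("random sample mean:", round(mean(random_sample), 2))
```

Under these assumptions the trapped sample is weighted towards light animals, so its mean falls below the population mean, whereas the random sample of the same size stays close to it.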

Statistical analysis is frequently conducted on the assumption that samples are random. If, for any reason, that assumption is false and bias is present in the sampling procedure, then the information gained from the sample may not be properly extrapolated to the population. Unfortunately, it is rarely possible to do more than guess how great bias may be. This severely reduces the confidence which can be placed in estimations based on sampling data. Since most sources of bias arise from the methodology adopted, procedures should always be fully described. When a source of bias is suspected, it should be acknowledged and taken into account in the interpretation of results. The practical aspects of obtaining random samples are a large subject in themselves, partly because the field techniques used by biologists are so diverse. We suggest you consult Southwood (1978) as a standard work on this subject (see Bibliography).

2.4 Random numbers

One way to avoid bias is to assign a unique number to each individual unit in a population and select units to be measured by reference to random numbers. Often this is impossible because we cannot always choose our units – we measure what we can catch, as in the example of the voles. However, it is sometimes possible – indeed essential – to obtain truly random sampling units. In the case of our cockle example in Section 2.2, the quadrats comprising the sample could be located at the intersection of grid coordinates prescribed by pairs of random numbers. Whenever there is opportunity to select ‘which plots?’, ‘which pools?’, or ‘which positions?’, then selection must be based on random numbers.
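Where a computer is to hand, the placing of quadrats at random grid intersections might be sketched as follows. The grid size, the number of quadrats and the function name `random_quadrat_positions` are assumptions for illustration; the sketch uses Python's standard `random.sample`, which draws without replacement so that no intersection is chosen twice.

```python
import random


def random_quadrat_positions(n_quadrats, grid_columns, grid_rows, seed=None):
    """Pick n_quadrats distinct grid intersections, each with an equal chance."""
    rng = random.Random(seed)
    all_intersections = [(x, y) for x in range(grid_columns) for y in range(grid_rows)]
    return rng.sample(all_intersections, n_quadrats)


# Hypothetical study area gridded into 50 x 20 intersections, 10 quadrats wanted
positions = random_quadrat_positions(10, grid_columns=50, grid_rows=20, seed=42)
for x, y in positions:
    print(f"place quadrat at grid coordinate ({x}, {y})")
```

Recording the seed alongside the field notes allows the same set of positions to be regenerated later.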

There are two usual ways of obtaining random numbers. First, many calculators and pocket computers have a facility for generating random numbers. These are often in the form of a fraction, e.g. 0.2771459. You may use this to provide a set of integers, 2, 7, 7, 1,…, or 27, 71, 45,…; or 277, 145,…; or 2.7, 7.1, …; and so on, keying a new number when more digits are required.
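The splitting of a calculator fraction into integers can be written out as a short sketch. The function name `digit_groups` is an assumption for illustration; the fraction is handled as text rather than as a floating-point number so that its digits are used exactly as displayed.

```python
def digit_groups(fraction_text, group_size):
    """Split the digits after the decimal point into integers of group_size digits."""
    digits = fraction_text.split(".")[1]
    usable = len(digits) - len(digits) % group_size  # drop any incomplete final group
    return [int(digits[i:i + group_size]) for i in range(0, usable, group_size)]


print(digit_groups("0.2771459", 1))  # single digits: [2, 7, 7, 1, 4, 5, 9]
print(digit_groups("0.2771459", 2))  # pairs: [27, 71, 45]
print(digit_groups("0.2771459", 3))  # triples: [277, 145]
```

Dividing each pair by 10 gives the 2.7, 7.1, … form mentioned in the text.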

Second, use may be made of random number tables. Appendix 1 is such a table. The numbers are arranged in groups of five in rows and columns, but this arrangement is arbitrary. Starting in the top left corner you may read, 2, 3, 1, 5, 7, 5, 4,…; or 23, 15, 75, 48,…; or 231, 575, 485,…; or 23.1, 57.5, 48.5, 90.1,…; and so on, according to your needs. When you have obtained the numbers you need for the investigation in hand, mark the place with a pencil. Next time, carry on where you left off.

It is possible, by chance, that a random number will prescribe a unit that has already been drawn. In this event, ignore the number and take the next random number. The purpose is to eliminate your prejudice as to which units should be picked for measurement or counting. Unfortunately, observer bias, conscious or subconscious, is notoriously difficult to avoid when gathering data in support of a particular hunch!
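Reading a random number table and ignoring repeats can be sketched as below. The first digit stream is the run of digits quoted above from the start of Appendix 1; the second stream, the population sizes and the function name `draw_units` are hypothetical. Skipping numbers that exceed the number of units in the population is an added assumption, not a rule stated in the text.

```python
def draw_units(digit_stream, n_units, population_size, group_size=2):
    """Read numbers from a digit stream, skipping repeats and unusable numbers."""
    chosen = []
    for i in range(0, len(digit_stream) - group_size + 1, group_size):
        number = int(digit_stream[i:i + group_size])
        if number < 1 or number > population_size:
            continue  # assumption: no unit carries this number, so take the next one
        if number in chosen:
            continue  # unit already drawn: ignore and take the next random number
        chosen.append(number)
        if len(chosen) == n_units:
            break
    return chosen


# Digits quoted in the text, read in pairs, for a population of 60 numbered units
print(draw_units("231575485901", 4, 60))  # [23, 15, 48, 59]

# Hypothetical stream containing a repeat (23 appears twice)
print(draw_units("2315231748", 4, 99))  # [23, 15, 17, 48]
```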

2.5 Independence

Many statistical methods assume that observations in a sample are independent. That is to say, the value of any one observation in a sample is not inherently linked to that of another. An example should make this clear. A biologist wishes to compare the average spikelet length of rough meadow grass growing in one field with that growing in another. One hundred flowering heads are obtained randomly from the first field, a spikelet is removed from each and measured. In the second field, the plant is harder to find and only 80 flower heads are collected, a spikelet being removed from each and measured. If the biologist now tries to ‘make up the number’ by removing a further 20 spikelets from one plant, these observations are not independent of each other even if the plant itself is randomly selected. A genetic peculiarity in the plant that affects the size of one spikelet is likely to affect them all. This may distort the sample (see also Section 13.4).

2.6 Statistics and parameters

The measures which describe a variable of a sample are called statistics. It is from the sample statistics that the parameters of a population are estimated. Thus, the average mass of a random sample of voles is the statistic which is used to estimate the average mass parameter of the population. The average number of cockles in a random sample of quadrats estimates the average number of cockles per quadrat in the whole population of quadrats.
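The relationship between a sample statistic and a population parameter can be illustrated with made-up data. In the sketch below the counts for every possible quadrat are invented, which is never the case in the field; the point is only that the mean of a random sample of quadrats serves as an estimate of a population mean that would normally remain unknown.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical statistical population: cockle counts in all 2000 possible quadrats
population_counts = [rng.randint(0, 40) for _ in range(2000)]
parameter_mean = sum(population_counts) / len(population_counts)

# A random sample of 30 quadrats, drawn without replacement
sample_counts = rng.sample(population_counts, 30)
statistic_mean = sum(sample_counts) / len(sample_counts)

print("population parameter (mean cockles per quadrat):", round(parameter_mean, 2))
print("sample statistic (estimate from 30 quadrats):", round(statistic_mean, 2))
```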

Hypothetical populations have hypothetical parameters. The average number of beetles in 10 randomly placed pit-fall traps estimates the average number of beetles per trap if the whole habitat had been covered by traps, in which case there are no beetles left to count! Samples from hypothetical populations are generally used for comparative purposes, for example to compare one woodland type with another.

In estimating a population parameter from a sample statistic, the number of units in a sample can be critical. Some statistical methods depend on a minimum number of sampling units and, where this is the case, it should be borne in mind before commencing fieldwork. Whilst it is true that larger samples will invariably result in greater statistical confidence, there is nevertheless a ‘diminishing returns’ effect. In many cases the time, effort and expense involved in collecting very large samples might be better spent in extending the study in other directions. We offer guidance as to what constitutes a suitable sample size for each statistical test as it is described.
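One way to picture the ‘diminishing returns’ effect is through the standard error of a mean, taken here as the sample standard deviation divided by the square root of the number of units. That measure has not yet been introduced at this point in the text, so the sketch below should be read as an assumption-laden preview; the standard deviation of 5.0 and the sample sizes are invented.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of a sample mean: standard deviation divided by sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Hypothetical standard deviation of 5.0; each step quadruples the sample size
for n in (10, 40, 160, 640):
    print(n, round(standard_error(5.0, n), 3))
```

Each fourfold increase in the number of sampling units only halves the standard error, so the gain from ever larger samples shrinks steadily.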

2.7 Descriptive and inferential statistics

Descriptive statistics are used to organize, summarize and describe measures of a sample. No predictions or inferences are made regarding population parameters. Inferential (or deductive) statistics, on the other hand, are used to infer or predict population parameters from sample measures. This is done by a process of inductive reasoning based on the mathematical theory of probability. Fortunately, only a very minimal knowledge of mathematical theory of probability is needed in order to apply the rules of the statistical methods, and the little that is needed will be explained. However, no one can predict exactly a population parameter from a sample statistic, but only indicate with a stated degree of confidence within what range it lies. The degree of confidence depends on the sample selection procedures and the statistical techniques used.

2.8 Parametric and non-parametric statistics

Statistical methods commonly used by biologists fall into one of two classes – parametric and non-parametric. Parametric methods are the oldest, and although most often used by statisticians, may not always be the most appropriate for analysing biological data. Parametric methods make strict assumptions which may not always hold true.

More recently, non-parametric methods have been devised which are not based upon stringent assumptions. These are frequently more suitable for processing biological data. Moreover they are generally simpler to use since they avoid the laborious and repetitive calculations involved in some of the parametric methods. The circumstances under which a particular method should be used will be described as it arises. A summary showing which methods should be applied in particular circumstances is provided in Section 12.8.

3

PROCESSING DATA

3.1 Scales of measurement

Variables measured by biologists can be either discontinuous or continuous. Values of discontinuous variables assume integral whole numbers and are usually counts of things (frequencies). On the other hand, values of continuous variables may, in principle, fall at any point along an uninterrupted scale, and are usually measurements (length, mass, temperature, etc.). Measurement values may sometimes appear to be integral whole numbers if the recorder elects to measure to the nearest whole unit; this does not, however, obviate the fact that there can be intermediate values. The distinction between ‘count data’ and ‘measurement data’ is an important one which will be referred to frequently.

Generally, four levels of measurement are recognized. They are referred to as nominal, ordinal, interval and ratio scales. Each level has its own rules and restrictions; moreover each level is hierarchical in that it incorporates the properties of the scale below it.

3.2 The nominal scale

The most elementary scale of measurement is one which does no more than identify categories into which individuals may be classified. The categories have to be mutually exclusive, i.e. it should not be possible to place an individual in more than one category. The nominal level of measurement is often used by biologists. For example, species, sex, colour and habitat type are all nominal categories into which count data can be assigned.

The name of a category can of course be substituted by a number – but it will be a mere label and have no numerical meaning. Thus, if blue tits are coded 1, coal tits 2, great tits 3, willow tits 4 and marsh tits 5 they can then be listed, 1,2,3,4,5 but the sequence has no more mathematical significance than if they had been listed 4,2,1,5,3. They are still nominal categories.

3.3 The ordinal scale

The ordinal scale incorporates the classifying and labelling function of the nominal scale, but in addition brings to it a sense of order. Ordinal numbers are used to indicate rank order, but nothing more. The ordinal scale is used to arrange (or rank) individuals into a sequence ranging from the highest to the lowest, according to the variable being measured. Ordinal numbers assigned to such a sequence may not indicate absolute quantities, nor can it be assumed that intervals between adjacent numbers on the scale are equal.

An example of an ordinal scale is the DAFOR scale used to record the abundance of different plant species in a quadrat:

Category | Score

Dominant | 5

Abundant | 4

Frequent | 3

Occasional | 2

Rare | 1

In this scale there is no simple relationship between the numerical values of the abundance scale. ‘Abundant’ does not mean twice ‘occasional’, but it will always be ranked above ‘frequent’.
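The ordinal character of the DAFOR scale can be shown in a brief sketch. The scores are those listed above; the species labels and the quadrat record are hypothetical.

```python
DAFOR_SCORE = {"Dominant": 5, "Abundant": 4, "Frequent": 3, "Occasional": 2, "Rare": 1}

# Hypothetical record for one quadrat: species label -> DAFOR category
quadrat_record = {
    "species A": "Occasional",
    "species B": "Abundant",
    "species C": "Rare",
    "species D": "Frequent",
}

# Ranking from most to least abundant is a legitimate use of ordinal scores
ranked = sorted(quadrat_record, key=lambda sp: DAFOR_SCORE[quadrat_record[sp]], reverse=True)
print(ranked)  # ['species B', 'species D', 'species A', 'species C']

# The ratio of two scores can be computed, but it carries no biological meaning:
# 'Abundant' (4) is ranked above 'Occasional' (2), not twice as abundant.
print(DAFOR_SCORE["Abundant"] > DAFOR_SCORE["Occasional"])  # True
```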

3.4 The interval scale

As the term interval implies, in addition to rank ordering data, the interval scale allows the recognition of precisely how far apart the units on the scale are. Interval scales permit certain mathematical procedures untenable at the nominal and ordinal levels of measurement. Because it can be concluded that the difference between the values of, say, the 8th and 9th points on the scale is the same as that between the 2nd and 3rd, it follows that the intervals can be added or subtracted. But because a characteristic of interval scales is that they have no absolute zero point it is not possible to say that the 9th value is three times that of the 3rd. To illustrate this, date is a very widely used interval scale. If the first-arrival dates of four species of warbler are, respectively, the 1st, 5th, 10th and 15th May, the interval between each point on the scale (1 day) is equal, and the fourth species took 10 days longer to arrive than the second. It did not take three times as long, however, any more than it took 15 times longer to arrive than the first species! Another interval scale is temperature: 10°C is not twice as hot as 5°C because the zero on the scale in question (Celsius) is not absolute.
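The warbler and temperature examples can be checked with a short sketch. The year 2001 is an arbitrary assumption (the text gives only day and month), and the conversion to kelvin is added here purely to show what happens when the zero is moved; it is not part of the original example.

```python
from datetime import date

# First-arrival dates from the text; the year is an arbitrary assumption.
arrivals = [date(2001, 5, day) for day in (1, 5, 10, 15)]

# Differences between points on an interval scale are meaningful:
# the fourth species arrived this many days after the second.
gap_days = (arrivals[3] - arrivals[1]).days

# Ratios are not: the same two temperatures give different ratios
# depending on where the scale places its zero.
ratio_celsius = 10 / 5
ratio_kelvin = (10 + 273.15) / (5 + 273.15)

print(gap_days)
print(ratio_celsius, round(ratio_kelvin, 3))
```

The difference in days is the same whichever year is chosen, whereas the apparent 'twice as hot' vanishes as soon as the temperatures are expressed on a scale with an absolute zero.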

3.5 The ratio scale

The highest level of measurement, which incorporates the properties of the interval, ordinal and nominal levels, is the ratio scale. A ratio scale includes an absolute zero, it gives a rank ordering and it can simply be used for labelling purposes. Because there is an absolute zero, all of the mathematical procedures of addition, subtraction, multiplication and division are possible. Measurements of length and mass fall on ratio scales. Thus, a length of 150 mm is three times as long as one of 50 mm.

The mathematical properties of interval and ratio scales are similar and as no statistical procedure described here will distinguish between them, we shall refer to them both as ‘interval’ scales.

3.6 Conversion of interval observations to an ordinal scale

Usually, observations made on interval scales allow the execution of more sensitive statistical analyses. Sometimes, however, interval data are not suitable for certain methods. Perhaps, because there are too few observations, we are forced to downgrade them to an ordinal rank scale for use in non-parametric methods. The following measurements (mm) are ranked in increasing size in the top line. Their rank (ordinal) scores are underneath:

If large numbers of observations of a variable are collected, it is almost inevitable that some of the observations will be equal in value. Their ranks will also be tied and these have to be dealt with correctly. Since some statistical tests which we describe later depend on the ranking of observations, we take the opportunity now of dealing with the problem of tied observations.

Where tied observations occur, each of them is assigned the average of the ranks that would have been given if there had been no ties. To illustrate this, a set of measurements rounded to the nearest whole number is given below. For convenience they are presented in ascending order and adjacent tied scores are underlined:

If we try to rank these, the single extreme values of 25 and 37 will clearly be ranked 1 and 23, respectively. The two values of 27 together occupy the ranks of 3 and 4; they are each assigned the average rank of 3.5. The three values of 30 occupy the ranks 7, 8 and 9. They have an average rank of 8. In a similar manner, the four values of 33 are each assigned the rank 13.5 and the five values of 36 the rank of 20. The set of data is rewritten below, with the correct ranks assigned:
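The averaging rule for tied ranks can be sketched in a few lines. The function name `average_ranks` is hypothetical, and the 23 values below are not the original measurements: they are invented solely so that the ties fall in the positions the text describes.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share
    the mean of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' to cover the whole run of equal values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical data with the same pattern of ties as in the text.
data = [25, 26, 27, 27, 28, 29, 30, 30, 30, 31, 32,
        33, 33, 33, 33, 34, 35, 36, 36, 36, 36, 36, 37]
ranks = average_ranks(data)
for value, rank in zip(data, ranks):
    print(value, rank)
```

A useful check is that the ranks always sum to n(n + 1)/2, whether or not there are ties.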

3.7 Derived variables

Sometimes observations are processed in order to generate a derived number. Examples of derived variables are ratios, proportions, percentages, and rates.

Example 3.1

The numbers of four species of woodlice in a pit-fall trap are: Oniscus 12; Porcellio 8; Philoscia 5; Armadillidium 2. What is the proportion of each species in the sample?

The proportion is given by:

$$p_i = \frac{n_i}{N}$$

where $p_i$ is the proportion of a particular category, $n_i$ is the number of individuals in that category and $N$ is the total number in all categories. Therefore:

Notice that the sum of the individual proportions equals 1.

When a proportion is multiplied by 100 it is called a percentage. The percentage of each species of woodlice in the sample described in Example 3.1 is included in the table above.
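The proportions and percentages for Example 3.1 can be worked through as follows. The counts are those given in the example; the variable names and the rounding used for display are choices made for this sketch.

```python
# Counts from Example 3.1.
counts = {"Oniscus": 12, "Porcellio": 8, "Philoscia": 5, "Armadillidium": 2}

total = sum(counts.values())                      # N
proportions = {species: n / total for species, n in counts.items()}
percentages = {species: 100 * p for species, p in proportions.items()}

for species in counts:
    print(f"{species:<14} {counts[species]:>3} "
          f"{proportions[species]:.3f} {percentages[species]:.1f}")
```

Rounded to three decimal places the displayed proportions may not add to exactly 1, although the unrounded values do.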

A rate is the ratio of an observation to a period of time. Rates are useful for expressing such variables as growth, population change, and movement.

Example 3.2

A shoot grows 12 cm in 4 days.

A pigeon flies 1728 km in 24 h.
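Dividing each observation by its period of time gives the corresponding rate. The arithmetic below is worked directly from the figures in the example; the labels 'growth rate' and 'flight speed' are supplied here for clarity.

$$\text{growth rate} = \frac{12\ \text{cm}}{4\ \text{days}} = 3\ \text{cm per day}$$

$$\text{flight speed} = \frac{1728\ \text{km}}{24\ \text{h}} = 72\ \text{km per hour}$$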

Other derived variables we refer to in this text are the Lincoln index (Section 11.9) and a diversity index (Section 11.10). Statistical techniques may be performed upon derived variables; sometimes the data first have to be converted or transformed (see, for example, Section 10.5).

3.8 The precision of observations

When an observation is of a discrete variable, that is a count, we are usually sure of its precision. A nest may have exactly four eggs in it. A measurement, on the other hand, is never exact; it is only precise to within certain limits.

If the diameter of a pond is measured with a tape marked in 1-metre intervals, measurements are precise to the nearest whole metre. An observation of 10 m can be recorded as 10 ± 0.5 m. This implies that all distances between the limits of 9.5 m and 10.5 m are recorded as 10 m, as shown in Fig. 3.1.

We could use a more finely graduated scale, for example a tape marked in 10 cm (0.1 m) intervals. Each observation is then precise to the nearest 0.1 m and an observation of 10.6 m is written as 10.6 ± 0.05 m; that is, all distances between 10.55 m and 10.65 m are recorded as 10.6 m.

We could continue increasing the precision of measurements by using a metre rule graduated in millimetres, Vernier callipers capable of recording 0.01 mm, or a microscope eye-piece graticule down to 0.001 mm. In each case, the measurement is precise to plus or minus half the interval spanned by the last measured digit; in the case of the graticule measurement this is ±0.0005 mm. An observation of 1.364 mm is within an interval spanning 1.3635 mm to 1.3645 mm.
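The 'half the last measured digit' rule can be expressed as a small helper. The function name `implied_limits` is hypothetical, and the sketch assumes that the instrument's graduation equals one unit of the last recorded digit. The observation is passed as text so that the number of recorded decimal places is preserved exactly.

```python
from decimal import Decimal


def implied_limits(recorded):
    """Return the (lower, upper) implied limits of an observation
    written as a decimal string, e.g. "10.6"."""
    value = Decimal(recorded)
    # Size of one unit in the last recorded digit, e.g. 0.1 for "10.6".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


for observation in ("10", "10.6", "1.364"):
    lower, upper = implied_limits(observation)
    print(observation, lower, upper)
```

Decimal arithmetic is used rather than ordinary floating point so that limits such as 1.3635 are represented exactly.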

Note the distinction between precision and accuracy. An expensive spring balance might be precise, weighing to 0.1 g, but if it is badly adjusted it will not be accurate. A broken clock is accurate twice a day!

Fig. 3.1 The limits of an observation.

3.9 How precise should we be?

Since it is clearly possible to choose (within reason) the degree of precision of a measurement, the question arises, ‘how precisely should we make our measurements?’. We should not choose, for example, to measure a transect across a salt-marsh in millimetres.

3.10 The frequency table

Collected data should be organized and summarized in a form that allows further interpretation and analysis. In Table 3.1 are the lengths (to the nearest whole millimetre) of 100 shoots grown from seeds that were planted at the same time.

The measurements are presented in the order that the shoots were measured and are therefore ungrouped. A quick scan of Table 3.1 reveals that particular values are repeated a number of times: there are, for example, five values of 74 mm in the top row alone. The value of 74 mm is called a frequency class and, rather than record all 100 values individually (as in the table), it is more economical of space and more revealing to group the data into all the frequency classes. We should remember that each frequency class is a class interval with implied limits of ±0.5 mm. Thus, all lengths between 73.5 mm and 74.5 mm are placed in the ‘74 mm’ class. The grouped observations are shown in Table 3.2.

Table 3.1 Lengths of 100 shoots (mm)

Table 3.2 Grouped lengths of 100 shoots

The data in Table 3.2 are grouped into columns: implied class interval, frequency class x and frequency f. The tallies are presented in this example to give a visual appreciation of how the frequencies are distributed between the frequency classes. The two columns x and f represent a frequency table. Although this particular table has been constructed for interval measurements, frequency tables can also be constructed for count data and for nominal and ordinal scales. The manner in which frequencies are distributed between the frequency classes is described as a frequency distribution.
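Building a frequency table from ungrouped observations amounts to counting how often each value occurs. In the sketch below the ten shoot lengths are hypothetical and are not taken from Table 3.1; the implied limits of ±0.5 mm follow the convention described above.

```python
from collections import Counter

# Hypothetical shoot lengths (mm), in the order measured.
lengths = [74, 72, 74, 75, 73, 74, 76, 72, 74, 75]

# Frequency f of each frequency class x.
frequency = Counter(lengths)

# Columns: implied class interval, frequency class x, frequency f.
for x in sorted(frequency):
    print(f"{x - 0.5}-{x + 0.5}  {x}  {frequency[x]}")
```

The frequencies should always sum to the number of observations, which is a quick check that nothing has been missed.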

Readers will note that there are only 12 unit steps between the largest and smallest observations in the table above and an increase in precision is highly desirable. The purpose of the hypothetical data is simply to illustrate the construction of a frequency table.

3.11 Aggregating frequency classes

When the spread of observations is large and the number of observations relatively few, then a frequency distribution appears drawn out and disjointed if every step interval corresponds to a frequency class. In such cases it is advisable to aggregate, or group adjacent classes to smooth out the distribution. The following example shows how to do this.

Example 3.3

Table 3.3 Masses of 50 salmon

The data are summarized and grouped into the frequency table in Table 3.4.

Table 3.4 A grouped frequency table

The steps involved in the construction of Table 3.4 are listed below. They enable the construction of frequency tables from any set of measurement data with a degree of class aggregation that suits specific needs.

1. Determine the range of scores: the highest observation minus the lowest observation plus 1. (One is added to take into account the implied limits of the numbers.) In this case the range is 200 − 156 + 1 = 45.
2. Decide how many categories (class intervals) are required. Normally the number of class intervals is not less than 10 and not more than 20. In this case we have selected 15 as a convenient number of classes.
3. Divide the range by the number of class intervals required. This gives the number of unit steps which are aggregated to make up a class interval.
Number of unit steps per class interval = 45/15 = 3
If the calculated number of unit steps per class interval is a fraction, then round to the nearest whole number.
4. Construct the class interval column starting at the top with the lower limit of the smallest observation (155.5). Add the class interval size (3) to this lower limit. The range of the lowest class interval becomes 155.5 to 158.5. The lower limit of the next class becomes 158.5 to which 3 is added to give the class interval range 158.5 to 161.5. This procedure is repeated, moving down the column until the class interval column includes an interval into which the largest observation (200) can be placed, namely, 197.5 to 200.5.
5. Insert in the next column provided a tally for each individual observation in the raw data table. For example, for the observation 162, a tally is inserted to show that it falls into the class interval range 161.5 to 164.5.
6. Total up the tallies within each class interval and place in the frequency column in line with the appropriate class.
7. Total the frequency column (n). This serves as a useful check that all data have been included in the table.
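The steps listed above can be sketched as a single function. The name `grouped_frequency_table` is hypothetical, as are the twelve observations, which merely share the smallest (156) and largest (200) values mentioned in the steps. The sketch assumes whole-number observations with implied limits of ±0.5 and rounds halves upwards when the number of unit steps is a fraction.

```python
def grouped_frequency_table(observations, n_classes):
    """Return (class width, list of (lower limit, upper limit, frequency))."""
    lowest, highest = min(observations), max(observations)
    value_range = highest - lowest + 1                     # range, plus 1
    # Unit steps per class interval, rounded to the nearest whole number.
    width = max(1, int(value_range / n_classes + 0.5))
    lower = lowest - 0.5                                   # lower implied limit
    table = []
    while True:
        upper = lower + width
        # Tally the observations falling inside this class interval.
        frequency = sum(1 for x in observations if lower < x < upper)
        table.append((lower, upper, frequency))
        if upper > highest:                                # largest value placed
            break
        lower = upper
    # Check that every observation has been included.
    assert sum(f for _, _, f in table) == len(observations)
    return width, table


# Hypothetical observations; not the data of Table 3.3.
masses = [156, 162, 162, 170, 171, 175, 180, 181, 181, 182, 190, 200]
width, table = grouped_frequency_table(masses, 15)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}  {frequency}")
```

Because whole-number observations can never coincide with a limit ending in .5, each observation falls into exactly one class interval.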

3.12 Frequency distribution of count observations

In principle, the construction of a frequency distribution of count observations is similar to that of measurement observations. If the observations in Section 3.10 were counts, for example, the number of daisy flowers in 100 quadrats on a lawn, the frequency distribution would be exactly the same. However, because each observation, and hence each frequency class, has an exact value, the column of implied class interval is redundant. There is no implied upper and lower limit to a count of 74 daisies.

A minor problem may arise if frequency classes are aggregated. Imagine that the observations in Example 3.3 are counts, for example, the number of protozoa counted in 50 sampling units of pond water. The construction of a frequency table is undertaken exactly as described, except that the column of implied limits is again redundant. The class mark (the mid-point of each class interval) is still a useful number, however.

3.13 Dispersion

The word distribution calls for as much care in its usage as the word population when biologists set out to describe and to analyse their data by statistical methods.

As we have seen, distribution has a special meaning in statistics to do with how observations of a variable are spread over the range of measurement. Biologists commonly use the word distribution to mean how organisms are scattered about in the environment. Thus, ‘the pipistrelle bat has a wide distribution’ or ‘gull nests are distributed evenly through the colony’.

To avoid risk of confusion, we adopt the word dispersion to indicate how or where objects are placed in the environment.