Statistical Analysis of Geographical Data

Simon James Dadson

Description

Statistical Analysis of Geographical Data: An Introduction provides a comprehensive and accessible introduction to the theory and practice of statistical analysis in geography. It covers a wide range of topics including graphical and numerical description of datasets, probability, calculation of confidence intervals, hypothesis testing, and the collection and analysis of data using analysis of variance and linear regression. Taking a clear and logical approach, the book examines real problems with real data from the geographical literature in order to illustrate the important role that statistics play in geographical investigations. Presented in a clear and accessible manner, it includes recent, relevant examples designed to enhance the reader’s understanding.




Table of Contents

Cover

Title Page

Preface

1 Dealing with data

1.1 The role of statistics in geography

1.2 About this book

1.3 Data and measurement error

Exercises

2 Collecting and summarizing data

2.1 Sampling methods

2.2 Graphical summaries

2.3 Summarizing data numerically

Exercises

3 Probability and sampling distributions

3.1 Probability

3.2 Probability and the normal distribution: z‐scores

3.3 Sampling distributions and the central limit theorem

Exercises

4 Estimating parameters with confidence intervals

4.1 Confidence intervals on the mean of a normal distribution: the basics

4.2 Confidence intervals in practice: the t‐distribution

4.3 Sample size

4.4 Confidence intervals for a proportion

Exercises

5 Comparing datasets

5.1 Hypothesis testing with one sample: general principles

5.2 Comparing means from small samples: one‐sample t‐test

5.3 Comparing proportions for one sample

5.4 Comparing two samples

5.5 Non‐parametric hypothesis testing

Exercises

6 Comparing distributions

6.1 Chi‐squared test with one sample

6.2 Chi‐squared test for two samples

Exercises

7 Analysis of variance

7.1 One‐way analysis of variance

7.2 Assumptions and diagnostics

7.3 Multiple comparison tests after analysis of variance

7.4 Non‐parametric methods in the analysis of variance

7.5 Summary and further applications

Exercises

8 Correlation

8.1 Correlation analysis

8.2 Pearson’s product‐moment correlation coefficient

8.3 Significance tests of correlation coefficient

8.4 Spearman’s rank correlation coefficient

8.5 Correlation and causality

Exercises

9 Linear regression

9.1 Least‐squares linear regression

9.2 Scatter plots

9.3 Choosing the line of best fit: the ‘least‐squares’ procedure

9.4 Analysis of residuals

9.5 Assumptions and caveats with regression

9.6 Is the regression significant?

9.7 Coefficient of determination

9.8 Confidence intervals and hypothesis tests concerning regression parameters

9.9 Reduced major axis regression

9.10 Summary

Exercises

10 Spatial statistics

10.1 Spatial data

10.2 Summarizing spatial data

10.3 Identifying clusters

10.4 Interpolation and plotting contour maps

10.5 Spatial relationships

Exercises

11 Time series analysis

11.1 Time series in geographical research

11.2 Analysing time series

11.3 Summary

Exercises

Appendix A: Introduction to the R package

A.1 Obtaining R

A.2 Simple calculations

A.3 Vectors

A.4 Basic statistics

A.5 Plotting data

A.6 Multiple figures

A.7 Reading and writing data

A.8 Summary

Appendix B: Statistical tables

References

Index

End User License Agreement

List of Tables

Chapter 01

Table 1.1 Prefixes used to indicate powers of 10.

Chapter 02

Table 2.1 Geography test scores.

Table 2.2 Frequency distribution for test scores.

Table 2.3 Concentration of lead (Pb) in pine trees (ppm).

Table 2.4 Calculation of variance and standard deviation for lead concentrations in pine trees.

Table 2.5 Rainfall data in Glasgow and London.

Table 2.6 Weekly food expenditure in two suburbs of a city.

Chapter 05

Table 5.1 Beach pollution.

Table 5.2 Summary of rainfall data in Glasgow and London.

Table 5.3 Comparison of lake acidity before and after pollution reduction programme.

Table 5.4 Job security data.

Table 5.5 Ranked job security data.

Chapter 06

Table 6.1 Museum visitor ages.

Table 6.2 Contingency table for museum visitor ages.

Table 6.3 River habitat biodiversity.

Table 6.4 Contingency table for stream habitat biodiversity showing observed counts and calculated row and column totals.

Table 6.5 Contingency table for stream habitat biodiversity showing expected frequencies.

Table 6.6 Distribution of graffiti by building type.

Table 6.7 Distribution of bird seed availability.

Table 6.8 Effects of preschool education on literacy.

Chapter 07

Table 7.1 Crop yield data for single factor analysis of variance.

Table 7.2 ANOVA table for crop yield analysis of variance.

Table 7.3 Average A‐Level point scores for schools in three areas.

Table 7.4 Slope angles and lithology.

Chapter 08

Table 8.1 July rainfall and corn yield in Iowa, USA.

Table 8.2 Calculations for Pearson’s product‐moment correlation coefficient between July rainfall and corn yield in Iowa, USA.

Table 8.3 Property prices in Oxford, UK.

Table 8.4 Species–area relationship for birds of the Solomon Archipelago.

Chapter 09

Table 9.1 Tree height and girth data.

Table 9.2 Calculations to estimate parameters of best‐fit linear model to trees data using least squares procedure.

Table 9.3 ANOVA table for regression between tree height and diameter.

Table 9.4 Species richness and pH for lowland stream reaches.

Table 9.5 Pollutant concentration away from a disused mine.

Chapter 10

Table 10.1 R packages for plotting and analysing spatial data.

Table 10.2 Parameters of the Ordnance Survey Great Britain projection (OSGB36).

Table 10.3 Data for London Boroughs (2013/14 figures).

Chapter 11

Table 11.1 Mean monthly temperature data (°C) for Oxford, UK, 1961–1990.

Table 11.2 Mean monthly precipitation data (mm/month) for Oxford, UK, 1961–1990.

Appendix A

Table A.1 Plotting options in R.

Appendix B

Table B.1 z‐Table.

Table B.2 t‐Distribution.

Table B.3 Critical values of the F‐distribution.

Table B.4 Critical values of the U‐distribution.

Table B.5 Critical values of the χ² distribution.

List of Illustrations

Chapter 01

Figure 1.1 Accuracy and precision in archery. (a) High accuracy with high precision; (b) high accuracy with low precision; (c) low accuracy but high precision; (d) low accuracy and low precision. Note that without knowing the location of the archery target (i.e. the true value of the measured quantity), cases (c) and (d) are indistinguishable from (a) and (b), respectively.

Chapter 02

Figure 2.1 Examples of histograms showing typical properties of frequency distributions: (a) positive skew; (b) negative skew; (c) platykurtic; (d) leptokurtic; (e) bimodal; and (f) multimodal.

Figure 2.2 Histogram of test score data.

Figure 2.3 Time series of Mauna Loa atmospheric CO₂ concentrations. Note that this dataset is built into R and can be plotted using plot(co2).

Figure 2.4 Scatter plot of July rainfall and corn yield in Iowa.

Figure 2.5 Life expectancy at birth and gross domestic product (GDP) per capita in dollars (adjusted for inflation): (a) with standard axes; and (b) with logarithmically spaced horizontal axis.

Figure 2.6 Histogram of lead concentration in pine trees at (a) Site A and (b) Site B.

Chapter 03

Figure 3.1 Theoretical form of the normal distribution.

Figure 3.2 Properties of the normal distribution: (a) 68% of observations lie within one standard deviation either side of the mean; and (b) 95% of observations lie within two standard deviations either side of the mean.

Figure 3.3 The area to the left of a particular z‐score gives the fraction of the possible values in the distribution that are lower than that z‐score. (a) z = 0; (b) z = −1; (c) z = 2.5.

Figure 3.4 Illustration of the central limit theorem: the more samples we draw the closer their average approximates a normal distribution.

Chapter 04

Figure 4.1 Comparison of t‐distribution and normal distribution: (a) comparison of the normal distribution with the t‐distribution for different numbers of degrees of freedom; and (b) area under the normal distribution compared with the area under the t‐distribution.

Chapter 07

Figure 7.1 Completely randomized experimental design.

Figure 7.2 Plots of crop yields: (a) graph of yields by consecutive plot number; and (b) boxplot grouped by level.

Figure 7.3 Diagnostic plots for analysis of variance: (a) residuals versus fitted values; and (b) normal probability plot of residuals.

Chapter 09

Figure 9.1 The relation between tree height and diameter in 31 cherry trees.

Figure 9.2 Illustration of the principle behind linear least squares regression.

Figure 9.3 Plots showing typical configurations of residuals: (a) standard, normally distributed, homoscedastic; (b) heteroscedastic; and (c) non‐linear model.

Figure 9.4 Diagnostic plots using residuals for tree height regression: (a) residuals against fitted values; and (b) histogram of residuals.

Figure 9.5 Regression by linear least squares: (a) the spread around the mean in the vertical direction; (b) the spread accounted for by the line of best fit; and (c) the residual deviation, or spread around the line of best fit.

Figure 9.6 Confidence intervals (95%) around predictions of tree height.

Chapter 10

Figure 10.1 Map showing global cities with population greater than five million people.

Figure 10.2 Map projections and their properties.

Figure 10.3 Comparison of Europe plotted with (a) Mercator projection and (b) Lambert conformal conic (ETRS89‐LCC) with standard parallels at 35° N and 65° N. Note the relative sizes of different countries, especially at high latitudes.

Figure 10.4 Plot showing the locations of the top 30 cities in the UK by population: (a) in latitude‐longitude coordinates; and (b) projected onto GB National Grid coordinates using R.

Figure 10.5 Geographical distribution of the UK population. Open circles indicate individual towns. The filled grey circle is the mean centre. Solid lines are contours of population density in persons per square kilometre calculated using a 60 km averaging kernel.

Figure 10.6 Possible distributions of points across a region: (a) completely spatially random; (b) dispersed; and (c) clustered. Quadrat outlines are drawn with numbers indicating the number of points that lie within each quadrat.

Figure 10.7 Polygons showing London boroughs. Filled circles show the centres of each borough; solid grey lines connect boroughs which share a boundary. Note that the River Thames passes through the centre of the city but has been ignored for the purposes of evaluating shared boundaries. Produced using data from the Office for National Statistics licensed under the Open Government Licence v.3.0. © Crown copyright and database rights (2015) Ordnance Survey 100019153. http://data.london.gov.uk/datastore/package/statistical‐gis‐boundary‐files‐london.

Figure 10.8 Voting patterns in the 2012 London Mayoral election. Polygons represent London boroughs, shaded to indicate whether the majority of first preference votes was for the Labour Party candidate (shown in light grey) or the Conservative Party candidate (shown in dark grey). Produced using data from the Office for National Statistics licensed under the Open Government Licence v.3.0. © Crown copyright and database rights (2015) Ordnance Survey 100019153.

Chapter 11

Figure 11.1 Time series of Mauna Loa atmospheric CO₂ concentrations. Note that this dataset is built into R and can be plotted using plot(co2).

Figure 11.2 Time series of Oxford temperature data from Table 11.1 plotted as a time series using R.

Figure 11.3 Types of moving average.

Figure 11.4 Moving averages of Oxford rainfall data: (a) raw data; (b) three‐point centred moving average with raw data shown in grey; (c) 12‐point centred moving average with raw data shown in grey; and (d) comparison of centred (black dots) and backward (solid black line) 12‐point moving averages with raw data shown in grey.

Figure 11.5 Monthly climatology for Oxford for (a) temperature and (b) precipitation.

Figure 11.6 Typical datasets with their corresponding correlograms shown below: (a, b) CO₂ concentrations showing the effect of a trend on the correlogram; (c, d) Oxford temperature showing the effect of a seasonal pattern in the dataset; and (e, f) Oxford rainfall showing a dataset which naturally has very little autocorrelation. The dashed line indicates the threshold for statistically significant autocorrelation.

Appendix A

Figure A.1 A basic plot.

Figure A.2 List of plotting symbols commonly used in R graphics.

Figure A.3 A customized plot.

Figure A.4 Multiple plots on a page.


Statistical Analysis of Geographical Data

An Introduction

Simon J. Dadson

School of Geography and the Environment, University of Oxford, UK

This edition first published 2017

© 2017 John Wiley and Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Simon J. Dadson to be identified as the author of this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Name: Dadson, Simon J.
Title: Statistical analysis of geographical data : an introduction / Simon J. Dadson.
Description: 1 edition. | Chichester, West Sussex : John Wiley & Sons, Inc., 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2016043619 (print) | LCCN 2017004526 (ebook) | ISBN 9780470977033 (hardback) | ISBN 9780470977040 (paper) | ISBN 9781118525111 (pdf) | ISBN 9781118525142 (epub)
Subjects: LCSH: Geography–Statistical methods. | BISAC: SCIENCE / Earth Sciences / General.
Classification: LCC G70.3 .D35 2017 (print) | LCC G70.3 (ebook) | DDC 910.72/7–dc23
LC record available at https://lccn.loc.gov/2016043619

Cover image: © Joshi Daniel / EyeEm / Getty Images
Cover design: Wiley

Preface

Quantitative reasoning is an essential part of the natural and social sciences and it is therefore vital that any aspiring geographer be equipped to perform quantitative analysis using statistics, either in their own work or to understand and critique that of others. This book is aimed specifically at first year undergraduates who need to develop a basic grounding in the quantitative techniques that will provide the foundation for their future geographical research. The reader is assumed to have nothing more than rusty GCSE mathematics. The clear practical importance of quantitative methods is emphasized through relevant geographical examples. As such, the book progresses through the basics of statistical analysis using clear and logical descriptions with ample use of intuitive diagrams and examples. Only when the student is fully comfortable with the basic concepts are more advanced techniques covered. In each section, the following format is employed: (i) an introductory presentation of the topic; (ii) a worked example; and (iii) a set of topical, geographically relevant exercises that the student may follow to probe their understanding and to help build confidence that they can tackle a wide range of problems. Use of the popular R statistical software is integrated within the text so that the reader can follow the calculations by hand whilst also learning how to perform them using industry‐standard open source software. Files containing the data required to solve the worked examples are available at https://simondadson.org/statistical‐analysis‐of‐geographical‐data.

I am grateful for the guidance and wisdom of my own academic advisers: Barbara Kennedy, who sadly died before the book was completed, Mike Church, and Niels Hovius. I am grateful to seven anonymous readers of the book’s outline for their positive support for the idea. To that I must add further thanks owed to colleagues at Oxford University, the Centre for Ecology and Hydrology, and elsewhere for their help and encouragement during the writing of this book. Particular thanks go to Richard Field, Richard Bailey, Toby Marthews and Andrew Dansie who read earlier drafts of the manuscript and made many useful suggestions that have undoubtedly improved the style of the book. My special thanks go to the large number of undergraduate and graduate students who have read the chapters and worked through the exercises in this book. Of course, any remaining errors or ambiguities are my own and I would be most grateful to have them brought to my attention.

At Wiley, I owe a considerable debt of gratitude to Fiona Murphy for her encouragement to undertake this project, Rachael Ballard for commissioning the work, and to Lucy Sayer, Fiona Seymour, Audrie Tan, Ashmita Thomas Rajaprathapan, Wendy Harvey and Gunal Lakshmipathy for their diligence in helping to see the work through to completion.

Finally, I would like to thank my wife, Emma, and our two children, Sophie and Thomas, for their support throughout the process of writing this book and for their tolerance of the time it has taken. To them this book is dedicated.

Oxford, 2016

Simon J. Dadson

1 Dealing with data

STUDY OBJECTIVES

Understand the nature and purpose of statistical analysis in geography.

View statistical analysis as a means of thinking critically with quantitative information.

Distinguish between the different types of geographical data and their uses and limitations.

Understand the nature of measurement error and the need to account for error when making quantitative statements.

Distinguish between accuracy and precision and understand how to report the precision of geographical measurements.

Appreciate the methodological limitations of statistical data analysis.

1.1 The role of statistics in geography

1.1.1 Why do geographers need to use statistics?

Statistical analysis involves the collection, analysis and presentation of numerical information. It involves establishing the degree to which numerical summaries about observations can be justified, and provides the basis for forming judgements from empirical data.

Take the following media headlines, for example:

We know in the next 20 years the world population will increase to something like 8.3 billion people.

Sir John Beddington, UK Government Chief Scientist [1]

2010 hits global temperature high.

BBC News, 20th January 2011 [2]

Each of these statements invites critical scrutiny. The reliability of their sources encourages us to take them seriously, but how do we know that they are correct? It is hard enough to try to predict what one human being will do in any particular year, let alone what several billion are going to do in the next 20 years. How were these predictions made? How was the rate of change of world population calculated? What were the assumptions? What does the author mean by ‘something like’? The number 8.3 billion is quite a precise number: why didn’t the author just say 8 billion or almost 10 billion?

Similarly, how do we know that 2010 is the global temperature high, when temperature is only measured at a small number of measuring stations? How would we go on to investigate whether anthropogenic warming caused the record‐breaking temperature in 2010 or whether it was just a fluke?

Statistical analysis provides some of the tools that can answer some of these questions. This book introduces a set of techniques that allow you to make sure that the statistical statements that you make in your own work are based on a sound interpretation of the data that you collect.

There are four main reasons to use statistical techniques:

to describe and measure the things that you observe;

to characterize measurement error in your observations;

to test hypotheses and theories;

to predict and explain the relationships between variables.

1.2 About this book

One of the best ways to learn any mathematical skill is through repeated practice, so the approach taken in this book uses many examples. The presentation of each topic begins with an introduction to the theoretical principles: this is then followed by a worked example. Additional exercises are given to allow the reader to develop their understanding of the topics involved.

The use of computer packages is now common in statistical analysis in geography: it removes many of the tedious aspects of statistical calculation leaving the analyst to focus on experimental design, data collection, and interpretation. Nevertheless, it is essential to understand how the properties of the underlying data affect the value of the resulting statistics or the outcome of the test under evaluation.

Two kinds of computer software are referred to in this book. The more basic calculations can be performed using a spreadsheet such as Microsoft Excel. The advantages of Excel are that its user interface is well‐known and it is almost universally available in university departments and on student computers. For more advanced analysis, and in situations where the user wishes to process large quantities of data automatically, more specialized statistical software is better. This book also refers to the open‐source statistical package called ‘R’ which is freely available from http://www.r‐project.org/. In addition to offering a comprehensive collection of well‐documented statistical routines, the R software provides a scripting facility for automation of complex data analysis tasks and can produce publication‐quality graphics.
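
To give a flavour of the commands used throughout the book, the short sketch below shows R being used as a calculator and for a quick plot. The rainfall values are invented for illustration; a full introduction to R is given in Appendix A.

# Simple calculations at the R prompt
2 + 2                                  # returns 4
sqrt(16)                               # square root, returns 4

# Store a few hypothetical rainfall values (mm) in a vector and summarize them
rainfall <- c(12.1, 8.4, 20.3, 15.2)
mean(rainfall)                         # arithmetic mean
sd(rainfall)                           # standard deviation

# A basic plot: the co2 dataset is built into R
plot(co2)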

1.3 Data and measurement error

1.3.1 Types of geographical data: nominal, ordinal, interval, and ratio

Four main types of data are of interest to geographers: nominal, ordinal, interval, and ratio. Nominal data are recorded using categories. For example, if you were to interview a group of people and record their gender, the resulting data would be on a nominal, or categorical, scale. Similarly, if an ecologist were to categorize the plant species found in an area by counting the number of individual plants observed in different categories, the resulting dataset would be categorical, or nominal. The distinguishing property of nominal data is that the categories are simply names – they cannot be ranked relative to each other.

Observations recorded on an ordinal scale can be put into an order relative to one another. For example, a study in which countries are ranked by their popularity as tourist destinations would result in an ordinal dataset. A requirement here is that it is possible to identify whether one observation is larger or smaller than another, based on some measure defined by the analyst.

In contrast with nominal and ordinal scale data, interval scale data are measured on a continuous scale where the differences between different measurements are meaningful. A good example is air temperature, which can be measured to a degree of precision dictated by the quality of the thermometer being used, among other factors. Whilst it is possible to add and subtract interval scale data, they cannot be multiplied or divided. For example, it is correct to say that 30 degrees is 10 degrees hotter than 20 degrees, but it is not correct to say that 200 degrees is twice as hot as 100 degrees. This is because the Celsius temperature scale, like the Fahrenheit scale, has an arbitrarily defined origin.

Ratio scale data are similar to interval scale data but a true zero point is required, and multiplication and division are valid operations when dealing with ratio scale data. Mass is a good example: an adult with a mass of 70 kg is twice as heavy as a child with a mass of 35 kg. Temperature measured on the Kelvin scale, which has an absolute zero point, is also defined as a ratio scale measurement.

It is important from the outset of any investigation to be aware of the different types of geographical data that can be recorded, because some statistical techniques can only be applied to certain types of data. Whilst it is usually possible to convert interval data into ordinal or nominal data (e.g. rainfall values can be ranked or put into categories), it is not possible to make the conversion the other way around.
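
To illustrate how these four scales might be represented in software, here is a minimal R sketch using invented values (not an example from the book):

# Nominal: unordered categories
land_cover <- factor(c("forest", "urban", "water", "forest"))

# Ordinal: categories with a defined order (ranking only)
popularity <- factor(c("low", "high", "medium"),
                     levels = c("low", "medium", "high"), ordered = TRUE)

# Interval: differences are meaningful, but there is no true zero
temp_c <- c(10, 20, 30)        # air temperature in degrees Celsius
diff(temp_c)                   # differences (10, 10) are meaningful
# temp_c[3] / temp_c[1] is NOT meaningful: 30 degrees is not 'three times as hot' as 10 degrees

# Ratio: a true zero exists, so ratios are meaningful
mass_kg <- c(70, 35)           # adult and child mass
mass_kg[1] / mass_kg[2]        # 2: the adult is twice as heavy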

1.3.2 Spatial data types

Geographers collect data about many different subjects. Some geographical datasets have distinctly spatial components to them. In other words, they contain information about the location of a particular entity, or information about how a particular quantity varies across a region of interest. In many contexts, it is advantageous to collect information on the locations of objects in space, or to record details of the spatial relationships between entities. The two main types of spatial data that can be used are vector data and raster (or gridded) data. Vector data consist of information that is stored as a set of points that are connected to known locations in space (e.g. to represent towns, sampling locations, or places of interest). The points may be connected to form lines (e.g. to represent linear features such as roads, rivers and railways), and the lines may be connected to form polygons (e.g. to represent areas of different land cover, geological units, or administrative units).
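
As a rough illustration of the three vector types (using invented coordinates rather than any dataset from the book), base R graphics can draw points, a line, and a polygon directly:

# Hypothetical eastings and northings, in metres
town_x <- c(1200, 3400, 5600); town_y <- c(2300, 4100, 1800)

# Points: individual locations (e.g. towns or sampling sites)
plot(town_x, town_y, pch = 16, xlab = "Easting (m)", ylab = "Northing (m)",
     xlim = c(0, 6000), ylim = c(0, 6000))

# Line: an ordered sequence of points (e.g. a road or river)
river_x <- c(500, 2000, 3500, 5000); river_y <- c(500, 1500, 3000, 4500)
lines(river_x, river_y, lty = 2)

# Polygon: a closed ring of points (e.g. an administrative area)
area_x <- c(2500, 4500, 4000, 2000); area_y <- c(2500, 3000, 4500, 4000)
polygon(area_x, area_y, border = "grey40")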

The locations of points must be given with reference to a coordinate system which may be rectangular (i.e. given using eastings and northings in linear units such as metres), or spherical (i.e. given using latitudes and longitudes in angular units such as degrees), but which always requires the definition of unit vectors and a fixed point of origin. The most common spherical coordinate system is that of latitude and longitude, which measures points by their angular distance from an origin which is located at the equator (zero latitude) and the Greenwich meridian (zero longitude). Thus the location of Buckingham Palace in London, UK, is 51.50°N, 0.14°W, indicating that it is 0.14 degrees west of Greenwich and 51.5 degrees north of the equator.

Whilst spherical coordinate systems are commonly used in aviation and marine navigation, and with the arrival of GPS, terrestrial navigation usually uses rectangular coordinate systems. In order to use rectangular coordinates, the spherical form of the Earth must be represented on a flat surface. This is achieved using a map projection. An example of a map projection that is used to obtain a rectangular coordinate system is the Great Britain National Grid, in which locations are defined in metres east and north of a fixed origin that is located to the south west of the Scilly Isles. Thus to give a grid reference for Buckingham Palace as (529125, 179725) is to say that it lies at a point which is 529.125 km east of the origin and 179.725 km north of the origin.

To reduce the amount of information that must be transmitted in practical situations, grid references are typically given relative to a set of predefined 100 km squares. In situations where quoting distances to the nearest metre is not justified they are usually rounded to a more suitable level of precision. The grid reference above, for Buckingham Palace, might be rounded to the nearest 100 m and associated with the box TQ [which has its origin at (500000, 100000)] to give TQ 291 797, where two letters indicate the grid square, the first three digits indicate the easting and the last three digits indicate the northing.
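
The arithmetic behind this rounding can be checked with a few lines of R (a sketch of the calculation only, using the coordinates and the TQ square origin quoted above):

# Full National Grid coordinates of Buckingham Palace (metres)
easting  <- 529125
northing <- 179725

# Origin of the 100 km square 'TQ', as given in the text
sq_east  <- 500000
sq_north <- 100000

# Offset within the square, rounded to the nearest 100 m
e_ref <- round((easting  - sq_east)  / 100)   # 291
n_ref <- round((northing - sq_north) / 100)   # 797

# Assemble the grid reference
paste("TQ", sprintf("%03d", e_ref), sprintf("%03d", n_ref))   # "TQ 291 797"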

Raster data are provided on a grid, where each grid square contains a number that represents the value of the data within that grid square. Almost any kind of data can be represented using a raster. Examples of data that are collected in raster format include many types of satellite image, and other datasets that are sampled at regular intervals (see Section 2.1.3). The technical process of specifying the location of the raster in space is identical to the process used to locate a point, described above. It is also necessary to specify the resolution of the raster (i.e. the spacing between grid points and the extent or size of the domain).
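
A raster can be thought of as a matrix of values together with an origin and a cell size. The sketch below, with invented elevation values and a hypothetical origin, shows a small raster and a quick image plot in R:

# A 4 x 4 raster of hypothetical elevation values (metres)
elev <- matrix(c(10, 12, 15, 18,
                 11, 14, 17, 21,
                 13, 16, 20, 25,
                 15, 19, 24, 30), nrow = 4, byrow = TRUE)

# Georeferencing: an origin and a cell size (resolution) locate the grid in space
x0 <- 500000; y0 <- 150000     # hypothetical origin, in metres
cell <- 50                     # grid spacing of 50 m

# Coordinates of the cell centres, and a quick plot of the grid
xc <- x0 + (seq_len(ncol(elev)) - 0.5) * cell
yc <- y0 + (seq_len(nrow(elev)) - 0.5) * cell
image(xc, yc, t(elev), xlab = "Easting (m)", ylab = "Northing (m)")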

1.3.3 Measurement error, accuracy and precision

All measurements are subject to uncertainties. As an example, consider a geographer wishing to measure the velocity of a river. One way to do this is to use a stopwatch to measure the time it takes a float to travel a known distance. What are the uncertainties involved in this procedure? One source of error is the reaction time of the person using the stopwatch: they might be slow starting the watch, or fast stopping the watch, or vice versa. Since each possibility is equally likely, this kind of error is termed random error. One way to measure the amount of random error in a measurement is to repeat the procedure many times: sometimes the time will be underestimated, other times we will overestimate the time. By analysing the variability or spread in our results, we can get a good estimate of the amount of random error in our observation. If the spread is small, we say that our measurement is precise; if the spread is large, our measurement is less precise. The term precision is used to describe the degree to which repeated observations of the same quantity are in agreement with each other.

What if the stopwatch was consistently slow? In this case, all of the times measured would be shorter than they ought to be and no amount of repetition would be able to detect this source of error. Such errors are referred to as systematic errors, because we consistently underestimate the time taken if the stopwatch is slow, and consistently overestimate the time taken if the stopwatch is fast. If the amount of systematic error is low, we refer to our measurements as accurate; if the amount of systematic error is high, our measurement is less accurate. The term accuracy is used to describe the degree to which a measured value of a quantity matches its true value. Statistical analysis offers few opportunities to detect systematic errors, because we do not usually know the true value of the measurement that is being made: it is up to the person measuring the data to reduce the amount of systematic error through careful design of field, lab, or survey procedures.
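
The two kinds of error can be illustrated with a short simulation in R (an invented example, not data from the book): repeating a timing many times reveals the random error as spread around the mean, but a constant stopwatch bias shifts every reading by the same amount and leaves the spread unchanged, so repetition alone cannot expose it.

set.seed(1)
true_time <- 12.0                           # 'true' travel time of the float, in seconds

# Random error: each repeated timing is off by a random amount
times <- true_time + rnorm(20, mean = 0, sd = 0.3)
mean(times)    # close to 12.0 (accurate)
sd(times)      # roughly 0.3: the spread measures precision

# Systematic error: a stopwatch that always reads 0.5 s short
biased <- times - 0.5
mean(biased)   # about 11.5: shifted away from the true value (less accurate)
sd(biased)     # identical spread: the bias is invisible to repetition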

A typical graphical analogy used to illustrate the difference between accuracy and precision involves a set of archery targets (Figure 1.1). Here, the archer is subject to random errors due to the wind or the steadiness of their hand; and potential systematic errors due to the design of the bow and arrow and its sight. Note the important point that it is impossible to assess the precision of a single measurement using statistical techniques.

Figure 1.1 Accuracy and precision in archery. (a) High accuracy with high precision; (b) high accuracy with low precision; (c) low accuracy but high precision; (d) low accuracy and low precision. Note that without knowing the location of the archery target (i.e. the true value of the measured quantity), cases (c) and (d) are indistinguishable from (a) and (b), respectively.

1.3.4 Reporting data and uncertainties

The most straightforward way to communicate error is to give the best estimate of the final answer and the range within which you are confident that the measurement falls. Taking the earlier example of measuring the velocity of a river, suppose that we measure the velocity several times, giving the following estimates (in metres per second, or m/s):