Statistics in Environmental Sciences - Valerie David - E-Book

Description

Statistical tools are indispensable for the environmental sciences. They have become an integral part of the scientific process, from the development of the sampling plan to the obtaining of results. Statistics in Environmental Sciences provides the foundation for interpreting quantitative data (basic vocabulary, main laws of probability, etc.) and explains the thinking behind sampling and experimental methodology. It also introduces the principles of statistical tests, such as decision theory, and examines the key choices in statistical tests, while keeping the established objectives in mind. The book examines the statistics most used in the environmental sciences. Detailed descriptions based on concrete examples are given, as well as descriptions obtained using the free software R (whose use is also presented).

Page count: 366

Publication year: 2019

Table of Contents

Cover

Preface

Introduction

I.1. What is the relevance of statistical analysis in environmental sciences?

I.2. The statistical mind: from the representative sample to the population

I.3. The “statistical” tool in environmental research

I.4. Book structure

1 Working with the R Software

1.1. Working with the R software

1.2. Basic operations for statistics in R

1.3. A few graphs to summarize the data set

2 Fundamental Concepts in Statistics

2.1. Basic statistical vocabulary

2.2. Summarizing a sample

2.3. The laws of probability

3 Developing a Sampling or Experimental Plan

3.1. Sampling plans

3.2. Experimental plans

4 Principle of a Statistical Test

4.1. The usefulness of statistics

4.2. Decision theory

4.3. The statistical approach

4.4. Example of the application of a statistical test

5 Key Choices in Statistical Tests

5.1. How are keys chosen?

5.2. Verification tests of application conditions

6 Comparison Tests of Unilateral and Bilateral Parameters

6.1. Comparisons of numbers and proportions

6.2. Comparisons of means

6.3. Correlation test of two quantitative variables

7 Classical and Generalized Linear Models

7.1. Principle of linear models

7.2. Conditions of application of the model

7.3. Other useful analyses

7.4. Example of the application of different linear models

7.5. Examples of the application of GLMs

8 Non-parametric Alternatives to Linear Models

8.1. Principle of non-parametric tests

8.2. ANOVA alternatives

8.3. Non-parametric ANOVAs based on permutation tests (PERMANOVA)

8.4. Nonlinear models

Conclusion

References

Index

End User License Agreement

List of Tables

Chapter 2

Table 2.1. Calculation of absolute, relative and cumulative frequencies by inter...

Table 2.2. Main laws of probability (other than the law of normal distribution) ...

Chapter 5

Table 5.1. Parameters of interest in the analysis and associated keys

Table 5.2. Key 1

Table 5.3. Key 2

Table 5.4. Key 3

Table 5.5. Key 4

Chapter 7

Table 7.1. Variance analysis table for linear regression and a variance analysis...

Table 7.2. The different generalized linear models compared to the linear model,...

Table 7.3. Variables and associated codes for environmental variables

List of Illustrations

Introduction

Figure I.1. Weight distribution for 1,000 industrially produced Bijou brand made...

Figure I.2. Hypothetical example 1: all oysters in the Arcachon Bay were counted...

Figure I.3. Comparison of two geographically distinct hypothetical oyster popula...

Figure I.4. Comparison of two hypothetical oyster populations that are geographi...

Figure I.5. Precise procedure for a simple random sampling of eight oysters out ...

Figure I.6. Distribution curve of the weights of wild oysters taken from Cap Fer...

Figure I.7. Curve of the weight distribution of wild oysters in Comprian for the...

Figure I.8. Comparison of the scientific approach with criminal investigations. ...

Figure I.9. The synergy of research approaches

Figure I.10. Successive/interconnected approaches through the “choices” made thr...

Chapter 1

Figure 1.1. Search window of the help.search() function

Figure 1.2. Pch symbol for the plot function. For a color version of this figure...

Figure 1.3. A) Length of the frontal lobe as a function of shell length with a p...

Figure 1.4. Histograms for the shell length of crabs: A) the number of classes (...

Figure 1.5. Illustrations of a boxplot from a series of raw data (gray dots). Fo...

Figure 1.6. Illustrations of boxplots: A) for the length of the frontal lobe and...

Figure 1.7. Illustrations of boxplots with two qualitative factors: A) the two f...

Figure 1.8. Barplot illustrations: A) for a single factor; B) for two factors. F...

Figure 1.9. Illustration of representations of relationships of quantitative var...

Figure 1.10. Illustration of representations of relationships of quantitative va...

Figure 1.11. Illustration of the results of the contingency table, constructed f...

Figure 1.12. Representation of the averages of the replicas based on A) the type...

Chapter 2

Figure 2.1. Basic vocabulary illustrated by wild birds in Lake Apopka. For a col...

Figure 2.2. Creation of a database that is easy to use in statistics

Figure 2.3. Statistical ranks versus theoretical ranks. The sum of the theoretic...

Figure 2.4. Method of calculating means, modes, medians and quantiles on a lead ...

Figure 2.5. Histograms of absolute (A), relative (B) and cumulative relative (C)...

Figure 2.6. Histogram and associated density function f(z); the function is obta...

Figure 2.7. Illustration of the significance of the confidence interval drawn fr...

Figure 2.8. Representation of standard normal distribution and its essential mat...

Figure 2.9. Table giving the correlation between the reduced centered data and t...

Figure 2.10. Areas of normal distribution corresponding to oaks that are smaller...

Chapter 3

Figure 3.1. Example of a random draw of 20 stations in the Arcachon Bay. For a c...

Figure 3.2. Positioning of 20 stations distributed according to two random (A an...

Figure 3.3. The 4 most used random samples. For a color version of this figure, ...

Figure 3.4. Process for first and second degree random cluster sampling. For a c...

Figure 3.5. Procedure for random systematic sampling. For a color version of thi...

Figure 3.6. Example of an experimental plan put in place to determine whether th...

Figure 3.7. Sensitivity versus generalization of the experimental plan according...

Figure 3.8. Examples of multi-treatment experimental plans with 1 controlled fac...

Figure 3.9. Involvement of a multi-factorial type of plan concerning the number ...

Figure 3.10. Examples of blocks and Latin square experimental plans

Figure 3.11. Examples of experimental plans in the face of two concrete realitie...

Figure 3.12. Example of experimental plans with crossed and prioritized factors

Figure 3.13. Interaction in a crossover plan: type of response associated with a...

Chapter 4

Figure 4.1. For example, statistical tests make it possible to compare two popul...

Figure 4.2. An illustrated example of decision theory based on the example of Af...

Figure 4.3. Illustrated example of the risks of error in statistics through: A) ...

Figure 4.4. Carrying out 40 test attempts following different sampling of eight ...

Figure 4.5. Concepts of power and robustness of a test, and parametric and non-p...

Figure 4.6. Histograms of the shell lengths of female crabs, distinguishable by ...

Figure 4.7. Student probability density for female crab samples to be compared w...

Figure 4.8. Student probability density for samples of crabs facing the permitte...

Figure 4.9. Histogram of the differences between cadmium concentrations in the l...

Chapter 5

Figure 5.1. Principle of the Shapiro–Wilk normality test. For a color version of...

Figure 5.2. Projection on the probability density of the F_calculated and the 5% ...

Chapter 6

Figure 6.1. Illustration of the results of the contingency table constructed fro...

Figure 6.2. Illustrations of the chi-squared conformity distribution on the comp...

Figure 6.3. Illustration of the results of the contingency table based on the nu...

Figure 6.4. Illustration of the results of the contingency table, constructed fr...

Figure 6.5. Illustration of the results of three contingency tables (one per col...

Figure 6.6. Box plot and bar chart representation (mean ± standard error) of tre...

Figure 6.7. Box plot and bar chart representation (mean ± standard error) of fem...

Figure 6.8. Box plot representation of the harmful gas levels measured at the we...

Figure 6.9. Box plot representation of the difference in tree sizes before and a...

Figure 6.10. Example of two cases in which the values are arranged by rank befor...

Figure 6.11. Box plot representation of the size of gray cuckoo eggs in warbler ...

Figure 6.12. Representation of hyperfine sedimentation constants (mean ± standar...

Figure 6.13. Different types of relationships between two variables

Figure 6.14. Case of the weight–height relationship of cod; comparison of the di...

Figure 6.15. Establishing the weight–height relationship equation for cod: with ...

Chapter 7

Figure 7.1. Different types of statistical models. For a color version of this f...

Figure 7.2. Different types of classical linear statistical models. For a color ...

Figure 7.3. Illustration of the equality of the total deviation to the sum of ex...

Figure 7.4. Regression plane design linking the explained Y variable to two X1 a...

Figure 7.5. Decomposition of a partial regression. The regression is not signifi...

Figure 7.6. Graphical representation of the covariance analysis, analyzing the j...

Figure 7.7. Key for choosing the linear model design to be implemented and main ...

Figure 7.8. Differences in methods between the least squares adjustment (A) and ...

Figure 7.9. Analysis of graphs to verify the application conditions of classical...

Figure 7.10. Graphs that can be used to analyze the application conditions of th...

Figure 7.11. Weight representation (in grams) according to height (length in cm)...

Figure 7.12. Maximum likelihood according to lambda in order to find the best la...

Figure 7.13. Box plots representing the lengths of petals according to their spe...

Figure 7.14. Example of a variance partition of the effect of time and humidity ...

Figure 7.15. Graph showing the log(Condition index) according to turnover time. ...

Figure 7.16. Graphs given by R to analyze the application conditions of the log(...

Figure 7.17. Graphs representing the log(Condition index) according to the turno...

Figure 7.18. Simple linear regression of sepal lengths according to petal length...

Figure 7.19. Graph given by R to analyze the application conditions of the multi...

Figure 7.20. Example of a variance partition regarding the effect of turnover ti...

Figure 7.21. Results given by R for the graph of the variance partition regardin...

Figure 7.22. Graph showing production according to nitrogen concentrations; the ...

Figure 7.23. Graph given by R to graphically verify the application conditions i...

Figure 7.24. Observation of the mean lines of the production data according to n...

Figure 7.25. Graph given by R to graphically verify the application conditions i...

Figure 7.26. Observation of bar graphs representing production according to nitr...

Figure 7.27. Observation of the corresponding gross values showing mean lines of...

Figure 7.28. Graph given by R to graphically verify the application conditions o...

Figure 7.29. Box plots representing dispersion by A) forest type and B) stratum

Figure 7.30. Example of a prioritized plan for studying the adaptation of plant ...

Figure 7.31. Observation of the corresponding gross values reporting on dispersi...

Figure 7.32. Graph given by R to graphically verify the application conditions. ...

Figure 7.33. Mean variation in production according to the species, which is rep...

Figure 7.34. Observation of the corresponding gross values reflecting the disper...

Figure 7.35. Graph given by R to graphically verify the application conditions. ...

Figure 7.36. Graphs given by R to verify the application conditions of cockroach...

Figure 7.37. Graphical representation of raw male (M) and female (F) data in tex...

Figure 7.38. Location of the 208 lakes sampled for the study in the northeastern...

Figure 7.39. Graphs given by R to verify the conditions for applying the specifi...

Figure 7.40. Graphs given by R to verify the application conditions of the total...

Figure 7.41. Effect of the presence/absence of the Millsonia worm and nitrogen c...

Figure 7.42. Graphs given by R to verify the application conditions of oat produ...

Chapter 8

Figure 8.1. Principle of the Kruskal–Wallis test regarding the relationship betw...

Figure 8.2. Production according to fertilizer types; superimposed letters corre...

Figure 8.3. Number of insects caught per hectare according to an increasing rate...

Figure 8.4. Principle of a PERMANOVA. For a color version of this figure, see ww...

Figure 8.5. Graphical representation of specific wealth according to salinity; s...

Figure 8.6. Example of the nonlinear relationship between Bidonia and moisture. ...

Series Editor: Françoise Gaill

Statistics in Environmental Sciences

Valérie David

First published 2019 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK, www.iste.co.uk

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA, www.wiley.com

© ISTE Ltd 2019

The rights of Valérie David to be identified as the author of this work have been asserted by her in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2019940675

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-78630-507-7

Preface

Statistics: Essential Tools to be Carefully Considered

“Don’t interpret what you don’t understand... insist on understanding the methods, the ins and outs, before making sense of the conclusions”.

Nicolas Gauvrit, Statistiques, méfiez-vous ! (2014)

In the 19th century, the British Prime Minister, Benjamin Disraeli, defined three kinds of lies: “lies, damned lies and statistics”, already underlining the controversy surrounding these tools. Within the scientific community, supporters and opponents of statistics are in conflict, as shown by the disciplines that are still reluctant to use them. Many media outlets use statistics to lull public opinion and put forward results that ultimately make no sense without some prior clarification.

More recently, in the article published in Le Monde on October 4, 2017, “Publier ou périr : la Science poussée à la faute” (“Publish or perish: Science pushed to a fault”), journalists highlighted the misuse of statistics as one of research’s bad practices in the race to publish. These misuses stem from a poor understanding of how the tool works: biased sampling, simplistic experiments that limit reproducibility, results exaggerated with respect to the statistical population used or to their significance, or an underestimation of the risks of error associated with statistical tests.

Despite these criticisms, it is clear that once these tools begin to be used within a scientific discipline, they quickly become essential. Statistics are the only way to generalize sample results to the population level, because of sampling fluctuations and the inherent variability of “natural” objects. However, they can undoubtedly lead to biased results if the analyses are not carried out rigorously.

Statistics require a kind of “calibration”. It would not be appropriate for a biogeochemist to use an oxygen probe without first calibrating it according to environmental parameters such as temperature, or for a system ecologist to identify species without using rigorous and expertly recognized determination keys. The “calibration” of statistical tools consists of verifying the application conditions of the tests used to meet a specific objective (e.g. comparison of population averages, existence of trends, etc.). These tests rely on mathematical equations that hold only under certain hypotheses. If these hypotheses are not respected, applying the equations is flawed, because the test was designed around the mathematical properties that they guarantee.

Thus, these tools are essential for an objective scientific approach, but their use requires particular rigor in their implementation and interpretation. The objective of this book is therefore to help the reader understand the use of statistics by explaining the spirit behind their design, and to present the analyses most commonly used in environmental sciences: their principles, their advantages and disadvantages, their implementation via the R software and their interpretation in the field of environmental sciences.

Valérie David

May 2019

1 Working with the R Software

1.1. Working with the R software

1.1.1. Why and how to work with the R software?

The R software is both a computer programming language and a working environment for statistical data processing. It is used to manipulate data, plot graphs and perform statistical analyses on these data. Among its advantages, R is:

– a “free” software that can be downloaded and installed on personal computers;

– a “cross-platform” software that runs on Windows, Linux, Mac OS, etc.

As a result, many scientists use it internationally and do not hesitate to share their statistical knowledge by developing new functions and communicating through forums. Everyone can contribute to its improvement by integrating new functionalities or analysis methods that have not yet been implemented.

This makes it software that is in rapid and constant evolution. Thus, many statistical analyses, from simple to complex, are available (descriptive and inferential statistics, parametric or non-parametric tests, linear or nonlinear models, multivariate analyses, etc.). No commercial