Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd Edition provides an indispensable guide to this area of active research, combining depth of information with a range of topics to appeal to professional practitioners as well as researchers and postgraduates. The editors have succeeded in presenting chapters by a variety of the leading experts in the field while retaining a cohesive and highly accessible style. The book balances explanations of concepts with clear and useful discussion of the main application areas.
Reviews of the first edition:
"This book will provide a good reference, and I recommend it especially for developers and evaluators of statistical forecast systems." (Bulletin of the American Meteorological Society, April 2004)
"...a good mixture of theory and practical applications...well organized and clearly written..." (Royal Statistical Society, Vol. 168, No. 1, January 2005)
NEW to the second edition:
* Completely updated chapter on the Verification of Spatial Forecasts, taking account of the wealth of new research in the area
* New separate chapters on Probability Forecasts and Ensemble Forecasts
* New chapter on Forecasts of Extreme Events and Warnings
* New chapter on Seasonal and Climate Forecasts
* New Appendix on Verification Software
Cover image credit: the triangle of barplots shows a novel use of colour for visualizing probability forecasts of ternary categories; see Fig. 6b of Jupp et al. 2011, On the visualisation, verification and recalibration of ternary probabilistic forecasts, Phil. Trans. Roy. Soc. (in press).
Contents
Cover
Title Page
Copyright
List of contributors
Preface
Preface to the first edition
1: Introduction
1.1 A brief history and current practice
1.2 Reasons for forecast verification and its benefits
1.3 Types of forecast and verification data
1.4 Scores, skill and value
1.5 Data quality and other practical considerations
1.6 Summary
2: Basic concepts
2.1 Introduction
2.2 Types of predictand
2.3 Exploratory methods
2.4 Numerical descriptive measures
2.5 Probability, random variables and expectations
2.6 Joint, marginal and conditional distributions
2.7 Accuracy, association and skill
2.8 Properties of verification measures
2.9 Verification as a regression problem
2.10 The Murphy–Winkler framework
2.11 Dimensionality of the verification problem
3: Deterministic forecasts of binary events
3.1 Introduction
3.2 Theoretical considerations
3.3 Signal detection theory and the ROC
3.4 Metaverification: criteria for assessing performance measures
3.5 Performance measures
Acknowledgements
4: Deterministic forecasts of multi-category events
4.1 Introduction
4.2 The contingency table: notation, definitions, and measures of accuracy
4.3 Skill scores
4.4 Sampling variability of the contingency table and skill scores
5: Deterministic forecasts of continuous variables
5.1 Introduction
5.2 Forecast examples
5.3 First-order moments
5.4 Second- and higher-order moments
5.5 Scores based on cumulative frequency
5.6 Summary and concluding remarks
6: Forecasts of spatial fields
6.1 Introduction
6.2 Matching methods
6.3 Traditional verification methods
6.4 Motivation for alternative approaches
6.5 Neighbourhood methods
6.6 Scale separation methods
6.7 Feature-based methods
6.8 Field deformation methods
6.9 Comparison of approaches
6.10 New approaches and applications: the future
6.11 Summary
7: Probability forecasts
7.1 Introduction
7.2 Probability theory
7.3 Probabilistic scoring rules
7.4 The relative operating characteristic (ROC)
7.5 Evaluation of probabilistic forecasting systems from data
7.6 Testing reliability
Acknowledgements
8: Ensemble forecasts
8.1 Introduction
8.2 Example data
8.3 Ensembles interpreted as discrete samples
8.4 Ensembles interpreted as probabilistic forecasts
8.5 Summary
Acknowledgement
9: Economic value and skill
9.1 Introduction
9.2 The cost/loss ratio decision model
9.3 The relationship between value and the ROC
9.4 Overall value and the Brier Skill Score
9.5 Skill, value and ensemble size
9.6 Applications: value and forecast users
9.7 Summary
10: Deterministic forecasts of extreme events and warnings
10.1 Introduction
10.2 Forecasts of extreme events
10.3 Warnings
Acknowledgements
11: Seasonal and longer-range forecasts
11.1 Introduction
11.2 Forecast formats
11.3 Measuring attributes of forecast quality
11.4 Measuring the quality of individual forecasts
11.5 Decadal and longer-range forecast verification
11.6 Summary
12: Epilogue: new directions in forecast verification
12.1 Introduction
12.2 Review of key concepts
12.3 Forecast evaluation in other disciplines
12.4 Current research and future directions
Acknowledgements
Appendix: Verification software
A.1 What is good software?
A.2 Types of verification users
A.3 Types of software and programming languages
A.4 Institutionally supported software
A.5 Displays of verification information
Glossary
References
Plates
Index
This edition first published 2012 © 2012 by John Wiley & Sons, Ltd.
Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley's global Scientific, Technical and Medical business with Blackwell Publishing.
Registered office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial offices:
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
111 River Street, Hoboken, NJ 07030-5774, USA
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell.
The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Forecast verification : a practitioner's guide in atmospheric science / edited by Ian T. Jolliffe and David B. Stephenson. – 2nd ed.
p. cm.
Includes index.
ISBN 978-0-470-66071-3 (cloth)
1. Weather forecasting–Statistical methods–Evaluation. I. Jolliffe, I. T. II. Stephenson, David B.
QC996.5.F67 2011
551.63–dc23
2011035808
A catalogue record for this book is available from the British Library.
This book is published in the following electronic formats: ePDF 9781119960010; Wiley Online Library 9781119960003; ePub 9781119961079; Mobi 9781119961086
List of contributors
Dr Jochen Broecker Max-Planck-Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, [email protected]
Dr Barbara G. Brown Research Applications Laboratory, National Center for Atmospheric Research, P.O. Box 3000, Boulder CO 80307-3000, [email protected]
Michel Déqué Météo-France, CNRM, CNRS/GAME, 42 Avenue Coriolis, 31057 Toulouse Cedex 01, [email protected]
Dr Elizabeth E. Ebert Centre for Australian Weather and Climate Research (CAWCR), Bureau of Meteorology, GPO Box 1289, Melbourne, Victoria 3001, [email protected]
Dr Christopher A.T. Ferro Mathematics Research Institute, College of Engineering, Mathematics and Physical Sciences, University of Exeter, Harrison Building, North Park Road, Exeter EX4 4QF, [email protected]
Dr Eric Gilleland Research Applications Laboratory, National Center for Atmospheric Research, P.O. Box 3000, Boulder CO 80307-3000, [email protected]
Professor Robin Hogan Department of Meteorology, University of Reading, P.O. Box 243, Reading RG6 6BB, [email protected]
Professor Ian Jolliffe 30 Woodvale Road, Gurnard, Cowes, Isle of Wight, PO31 8EG, [email protected]
Dr Robert E. Livezey 5112 Lawton Drive, Bethesda, MD 20816, [email protected]
Dr Ian B. Mason 32 Hensman St., Latham, ACT, Australia, [email protected]
Dr Simon J. Mason International Research Institute for Climate and Society (IRI), Columbia University, 61 Route 9W, P.O. Box 1000, Palisades, NY 10964-8000, [email protected]
Dr Matt Pocernich Research Applications Laboratory, National Center for Atmospheric Research, P.O. Box 3000, Boulder CO 80307-3000, [email protected]
Dr Jacqueline M. Potts Biomathematics and Statistics Scotland, Craigiebuckler, Aberdeen AB15 8QH, [email protected]
David S. Richardson European Centre for Medium-Range Weather Forecasts (ECMWF), Shinfield Park, Reading, RG2 9AX, [email protected]
Professor David B. Stephenson Mathematics Research Institute, College of Engineering, Mathematics and Physical Sciences, University of Exeter, Harrison Building, North Park Road, Exeter EX4 4QF, [email protected]
Dr Andreas P. Weigel Federal Office of Meteorology and Climatology MeteoSwiss, Kraehbuehlstr. 58, P.O. Box 514, CH-8044 Zurich, [email protected]
Preface
In the eight years since the first edition was published, there has been considerable expansion of the literature on forecast verification, and the time is ripe for a new edition. This second edition has three more chapters than the first, as well as a new Appendix and substantially more references. Developments in forecast verification have not been confined to the atmospheric science literature but, as with the first edition, we concentrate mainly on this area.
As far as we are aware, there is still no other book that gives a comparable coverage of forecast verification, although at least two related books have appeared outside the atmospheric science area. Pepe (2003) is concerned with evaluation of medical diagnostic tests, which, although essentially concerned with ‘forecast verification’, has a very different emphasis, whilst Krzanowski and Hand (2009) is more narrowly focused on ROC curves.
We have retained many of the authors from the first edition, as well as bringing in a number of other experts, mainly for the new chapters. All are well-regarded researchers and practitioners in their fields. Shortly after the first edition was published, an extended and constructive review appeared (Glahn, 2004; Jolliffe and Stephenson, 2005). In this new edition we and our authors have attempted to address some of the issues raised by Glahn.
Compared with the first edition, the introductory and scene-setting Chapters 1 and 2 have only minor changes. Chapter 3 on ‘Deterministic forecasts of binary events’ has gained an additional author and has been rewritten. Much material from the first edition has been retained but has been restructured, and a non-trivial amount of new material, reflecting recent developments, has been added. Chapters 4 and 5 on, respectively, ‘Deterministic forecasts of multi-category events’ and ‘Deterministic forecasts of continuous variables’ have only minor improvements.
One of the biggest areas of development in forecast verification in recent years has been for spatial forecasts. This is reflected by a much-expanded Chapter 6 on the topic, with three new authors, all of whom are leaders in the field.
In the first edition, probability forecasts and ensemble forecasts shared a chapter. This is another area of active development and, as suggested by Glahn (2004) and others, the two topics have been separated into Chapters 7 and 8 respectively, with two new authors. Chapter 9 on ‘Economic value and skill’ has only minor changes compared to the first edition.
Chapters 10 and 11 are both new, covering areas that have seen much recent research and are likely to continue to do so. Chapter 10 covers the related topics of verification of forecasts for rare and extreme events, and verification of weather warnings. By their nature the latter are often extreme, though many types of warnings are issued for events that are not especially rare. Impact rather than rarity is what warrants a warning. One context in which extremes are of particular interest is that of climate change. Because of the lack of verifying observations, the topic of verification of climate projections is still in its infancy, though likely to develop. There has been more activity on verification of seasonal and decadal forecasts, and these together with verification of climate projections, are the subject of Chapter 11.
The concluding Chapter 12 reviews some key concepts, summarizes some of the verification/evaluation activity in disciplines other than atmospheric sciences, and discusses some of the main developments since the first edition. As with the first edition, a Glossary is provided, and in addition there is an Appendix on available software. Although such an Appendix inevitably becomes out of date more quickly than other parts of the text, it is arguably the most useful part of the book to practitioners for the first few years after publication. To supplement the Appendix, software and data sets used in the book will be provided via our book website: http://emps.exeter.ac.uk/fvb. We also intend to use this website to record errata and suggestions for future additions.
We hope you enjoy this second edition and find it useful. If you have any comments or suggestions for future editions, we would be happy to hear from you.
Ian T. Jolliffe David B. Stephenson
Preface to the first edition
Forecasts are made in many disciplines, the best known of which are economic forecasts and weather forecasts. Other situations include medical diagnostic tests, prediction of the size of an oil field, and any sporting occasion where bets are placed on the outcome. It is very often useful to have some measure of the skill or value of a forecast or forecasting procedure. Definitions of ‘skill’ and ‘value’ will be deferred until later in the book, but in some circumstances financial considerations are important (economic forecasting, betting, oil field size), whilst in others a correct or incorrect forecast (medical diagnosis, extreme weather events) can mean the difference between life and death.
Often the ‘skill’ or ‘value’ of a forecast is judged in relative terms. Is forecast provider A doing better than B? Is a newly developed forecasting procedure an improvement on current practice? Sometimes, however, there is a desire to measure absolute, rather than relative, skill. Forecast verification, the subject of this book, is concerned with judging how good is a forecasting system or single forecast.
Although the phrase ‘forecast verification’ is generally used in atmospheric science, and hence adopted here, it is rarely used outside the discipline. For example, a survey of keywords from articles in the International Journal of Forecasting between 1996 and 2002 has no instances of ‘verification’. This journal attracts authors from a variety of disciplines, though economic forecasting is prominent. The most frequent alternative terminology in the journal's keywords is ‘forecast evaluation’, although validation and accuracy also occur. Evaluation and validation also occur in other subject areas, but the latter is often used to denote a wider range of activities than simply judging skill or value – see, for example, Altman and Royston (2000).
Many disciplines make use of forecast verification, but it is probably fair to say that a large proportion of the ideas and methodology have been developed in the context of weather and climate forecasting, and this book is firmly rooted in that area. It will therefore be of greatest interest to forecasters, researchers and students in atmospheric science. It is written at a level that is accessible to students and to operational forecasters, but it also contains coverage of recent developments in the area. The authors of each chapter are experts in their fields and are well aware of the needs and constraints of operational forecasting, as well as being involved in research into new and improved methods of verification. The audience for the book is not restricted to atmospheric scientists – there is discussion in several chapters of similar ideas in other disciplines. For example ROC curves (Chapter 3) are widely used in medical applications, and the ideas of Chapter 8 are particularly relevant to finance and economics.
To our knowledge there is currently no other book that gives a comprehensive and up-to-date coverage of forecast verification. For many years, the WMO publication by Stanski et al. (1989) and its earlier versions was the standard reference for atmospheric scientists, though largely unknown in other disciplines. Its drawback is that it is somewhat limited in scope and is now rather out of date. Wilks (2006b [formerly 1995], Chapter 7) and von Storch and Zwiers (1999, Chapter 18) are more recent but, inevitably, as each comprises only one chapter in a book, are far from comprehensive. The current book provides a broad coverage, although it does not attempt to be encyclopedic, leaving the reader to look in the references for more technical material.
Chapters 1 and 2 of the book are both introductory. Chapter 1 gives a brief review of the history and current practice in forecast verification, gives some definitions of basic concepts such as skill and value, and discusses the benefits and practical considerations associated with forecast verification. Chapter 2 describes a number of informal descriptive ways, both graphical and numerical, of comparing forecasts and corresponding observed data. It then establishes some theoretical groundwork that is used in later chapters, by defining and discussing the joint probability distribution of the forecasts and observed data. Consideration of this joint distribution and its decomposition into conditional and marginal distributions leads to a number of fundamental properties of forecasts. These are defined, as are the ideas of accuracy, association and skill.
Both Chapters 1 and 2 discuss the different types of data that may be forecast, and each of the next five chapters then concentrates on just one type. The subject of Chapter 3 is binary data in which the variable to be forecast has only two values, for example {Rain, No Rain}, {Frost, No Frost}. Although this is apparently the simplest type of forecast, there have been many suggestions of how to assess them, in particular many different verification measures have been proposed. These are fully discussed, along with their properties. One particularly promising approach is based on signal detection theory and the ROC curve.
For binary data one of two categories is forecast. Chapter 4 deals with the case in which the data are again categorical, but where there are more than two categories. A number of skill scores for such data are described, their properties are discussed, and recommendations are made.
Chapter 5 is concerned with forecasts of continuous variables such as temperature. Mean squared error and correlation are the best-known verification measures for such variables, but other measures are also discussed including some based on comparing probability distributions.
Atmospheric data often consist of spatial fields of some meteorological variable observed across some geographical region. Chapter 6 deals with verification for such spatial data. Many of the verification measures described in Chapter 5 are also used in the spatial context, but the correlation due to spatial proximity causes complications. Some of these complications, together with some verification measures that have been developed with spatial correlation in mind, are discussed in Chapter 6.
Probability plays a key role in Chapter 7, which covers two topics. The first is forecasts that are actually probabilities. For example, instead of a deterministic forecast of ‘Rain’ or ‘No Rain’, the event ‘Rain’ may be forecast to occur with probability 0.2. One way in which such probabilities can be produced is to generate an ensemble of forecasts, rather than a single forecast. The continuing increase of computing power has made larger ensembles of forecasts feasible, and ensembles of weather and climate forecasts are now routinely produced. Both ensemble and probability forecasts have their own peculiarities that necessitate different, but linked, approaches to verification. Chapter 7 describes these approaches.
The discussion of verification for different types of data in Chapters 3–7 is largely in terms of mathematical and statistical properties, albeit properties that are defined with important practical considerations in mind. There is little mention of cost or value – this is the topic of Chapter 8. Much of the chapter is concerned with the simple cost-loss model, which is relevant for binary forecasts. However, these forecasts may be either deterministic as in Chapter 3, or probabilistic as in Chapter 7. Chapter 8 explains some of the interesting relationships between economic value and skill scores.
The final chapter (9) reviews some of the key concepts that arise elsewhere in the book. It also summarises the aspects of forecast verification that have received most attention in other disciplines, including Statistics, Finance and Economics, Medicine, and areas of Environmental and Earth Science other than Meteorology and Climatology. Finally, the chapter discusses some of the most important topics in the field that are the subject of current research or that would benefit from future research.
This book has benefited from discussions and help from many people. In particular we would like to thank the following colleagues for their particularly helpful comments and contributions: Barbara Casati, Martin Goeber, Mike Harrison, Rick Katz, Simon Mason, Buruhani Nyenzi and Dan Wilks. Some of the earlier work on this book was carried out while one of us (I.T.J.) was on research leave at the Bureau of Meteorology Research Centre (BMRC) in Melbourne. He is grateful to BMRC and its staff, especially Neville Nicholls, for the supportive environment and useful discussions; to the Leverhulme Trust for funding the visit under a Study Abroad Fellowship; and to the University of Aberdeen for granting the leave.
Looking to the future, we would be delighted to receive any feedback comments from you, the reader, concerning material in this book, in order that improvements can be made in future editions (see www.met.rdg.ac.uk/cag/forecasting).
2
Basic concepts
Jacqueline M. Potts
Biomathematics and Statistics Scotland, Aberdeen, UK
2.1 Introduction
Forecast verification involves exploring and summarizing the relationship between sets of forecast and observed data and making comparisons between the performance of forecasting systems and that of reference forecasts. Verification is therefore a statistical problem. This chapter introduces some of the basic statistical concepts and definitions that will be used in later chapters. Further details about the use of statistical methods in the atmospheric sciences can be found in Wilks (2006b) and von Storch and Zwiers (1999).
2.2 Types of predictand
The variable for which the forecasts are formulated is known as the predictand. A continuous predictand is one for which, within the limits over which the variable ranges, any value is possible. This means that between any two different values there are an infinite number of possible values. For discrete variables, however, we can list all possible values. Variables such as pressure, temperature or rainfall are theoretically continuous. In reality, however, such variables are actually discrete because measuring devices have limited reading accuracy and variables are usually recorded to a fixed number of decimal places. Verification of continuous predictands is considered in Chapter 5. Categorical predictands are discrete variables that can only take one of a finite set of predefined values. If the categories provide a ranking of the data, the variable is ordinal; for example, cloud cover is often measured in oktas. On the other hand, cloud type is a nominal variable since there is no natural ordering of the categories. The simplest kind of categorical variable is a binary variable, which has only two possible values, indicating, for example, the presence or absence of some condition such as rain, fog or thunder. Verification of binary forecasts is discussed in Chapter 3, and forecasts in more than two categories are considered in Chapter 4.
Forecasts may be deterministic (e.g. rain tomorrow) or probabilistic (e.g. 70% chance of rain tomorrow). There is more than one way in which a probability forecast may be interpreted. The frequentist interpretation of 70% chance of rain tomorrow is that rain occurs on 70% of the occasions when this forecast is issued. However, such forecasts are usually interpreted in a subjective way as expressing the forecaster's degree of belief that the event will occur (Epstein, 1966). Probability forecasts are often issued for categorical predictands with two or more categories. In the case of continuous predictands a forecast probability density function (see Section 2.5) may be produced, for example based on ensembles (see Chapter 8). Probability forecasts are discussed in greater detail in Chapter 7.
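The frequentist interpretation above can be checked empirically from a verification record: among all occasions on which a given probability was issued, the event should occur with roughly that relative frequency. The following is a hypothetical sketch (the data are simulated from a perfectly reliable forecaster, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical record: each day a probability-of-rain forecast is issued
# and rain either occurs (1) or not (0). Here the simulated forecaster
# is perfectly reliable by construction.
probs = rng.choice([0.2, 0.5, 0.7], size=5000)
rain = rng.random(5000) < probs

# Frequentist check: among the days on which "70% chance of rain" was
# issued, how often did it actually rain?
mask = probs == 0.7
print(rain[mask].mean())  # close to 0.7 for a reliable system
```

For a real forecasting system this conditional relative frequency would typically differ from the issued probability; such departures are the subject of reliability testing in Chapter 7.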
Forecasts are made at different temporal and spatial scales. A very short-range forecast may cover the next 12 hours, whereas long-range forecasts may be issued from 30 days to 2 years ahead and be forecasts of the mean value of a variable over a month or an entire season. Climate change predictions are made at decadal and longer timescales. The verification of seasonal and decadal forecasts is considered in Chapter 11. Prediction models often produce forecasts of spatial fields, usually defined by values of a variable at many points on a regular grid. These vary both in their geographical extent and in the distance between grid points within that area. Forecasts of spatial fields are considered in Chapter 6.
Meteorological data are autocorrelated in both space and time. At a given location, the correlation between observations a day apart will usually be greater than that between observations separated by longer time intervals. Similarly, at a given time, the correlation between observations at grid points that are close together will generally be greater than between those that are further apart, although teleconnection patterns such as the North Atlantic Oscillation can lead to correlation between weather patterns in areas that are separated by vast distances.
Both temporal and spatial autocorrelation have implications for forecast verification. Temporal autocorrelation means that for some types of short-range forecast, persistence often performs well when compared to a forecast of the climatological average. A specific user may be interested only in the quality of forecasts at a particular site, but meteorologists are often interested in evaluating the forecasting system in terms of its ability to predict the whole spatial field. The degree of spatial autocorrelation will affect the statistical distribution of the performance measures used. When spatial autocorrelation is present in both the observed and forecast fields it is likely that, if a forecast is fairly accurate at one grid point, it will also be fairly accurate at neighbouring grid points. Similarly, it is likely that if the forecast is not very accurate at one grid point, it will also not be very accurate at neighbouring grid points. Consequently, the significance of a particular value of a performance measure calculated over a spatial field will be quite different from its significance if it was calculated over the same number of independent forecasts.
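The effect of autocorrelation on significance can be illustrated with a small simulation (a hypothetical sketch, not an example from the book): sample correlations between autocorrelated series scatter far more widely than those between the same number of independent values, because the effective sample size is smaller.

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1(n, phi, rng):
    """Generate an AR(1) series with lag-1 autocorrelation phi."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * rng.normal()
    return x

def corr_spread(n, phi, trials=500):
    """Standard deviation of sample correlations between pairs of
    independent series (true correlation zero)."""
    rs = [np.corrcoef(ar1(n, phi, rng), ar1(n, phi, rng))[0, 1]
          for _ in range(trials)]
    return np.std(rs)

# With no true association, correlations between autocorrelated series
# are much more variable than between independent ones, so the same
# correlation value is far less "significant" than a naive test implies.
print(corr_spread(100, 0.0))  # roughly 1/sqrt(n) for independent data
print(corr_spread(100, 0.9))  # substantially larger
```

The same reasoning applies in the spatial case: a performance measure computed over many strongly correlated grid points carries much less information than the nominal number of points suggests.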
2.3 Exploratory methods
Exploratory methods should be used to examine the forecast and observed data graphically; further information about these techniques can be found in Tukey (1977); see also Wilks (2006b, Chapter 3). For continuous variables boxplots (Figure 2.1) provide a means of examining the location, spread and skewness of the forecasts and the observations. The box covers the interquartile range (the central 50% of the data) and the line across the centre of the box marks the median (the central observation). The whiskers attached to the box show the range of the data, from minimum to maximum. Boxplots are especially useful when several of them are placed side by side for comparison.
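Side-by-side boxplots of this kind are straightforward to produce with standard plotting software. The following Python sketch uses synthetic data (hypothetical forecast systems, not the Oklahoma City data shown in Figure 2.1); a boxplot comparison like this immediately reveals differences in location (bias), spread and skewness between forecasts and observations:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; write plot to file
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic daily high temperatures (deg C): "observations" plus two
# hypothetical forecast systems, one unbiased and one with a cold bias.
obs = rng.normal(loc=22, scale=8, size=365)
fcst_a = obs + rng.normal(0, 2, size=365)      # unbiased, small error
fcst_b = obs - 3 + rng.normal(0, 2, size=365)  # cold bias

fig, ax = plt.subplots()
ax.boxplot([obs, fcst_a, fcst_b])
ax.set_xticklabels(["Observed", "System A", "System B"])
ax.set_ylabel("High temperature (deg C)")
ax.set_title("Side-by-side boxplots of forecasts and observations")
fig.savefig("boxplots.png")
```

In the resulting figure the box for System B sits visibly below the observed box, exposing its systematic bias, while its spread matches the observations.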
Figure 2.1 Boxplots of 12–24-h forecasts of high-temperature (°C) for Oklahoma City from three forecasting systems and the corresponding observations
Figure 2.1 shows boxplots of high-temperature forecasts for Oklahoma City made by the National Weather Service Forecast Office at Norman, Oklahoma. Outputs from three different forecasting systems are shown, together with the corresponding observations. These data were used in Brooks and Doswell (1996) and a full description of the forecasting systems can be found in that paper. In Figure 2.1
