Monetizing Data - Andrea Ahlemeyer-Stubbe - E-Book

Description

Practical guide for deriving insight and commercial gain from data 

Monetising Data offers a practical guide for anyone working with commercial data but lacking deep knowledge of statistics or data mining. The authors — noted experts in the field — show how to generate extra benefit from data already collected and how to use it to solve business problems.  In accessible terms, the book details ways to extract data to enhance business practices and offers information on important topics such as data handling and management, statistical methods, graphics and business issues. The text presents a wide range of illustrative case studies and examples to demonstrate how to adapt the ideas towards monetisation, no matter the size or type of organisation.

The authors explain on a general level how data is cleaned and matched between data sets and how we learn from data analytics to address vital business issues. The book clearly shows how to analyse and organise data to identify people and follow and interact with them through the customer lifecycle. Monetising Data is an important resource that:

  • Focuses on different business scenarios and opportunities to turn data into value
  • Gives an overview on how to store, manage and maintain data
  • Presents mechanisms for using knowledge from data analytics to improve the business and increase profits
  • Includes practical suggestions for identifying business issues from the data

Written for everyone engaged in improving the performance of a company, including managers and students, Monetising Data is an essential guide for understanding and using data to enrich business practice.

Number of pages: 622

Year of publication: 2018


Table of Contents

Cover

Title Page

About the Authors

List of Figures

List of Tables

Preface

1 The Opportunity

1.1 Introduction

1.2 The Rise of Data

1.3 Realising Data as an Opportunity

1.4 Our Definition of Monetising Data

1.5 Guidance on the Rest of the Book

2 About Data and Data Science

2.1 Introduction

2.2 Internal and External Sources of Data

2.3 Scales of Measurement and Types of Data

2.4 Data Dimensions

2.5 Quality of Data

2.6 Importance of Information

2.7 Experiments Yielding Data

2.8 A Data‐readiness Scale for Companies

2.9 Data Science

2.10 Data Improvement Cycle

3 Big Data Handling, Storage and Solutions

3.1 Introduction

3.2 Big Data, Smart Data…

3.3 Big Data Solutions

3.4 Operational Systems supporting Business Processes

3.5 Analysis‐based Information Systems

3.6 Structured Data – Data Warehouses

3.7 Poly‐structured (Unstructured) Data – NoSQL Technologies

3.8 Data Structures and Latency

3.9 Data Marts

4 Data Mining as a Key Technique for Monetisation

4.1 Introduction

4.2 Population and Sample

4.3 Supervised and Unsupervised Methods

4.4 Knowledge‐discovery Techniques

4.5 Theory of Modelling

4.6 The Data Mining Process

5 Background and Supporting Statistical Techniques

5.1 Introduction

5.2 Variables

5.3 Key Performance Indicators

5.4 Taming the Data

5.5 Data Visualisation and Exploration of Data

5.6 Basic Statistics

5.7 Feature Selection and Reduction of Variables

5.8 Sampling

5.9 Statistical Methods for Proving Model Quality and Generalisability and Tuning Models

6 Data Analytics Methods for Monetisation

6.1 Introduction

6.2 Predictive Modelling Techniques

6.3 Pattern Detection Methods

6.4 Methods in practice

7 Monetisation of Data and Business Issues: Overview

7.1 Introduction

7.2 General Strategic Opportunities

7.3 Data as a Donation

7.4 Data as a Resource

7.5 Data Leading to New Business Opportunities

7.6 Information Brokering using Data

7.7 Connectivity as a Strategic Opportunity

7.8 Problem‐solving Methodology

8 How to Create Profit Out of Data

8.1 Introduction

8.2 Business Models for Monetising Data

8.3 Data Product Design

8.4 Value of Data

8.5 Charging Mechanisms

8.6 Connectivity as an Opportunity for Streamlining a Business

9 Some Practicalities of Monetising Data

9.1 Introduction

9.2 Practicalities

9.3 Special focus on SMEs

9.4 Special Focus on B2B Lead Generation

9.5 Legal and Ethical Issues

9.6 Payments

9.7 Innovation

10 Case Studies

10.1 Job Scheduling in Utilities

10.2 Shipping

10.3 Online Sales or Mail Order

10.4 Intelligent Profiling with Loyalty Card Schemes

10.5 Social Media: a Mechanism to Collect and Use Contributor Data

10.6 Making a Business out of Boring Statistics

10.7 Social Media and Web Intelligence Services

10.8 Service Provider

10.9 Data Source

10.10 Industry 4.0: Metamodelling using Simulated Data

10.11 Industry 4.0: Modelling Pricing Data in Manufacturing

10.12 Monetising Data in an SME

10.13 Making Sense of Public Finance and Other Data

10.14 Benchmarking who is the Best in the Market

10.15 Change of Shopping Habits Part I

10.16 Change of Shopping Habits Part II

10.17 Change of Shopping Habits Part III

10.18 Service Providers, Households and Facility Management

10.19 Insurance, Healthcare and Risk Management

10.20 Mobility and Connected Cars

10.21 Production and Automation in Industry 4.0

Bibliography

Glossary

Index

End User License Agreement

List of Tables

Chapter 02

Table 2.1 Typical internal and external data in information systems.

Table 2.2 Extract of sales data.

Table 2.3 Company sales data analytics.

Table 2.4 Internal sales data enriched with external data.

Table 2.5 Scales of measurement examples.

Table 2.6 Checklist for data readiness.

Chapter 04

Table 4.1 Confusion matrix for comparing models.

Chapter 05

Table 5.1 Partially tamed data.

Table 5.2 Outcomes of a hypothesis test.

Table 5.3 Typical significance borders.

Table 5.4 Examples of statistical tests.

Table 5.5 Example of a contingency table.

Table 5.6 Target proportions.

Table 5.7 Confusion matrix.

Table 5.8 Gains chart.

Table 5.9 Non‐cumulative lift and gains table.

Chapter 06

Table 6.1 Example of a contingency table.

Table 6.2 Analysis table for goodness of fit.

Chapter 08

Table 8.1 Business models for types of exchange.

Table 8.2 Business models for B2C selling.

Table 8.3 Business models for service providers.

Chapter 09

Table 9.1 Business model canvas of the comparisons between data brokers and insight innovators.

Chapter 10

Table 10.1 Summary of case studies.

Table 10.2 Risk scores in a simple case.

Table 10.3 Distribution of risk scores in different seasons.

Table 10.4 Allowable stress for soft impact.

Table 10.5 Parameters used to describe a four‐sided glass panel.

Table 10.6 Data dimensions and stakeholders.

List of Illustrations

Chapter 01

Figure 1.1 Where does big data come from?

Figure 1.2 Big data empowers business.

Figure 1.3 Roadmap to success.

Figure 1.4 Wish list for generating money out of data.

Figure 1.5 Monetising data.

Chapter 02

Figure 2.1 Deming’s ‘Plan, Do, Check, Act’ quality improvement cycle.

Figure 2.2 Six Sigma quality improvement cycle.

Figure 2.3 Example of data maturity model.

Figure 2.4 Data improvement cycle.

Chapter 03

Figure 3.1 Big data definition.

Figure 3.2 Internet of things timeline.

Figure 3.3 Example data structure.

Figure 3.4 NoSQL management systems.

Figure 3.5 Big data structure and latency.

Chapter 04

Figure 4.1 Supervised learning.

Figure 4.2 Unsupervised learning.

Figure 4.3 The CRISP‐DM process.

Figure 4.4 The SEMMA process.

Figure 4.5 General representation of the data mining process.

Figure 4.6 Time periods for data mining process.

Figure 4.7 Stratified sampling.

Figure 4.8 Lift chart for model comparison.

Figure 4.9 Lift chart at small scale.

Figure 4.10 An example of model control.

Chapter 05

Figure 5.1 Raw data from a customer transaction.

Figure 5.2 Bar chart of relative frequencies.

Figure 5.3 Example of cumulative view.

Figure 5.4 Example of a Pareto chart.

Figure 5.5 Example of a pie chart.

Figure 5.6 Scatterplot of company age and auditing behaviour with LOWESS line.

Figure 5.7 Scatterplot of design options.

Figure 5.8 Ternary diagram showing proportions.

Figure 5.9 Radar plot of fitness panel data.

Figure 5.10 Example of a word cloud.

Figure 5.11 Example of a mind map.

Figure 5.12 Location heat map.

Figure 5.13 Density map for minivans.

Figure 5.14 SPC chart of shipping journeys.

Figure 5.15 Decision tree analysis for older workers.

Figure 5.16 Gains chart.

Figure 5.17 Lift chart.

Figure 5.18 ROC curve development during predictive modelling.

Chapter 06

Figure 6.1 Example of logistic regression.

Figure 6.2 Corrected logistic regression.

Figure 6.3 Decision tree.

Figure 6.4 Artificial neural network.

Figure 6.5 Bayesian network analysis of survey data.

Figure 6.6 Bayesian network used to explore what‐if scenarios.

Figure 6.7 Plot of non‐linear separation on a hyperplane.

Figure 6.8 Dendrogram from hierarchical cluster analysis.

Figure 6.9 Parallel plot from K‐means cluster analysis.

Figure 6.10 Kohonen network with two‐dimensional arrangement of the output neurons.

Figure 6.11 SOM output.

Figure 6.12 T‐SNE output.

Figure 6.13 Correspondence analysis output: scatterplot of RPC2 vs RPC1, the two principal dimensions showing how the row profiles in a contingency table differ from each other.

Figure 6.14 Association rules.

Figure 6.15 Association analysis of products.

Figure 6.16 Comparison of customer base and population.

Figure 6.17 Relationship between energy usage and deprivation: scatterplot of mean AQ vs percentage of households deprived.

Figure 6.18 Map showing prices.

Chapter 07

Figure 7.1 Strategic opportunities.

Figure 7.2 How data can boost top‐ and bottom‐line results.

Figure 7.3 Typical data request.

Figure 7.4 Observed data and usage.

Figure 7.5 Maslow’s hierarchy of needs.

Figure 7.6 Data sources to empower consumer business.

Figure 7.7 Ready information on market opportunities.

Figure 7.8 Word cloud from keyword occurrences.

Figure 7.9 Using different data sources for analytics.

Figure 7.10 Daily sleep patterns.

Figure 7.11 Predictive analytics in insurance.

Chapter 08

Figure 8.1 Pathways to monetising data.

Figure 8.2 Segmentation features of walk‐in customers.

Figure 8.3 Business opportunities.

Chapter 09

Figure 9.1 Paths to monetisation.

Figure 9.2 Pareto diagram of customer compliments.

Figure 9.3 Graphical dashboard.

Figure 9.4 Decrypting the DNA of the best existing customers.

Figure 9.5 Aspects of digital maturity.

Figure 9.6 Closed loop of B2B customer profiling – continuous learning.

Figure 9.7 Automated B2B lead generation system.

Figure 9.8 New methods, new insights, smart business.

Figure 9.9 Misleading scatterplots.

Figure 9.10 Scatterplot with multiple features.

Figure 9.11 Histogram of suspicious‐quality recordings.

Chapter 10

Figure 10.1 The evolution of data analytics.

Figure 10.2 Cumulative distribution of risk scores.

Figure 10.3 Data sources in the shipping industry.

Figure 10.4 Optimum speed recommendation.

Figure 10.5 Pruned decision tree.

Figure 10.6 Detail from decision tree.

Figure 10.7 Customised communication.

Figure 10.8 Individualised communication.

Figure 10.9 Complexity of data mining steps.

Figure 10.10 Data in the customer journey.

Figure 10.11 Intelligent profiles and segments in B2C.

Figure 10.12 Personalised journey.

Figure 10.13 The reach of social media.

Figure 10.14 The power of social media.

Figure 10.15 Using peer group behaviour.

Figure 10.16 National statistics oil prices.

Figure 10.17 Example of reports portal.

Figure 10.18 Making a business out of boring statistics.

Figure 10.19 Right place, right time.

Figure 10.20 Social media information summarised.

Figure 10.21 Visualisation of user engagement.

Figure 10.22 Concept of newsletter tracking.

Figure 10.23 Example report on testing different versions.

Figure 10.24 Customer profile details.

Figure 10.25 Company profile details.

Figure 10.26 Example of glass facades in buildings.

Figure 10.27 Half normal plot of a screening experiment.

Figure 10.28 Predicted vs calculated resistance factor with validation.

Figure 10.29 Residual plot of prices.

Figure 10.30 Visualisation of groups of products.

Figure 10.31 Open data available to enrich company data.

Figure 10.32 Diffusion map showing clusters of shares.

Figure 10.33 Sampling approach for benchmarking in China.

Figure 10.34 Three‐step approach to survey analytics.

Figure 10.35 Skateboard offer.

Figure 10.36 Customer journey.

Figure 10.37 Example of customer segments.

Figure 10.38 Virtual changing room.

Figure 10.39 Virtual supermarket at bus stop.

Figure 10.40 Input from miscellaneous IoT sensors.

Figure 10.41 Appealing sleep sensor display.

Figure 10.42 Sensors connected by mobile phone.

Figure 10.43 The connected car.

Figure 10.44 The new connected eco‐system.

Figure 10.45 Industry 4.0.

Figure 10.46 Industry 4.0 in action.

Monetising Data: How to Uplift Your Business

Andrea Ahlemeyer-Stubbe

Director Strategical Analytics at the servicepro Agentur für Dialogmarketing und Verkaufsförderung GmbH, Munich, Germany

Shirley Coleman

Technical Director, ISRU, School of Mathematics and Statistics, Newcastle University, UK

This edition first published 2018
© 2018 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Andrea Ahlemeyer‐Stubbe and Shirley Coleman to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication data applied for

ISBN: 9781119125136

Cover Design: Wiley
Cover Images: (Business people) © JohnnyGreig/Gettyimages; (Currencies) © Inok/iStockphoto

This book is dedicated to
Agnes, Albert, Christopher, Dirk, Rosie and Rowan
With thanks

About the Authors

Andrea Ahlemeyer‐Stubbe is Director of Strategical Analytics at servicepro‐Agentur für Dialogmarketing und Verkaufsförderung GmbH, Munich, Germany (servicepro).

Upon receiving her Master’s degree in statistics from the University of Dortmund, Andrea formed a consulting firm, offering customised professional services to her clients. She now leads servicepro’s analytics team, working on international projects for well‐known brands in Europe, the United States and China, drawing on the wealth of experience gained from her 20 years in the industry, specifically in the areas of data mining, data warehousing, database marketing, CRM, big data and social CRM. She is a frequent lecturer at several universities, as well as an invited speaker at professional conferences. She writes for special interest magazines as well as marketing and management publications. She was President of ENBIS (European Network for Business and Industrial Statistics) from 2007 to 2009.

Dr Shirley Coleman is Principal Statistician and Technical Director at the Industrial Statistics Research Unit, School of Mathematics and Statistics, Newcastle University and a visiting scholar at the Faculty of Economics, Ljubljana University, Slovenia. She works on data analytics in small and medium enterprises and the energy sector and contributed a highly ranked impact case study to Newcastle University’s Research Excellence Framework. She publishes in trade and academic journals and is co‐editor of several books. She is an elected member of the International Statistical Institute and a Chartered Statistician of the Royal Statistical Society. She is a well‐known international invited speaker and conference chair. She is an ambassador for communication and dissemination of statistics to the wider community. She was President of ENBIS (European Network for Business and Industrial Statistics) from 2004 to 2005.

The authors have previously collaborated on A Practical Guide to Data Mining for Business and Industry (Wiley, May 2014).

Preface

When we finished writing our Practical Guide to Data Mining for Business and Industry, we realised that there were still things to say. The growth of interest in data has been enormous and there are now even more opportunities than during the earlier years when there was a steady awakening to the importance of data for business and industry.

Data analytics appears on billboards in mainstream locations such as airports, and even mathematics is being coupled with adverts for cars in a positive way. Everyone is aware that they have data and has seen the graphs and predictions that analysis produces.

The book describes how any business can be uplifted by monetising data. We show how data is generated by sensors, smart homes, apps, website visits, social network usage, digital communication, purchase behaviour, credit card usage, connected car devices and self‐quantification. Enriched by integrating with official statistics, analysis of these datasets brings real business advantage.

The book invites the reader to think about their data resources and be creative in how they use them. The book is not organised as a technical text but includes many examples of innovative applications of statistical thinking and analytical approaches. It does not propose original statistical or machine learning methods but focuses on applications of data‐driven approaches. It is general in scope and can thus serve as an introductory text. It has a management focus and the reader can judge for themselves where they can use the ideas. The structure of the book aims to be logical and cover the whole loop of using data for business decisions. The idea of exploring and giving advice on how to convert data into money is really appealing.

Even after several years of excitement about big data, there are few practical case studies available. For this reason, we include 21 in the final chapter to give realistic suggestions for what to do. The other chapters of the book give necessary background and motivational content.

It is timely to publish this book now, as big data and data analytics have captured the imagination of business and public alike. Data can be seen as the most powerful resource of the future; we believe it has more influence on the wealth of companies and people than any other resource. The authors have long been proponents of data analysis for business advantage and so it is with delight that we can collate our experience and rationale and share it with other people.

The ideas in this book have arisen from many hours of fascinating consulting work. We have felt honoured to be allowed to immerse ourselves in the culture of each company and explore its data, and to have been able to present solutions that in many cases have brought great financial benefits.

We are grateful to all the business people we have worked with. Writing takes considerable time and our families and friends have been very accommodating. We thank them all very much.

1 The Opportunity

1.1 Introduction

Data awareness has swept across economic, political, occupational, social and personal life. Making sense of the fabulous opportunities afforded by such an abundance of data is the challenge of every business and each individual. The journey starts with understanding what data is, where it comes from, what insight it can give and how to extract it. These activities are sometimes referred to as descriptive analytics and predictive analytics. In descriptive analytics data is explored by looking at summary statistics and graphics, and the results are highly accessible and informative. Predictive analytics takes the analysis further and involves statistical approaches that utilise the full richness of the data and lead to predictive models to aid decision making.
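The distinction between descriptive and predictive analytics can be illustrated with a minimal sketch. The figures below are invented for illustration, and the straight-line fit is only one simple stand-in for the predictive models the book discusses later; the names `describe` and `fit_line` are hypothetical helpers, not taken from the book.

```python
from statistics import mean, median, stdev

# Hypothetical monthly figures (assumed data): advertising spend and sales, in thousands.
ad_spend = [10, 12, 15, 18, 20, 24]
sales = [110, 118, 133, 149, 160, 178]


def describe(values):
    """Descriptive analytics: summarise what has already been observed."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Predictive analytics (simplest case): least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


summary = describe(sales)
intercept, slope = fit_line(ad_spend, sales)

# Use the fitted model to estimate sales for a spend level not yet observed.
forecast = intercept + slope * 30

print(summary)
print(f"sales = {intercept:.1f} + {slope:.2f} * spend; forecast at spend 30: {forecast:.1f}")
```

The summary dictionary answers "what happened?", while the fitted line is used to answer "what is likely to happen if spend changes?", which is the step that supports decision making.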

This introductory chapter discusses the rise in data, changes in attitude to data and the advantages of getting to grips with accessing, analysing and utilising data. Definitions of concepts such as open data and big data are followed by guidance for reading the rest of the book.

1.2 The Rise of Data

There is much more data available and accessible than ever before.

Increasingly data is discussed in the popular press and, rather than shying away from figures, statistics and mathematics, advertisers are using these words more and more often. People are becoming more comfortable with data. This is clear from the increase in the use of self‐measurement and mapping facilities on personal devices such as mobile phones and tablets; people have a thirst for measuring everything in their daily life and like to try and control things to keep their life in good shape. Many people choose vehicles that are fitted with advanced digital measurement devices that manage engine performance and record fuel usage and location. All this is in addition to the increased automation of production lines and machinery, which have resulted in copious measurements being a familiar concept. A major contributor to the rise in importance of data is the impact of cheap data storage. For example, an external hard drive with terabytes of memory can be bought for the price of a visit to the hairdresser.

The common phrase to describe this changed world is ‘big data’ (Figure 1.1). A book on monetising data is inevitably about big data. We will interpret the term big data as data that is of a volume, variability and velocity that means common methods of appraisal are not appropriate. We need analytical methods to see the valuable patterns in it.

Figure 1.1 Where does big data come from?

Since the early 2000s there has been a drive to make data more available, giving rise to the open data movement. This promotes sharing of data gathered with the benefit of public funding and includes most official statistics, academic research output and some market, product and service evaluation data. The opening up of data has led to a steep increase in requests for access to even more data; the result is a burgeoning interest in action learning and enthusiasm to understand the potential waiting to be uncovered from the data. The profession of data scientist has evolved and now encapsulates the skills and knowledge to handle and generate insights from this information.

Figure 1.2 shows how big data combined with analytics might empower different areas of any business. The aim of this book is to encourage people to use their big data to work out exciting business opportunities, make major changes and optimise the way things are run.

Figure 1.2 Big data empowers business.

1.3 Realising Data as an Opportunity

One of the key motivations for this book on monetising data is the sheer amount of under‐utilised data around. Hardly less important is the under‐achievement in terms of business benefit derived among those who do use their data. This suggests a two‐dimensional representation of the state of organisations, with one axis representing the usage of business data and the other axis representing the business benefit derived from it. Needless to say, the star performers are at the top right‐hand side of the resulting diagram in Figure 1.3. Being in the top right‐hand corner is better than being only at the top or only at the right‐hand side because the two factors reinforce each other in a synergistic manner, giving greater benefits than either alone.

Figure 1.3 Roadmap to success.

The marketplace is highly heterogeneous, with companies and institutions (all referred to as ‘organisations’ henceforth) differentiated in many ways, including:

sector

size of turnover

size in numbers of employees

maturity

research focus

product or service development.

The baseline against which organisations can benchmark themselves in Figure 1.3 is different for different types of organisation.

Familiar players using big data include retail, finance, automotive manufacturers, health providers and process industries. In addition, the following are some of the less familiar organisations likely to be in possession of big data:

Sports societies: these may have larger turnover than expected and hold vast data banks of members’ details and their sporting activities.

Museums and galleries: these may have loyalty cards and multiple entry passes that yield customer details, frequency of visits, distance travelled, inclination and time spent at the venue.

Theatres and entertainment venues: these have names, addresses and frequency of attendance of attendees, and can study their catchment area and the popularity of different acts.

Libraries: these have names and addresses and members’ interests and usage.

Small retailers: these have records of itemised sales by day of week, time of day and season plus amount spent.

Craft and niche experts: these are often the first to be aware of trends and may have a global outlook.

All these organisations can take advantage of their data but they start from different points with different resources and capabilities; with good ideas they may have the opportunity to become winners in their own areas. Experience suggests that organisations have a secret wish list for generating money out of their data. Figure 1.4 shows the ranking we observed from our clients. However, this is just a snapshot and does not include business enrichment and transformation, which are also possible.

Figure 1.4 Wish list for generating money out of data.

Figure 1.5 shows a very generalised process for monetising data. Data comes into the process and is first used for business monitoring, leading to business insights; these might generate business optimisation and might lead to monetisation and potential business transformation.

Figure 1.5 Monetising data.

Despite differences in scale, the matrix in Figure 1.3 can help any organisation to map their current situation and plan their next steps to uplift their business.

1.4 Our Definition of Monetising Data

Data is the fundamental commodity, consisting of a representation of facts. However, when the data are summarised and illustrated they can lead to meaningful information, and assessing the meaningful information in context can lead to knowledge and wisdom.

Monetising data is more than just selling data and information. It includes everything where data is used in exchange for business advantage and supports business success. Large companies are often data rich and some have realised the advantage this gives them. Others consider themselves data rich but information poor because they have lots of data but it is not in a form that they can easily interpret or use to gain business insights. Statistical enthusiasm is a rare commodity but those businesses that pay attention to their data can find the answers to many of their policy and productivity questions. For example, scrutiny of data on sales easily yields information about seasonal trends; sales per customer might show shortfalls in maximising selling opportunities; total income might show overall success in attracting buyers, and so on.

Case studies and real data from our consulting practices are used throughout the book to illustrate the ideas, methods and techniques that are involved. As will be seen, most data can be monetised to bring benefit to the organisation. However, a lot of effort has to be expended to get the data into a suitable format for analysis. Data readiness can be assessed using tools that we will discuss. As analytics progresses, guidelines for data improvement become meaningful and we introduce the concept of the data improvement cycle to help organisations in continuous improvement and moving forward with their data analytics.

This book is aimed at managers in progressive organisations: managers who are keen to develop their own careers and who have the opportunity to suggest new ideas and innovative approaches for their organisation and influence how they are taken forward. The material requires background knowledge of dealing with numbers and spreadsheets and basic business principles. More specialised techniques, such as the use of decision tree analysis and predictive models, are fully explained. The main issue is the strength of desire to join the data revolution and hopefully after reading this book you will be an excited convert.

1.5 Guidance on the Rest of the Book

The rest of the book is planned as follows. Chapters 2 and 3 address data collection and preparation issues, including the use of mapping and meteorological data as well as official statistics. Chapter 4 looks at general issues around data mining: as a concept and a mechanism for gathering insights from data. Chapters 5 and 6 address technical methods; Chapter 5 looks at descriptive analytics, starting with statistical methods for summarising data and graphical presentations, and Chapter 6 moves on to statistical testing, modelling, segmentation, network analysis and predictive analytics.

Chapters 7 and 8 introduce the different strategies, motivations, modes and concepts for monetising data and examine barriers and enablers for organisations seeking to realise the full potential of their data, their valuable asset. Monetisation can be viewed strategically and operationally. Strategically we can look at new business directions, step changes in thinking, disruptive innovation and new income streams. Operationally we can consider optimising current business models, and making better use of customer targeting and segmentation. In Chapter 7 we focus on strategic issues, whilst operational improvements of the existing business will be explored in Chapter 8. In Chapter 9 we will consider the practicalities of implementation, such as issues of ethics, privacy and security; loss of cultural and technical learning due to staff turnover; and the other dampers that have to be overcome before we can achieve strategic steps forward and improvement of the current situation.

The mutual importance of theory and practice has long been recognised. As Chebyshev, a founding father of much statistical theory, said back in the 19th century, ‘Progress in any discipline is most successful when theory and practice develop hand in hand’. Not only does practice benefit from theory but theory benefits from practice. So in Chapter 10 we describe a set of case studies in which monetisation has brought big gains and uplifted the business. Thus we will aim to end the book on a high note and provide inspiration to move forward.

If you locate yourself within the grid in Figure 1.3 you can see which parts of the book are most relevant for you. Those readers at the bottom left are probably at the beginning of their exploration of monetisation and could well jump to the case studies in Chapter 10 for motivation and then return to Chapter 2. Those at the bottom right have already gained substantial business advantages but could benefit from learning new statistical and data‐mining techniques to make deeper use of their data, as described in the more technical Chapters 3–6. Those at the top left already have experience of analysing data but need to realise a better business advantage and could go straight to Chapters 7–9. Those at the top right can read the whole book for revision purposes and further insights!

Note that we avoid naming specific companies. Instead we refer to them in a generic way and the reader is welcome to find example companies by searching online.

2 About Data and Data Science

2.1 Introduction

There is a pleasing increase in awareness of the importance of data. This extends across industry sectors and organisations of all sizes. Raising the profile of data means that there is more openness to exploring it and more determination to put it to good use. This chapter deals with aspects of data that are relevant to the practitioner wishing to apply data analytics to monetise data. We review the types of data that are available and how they are accessed. We consider the fast‐growing big data from internet exchanges and the attendant quality and storage issues, and consider which employees are best placed to maximise the value added from the data. We also consider the slower build‐up of transactional data from small traders and experiments on consumer behaviour. These can yield discrete collections of valuable figures ready to turn into information.

Internal company data arises as part of day‐to‐day business, and includes transactions, logistics, administration and financial data. This can be enriched by a variety of external data sources such as official statistics and open‐data sources. There is also a mass of useful data arising from social media. We define scales of measurement and terms commonly used to distinguish different types of data, the meaning and necessity of data quality, amounts of data and its storage, the skills needed for different data functions, and data readiness and how to assess where a company is on the cycle of data improvement.

2.2 Internal and External Sources of Data

Data to be used for enterprise information and knowledge creation can come from inside the company or from external sources. Integrating data from different sources is a powerful tactic in data mining for monetisation and gives the most scope for insights and innovation.

Naturally, the features of these different types of data vary and the costs associated with them range from very little to a lot. Internal data arise as part of the business and in principle they should be readily available for further analysis. In practice, the data are often difficult to access, belong to different systems and are owned by different stakeholders. A summary is given in Table 2.1.

Table 2.1 Typical internal and external data in information systems.

Data source | Example | Characteristics
Internal – owned by company | Date a product was manufactured or invoice data | In control of company, may be reliable; if not, the data collection process can be improved
External – owned by a third party | Social network data, credit rating or data about the area the customer lives in | May not be a perfect match in time scale or location
Data collected by someone but no clear ownership | Unattributed data and information, web scraping, aggregated information | Available but perhaps not easily usable; making it usable may cost money as it may involve a service provider
External – open source | National statistics institutions and Eurostat data | Available but usually aggregated with fixed granularity, timescale and coverage

The issue of ownership is important because we may wish to use data and tables that are published but we don’t know to whom they belong, how accurate they are or how carefully they were obtained. The data may be available and easy to collect but we don’t know if there are any intellectual property rights that we may be inadvertently violating.

Data collected by ‘web scraping’ is an interesting case; the data here might be people’s online comments, obtained, for example, by text mining websites. The comments may be anonymous or attributed to a nickname, so that ownership is not clear. If the comments are attributed to someone then they are owned by a third party, but otherwise thought is required before using them.

Internal, operational information systems move large amounts of internally produced data through various processes and subsystems, such as payment control, warehouse, planning/forecasting, web servers, adserver technology systems and newsletter systems. One drawback with internal data is that it is used primarily to handle the daily business and operational systems may lack a facility for keeping a comprehensive history. However, at least the quality and reliability of internal data is in the control of the company. This is not the case for external data unless it has been generated under very strict guidelines, such as those of a research institute or government statistical service.

External data is generated outside the company’s own processes; it is often needed as a set of reference values. For example, a service provider can compare the characteristics of their customer base with those of the target population. Characteristics such as employment, housing and age distribution are available from national statistics institutions (NSIs). Official statistics are necessarily aggregated to conserve confidentiality. The level of granulation has to be such that people cannot identify individuals by triangulating knowledge from several sources.

Eurostat collects data from all European NSIs and has a very comprehensive website at www.eu.eurostat.org. Considerable effort has been invested by government statistical services to make their websites user‐friendly, not least because they are under pressure to show that they provide a useful service and are worth the public expense that they represent. Aggregated data are available as tables and graphics that can be animated, and there is a vast amount of detail available. However, it can take some patience to navigate to the data required and it is a good idea to make advance preparations against the possibility of needing the data in a hurry. An example of the use of NSI data is included in the case study in Section 10.6.

As well as providing reference information, external data is often also valuable for providing additional information about a customer. Analytically focused information systems such as marketing databases and customer relationship management (CRM) systems frequently add external data. This may be in the form of specifically purchased information about the customer, such as their address, peer group or segment, or their credit rating.

As an example, consider a company that has data about books bought in a certain geographical area over a period of time. The data is in time order for each sale and so is long and thin; an extract is shown in Table 2.2. Each row represents a sale and additional information is in each column. Sometimes the rows are referred to as ‘cases’.

Table 2.2 Extract of sales data.

Sale ID | Date | Category | Quantity | Value | Customer ID
1 | 14/01/2016 | 2 | 2 | 45 | 12221
2 | 14/01/2016 | 3 | 1 | 55 | 12221
3 | 15/01/2016 | 3 | 3 | 44 | 14334
4 | 15/01/2016 | 2 | 2 | 33 | 21134
5 | 15/01/2016 | 2 | 2 | 66 | 22443
6 | 18/01/2016 | 3 | 1 | 75 | 11232
7 | 19/01/2016 | 2 | 2 | 33 | 22234
8 | 20/01/2016 | 3 | 3 | 78 | 23231
9 | 20/01/2016 | 3 | 4 | 56 | 24422

The data is valuable even without further additions, but descriptive analytics may yield a wide range of important information as shown in Table 2.3.

Table 2.3 Company sales data analytics.

Company data | Tables | Graphics | Statistics
Quantity and value of sales in different categories with time stamp and customer identification (ID) | Quantity and value of sales in each category | Time trends of sales values; bar charts of quantity and value in different categories | Mean quantity and value of sales per category and customer
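As a rough illustration of the tables and statistics listed above, the following sketch summarises the nine rows of Table 2.2 by category. It uses only the Python standard library; the function name `summarise_by_category` and the layout of the tuples are our own assumptions for illustration, not part of any particular analytics package.

```python
from collections import defaultdict
from statistics import mean

# Rows of Table 2.2: (sale_id, date, category, quantity, value, customer_id)
SALES = [
    (1, "14/01/2016", 2, 2, 45, 12221),
    (2, "14/01/2016", 3, 1, 55, 12221),
    (3, "15/01/2016", 3, 3, 44, 14334),
    (4, "15/01/2016", 2, 2, 33, 21134),
    (5, "15/01/2016", 2, 2, 66, 22443),
    (6, "18/01/2016", 3, 1, 75, 11232),
    (7, "19/01/2016", 2, 2, 33, 22234),
    (8, "20/01/2016", 3, 3, 78, 23231),
    (9, "20/01/2016", 3, 4, 56, 24422),
]


def summarise_by_category(sales):
    """Return totals and per-sale means of quantity and value for each category."""
    quantities = defaultdict(list)
    values = defaultdict(list)
    for _sale_id, _date, category, quantity, value, _customer in sales:
        quantities[category].append(quantity)
        values[category].append(value)
    return {
        category: {
            "sales": len(values[category]),
            "total_quantity": sum(quantities[category]),
            "total_value": sum(values[category]),
            "mean_quantity": mean(quantities[category]),
            "mean_value": mean(values[category]),
        }
        for category in sorted(values)
    }


summary = summarise_by_category(SALES)
for category, stats in summary.items():
    print(category, stats)
```

In this extract, category 2 accounts for four sales with a total value of 177, and category 3 for five sales with a total value of 308.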

This data can be enriched by adding company‐owned information about the customer, including their address, date of first purchase, date of last purchase, and the frequency and monetary value of their purchases. These last factors feature in segmentation methods based on RFM: the recency, frequency and monetary value of purchases. Descriptive analytics of the data can now be enhanced to include statistics such as sales per customer segment.
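A minimal sketch of how the three RFM quantities might be derived from the sales in Table 2.2 is given below. The reference date of 21/01/2016 (the day after the last sale in the extract) is an assumption made purely for illustration, as are the function and field names; a real segmentation would go on to band these values into groups.

```python
from datetime import date, datetime

# (date, value, customer_id) for each sale in Table 2.2
SALES = [
    ("14/01/2016", 45, 12221),
    ("14/01/2016", 55, 12221),
    ("15/01/2016", 44, 14334),
    ("15/01/2016", 33, 21134),
    ("15/01/2016", 66, 22443),
    ("18/01/2016", 75, 11232),
    ("19/01/2016", 33, 22234),
    ("20/01/2016", 78, 23231),
    ("20/01/2016", 56, 24422),
]


def rfm(sales, reference_date):
    """Return recency (days), frequency and monetary value per customer."""
    per_customer = {}
    for date_text, value, customer in sales:
        sale_date = datetime.strptime(date_text, "%d/%m/%Y").date()
        entry = per_customer.setdefault(
            customer, {"last_purchase": sale_date, "frequency": 0, "monetary": 0}
        )
        entry["last_purchase"] = max(entry["last_purchase"], sale_date)
        entry["frequency"] += 1
        entry["monetary"] += value
    return {
        customer: {
            "recency_days": (reference_date - entry["last_purchase"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in per_customer.items()
    }


# Assumed reference date: the day after the last sale in the extract.
rfm_table = rfm(SALES, reference_date=date(2016, 1, 21))
print(rfm_table[12221])
```

Customer 12221 is the only repeat buyer in the extract, with two purchases totalling 100.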

The data can be further augmented by adding freely available open data collected by an NSI or by providing knowledge about the customer based on their location, such as the type of housing in the area, the population age range, socio‐economic activity, and so on. Other more specific data may be obtained about their peer group or segment from commercial sources such as www.caci.co.uk.

Descriptive analytics of the data can now be enhanced to include statistics such as sales per socio‐economic group. This could have implications for the effectiveness of promotional activities, or allow assessment of the impact of opening an outlet in an area or of increasing salesperson presence in an area (Table 2.4). Predictive analytics can address issues such as which factors are most related to sales quantities and values.

Table 2.4 Internal sales data enriched with external data.

Company data | Enrichment data | Descriptive analytics | Predictive analytics
Customer RFM and location | Area details of location of customer | Sales per area, housing type | Clusters of similar locations

In the example, the company now has more information about book sales and can use this in their promotions.
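The enrichment step can be sketched as a simple lookup that attaches an external attribute to each sale. The sales values below come from Table 2.2, but the area types assigned to customers are invented purely for illustration and do not come from any real data source. Customers without a match are kept in their own group, since external data rarely covers every customer.

```python
from collections import defaultdict

# (customer_id, value) for each sale in Table 2.2
SALES = [
    (12221, 45), (12221, 55), (14334, 44), (21134, 33), (22443, 66),
    (11232, 75), (22234, 33), (23231, 78), (24422, 56),
]

# HYPOTHETICAL enrichment data: area type per customer, invented for this sketch.
AREA_TYPE = {
    12221: "urban",
    14334: "rural",
    21134: "urban",
    22443: "suburban",
    11232: "rural",
    22234: "urban",
}


def sales_by_area(sales, area_lookup, unmatched_label="unmatched"):
    """Total sales value per area type; unmatched customers are grouped together."""
    totals = defaultdict(int)
    for customer, value in sales:
        totals[area_lookup.get(customer, unmatched_label)] += value
    return dict(totals)


totals = sales_by_area(SALES, AREA_TYPE)
print(totals)
```

Reporting the unmatched total alongside the others shows how much of the business the enrichment data fails to cover.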

Combining data from different areas and plotting them as they change over time is the background to the groundbreaking Gapminder website, www.gapminder.org, developed by Hans Rosling. For example, scatterplots of income per person against life expectancy at birth for each country plotted over time from 1809 to 2009 show the amazing changes that have taken place in different countries. Animated graphics are a powerful way to show the relative changes. Work by Stotesbury and Dorling has explored the relationships between countries' wealth and their waste production, water consumption, education levels and so on.

In a well‐organised, data‐aware company, the quality of internal data may be better than that from external resources, not least because the company can control exactly how and when the internal data is generated. External data may not match the internal data exactly in time (being contemporaneous) or location, but nevertheless the availability (often free of charge) and the extent of this data means that even poorly matched external data can be useful.

2.3 Scales of Measurement and Types of Data

Knowing about the different scales of measurement and types of data is important as it helps to determine how the data should be analysed. Measurements such as value of sales are quite different from counts of how many customers entered a retail outlet, or of the proportion of times sales exceeded a certain limit. Descriptive data, such as a location being ‘Rural’, ‘Coastal’, ‘Urban’, or ‘Suburban’, need to be treated quite differently from measurement data. ‘Frequency of occurrence’ can be evaluated for descriptive data but it does not make sense to calculate an average value (say, for location) unless some ordering is applied, for example a gradation between agricultural and industrial locations, so that an average has some sort of meaning.

Business information comes in many forms. Reports and opinions are qualitative in nature whereas sales figures and numbers of customers are clearly quantitative. Qualitative data can usefully be summarised in both non‐numerical and numerical form. For example, theme analysis applied to reports gives a non‐numerical summary of the themes in their content and the frequency of occurrence of the themes gives a meaningful numerical summary.

There are different types of quantitative data, and they may be described in a number of ways. Table 2.5 contrasts some of the more common terms.

Table 2.5 Scales of measurement examples.

Scales of measurement | Examples
Continuous vs categorical | Income (30,528 per year) vs size of family (medium = 3–5 family members)
Categorical: ordinal vs nominal | Opinion levels in market research (+2 = strongly agree, +1 = agree, 0 = no opinion, −1 = disagree, −2 = strongly disagree) vs industry sector (steel, craft, agriculture)
Numerical vs non‐numerical | Age (35.4 years old) vs colour (blue)

Data can be classified as continuous or categorical. Categories can be nominal or ordinal. The simplest level of measurement is nominal data, which indicates which named category is applicable. For example, a customer may live in an urban area, a rural area or a mixed area. In a dataset, this nominal data may be given in a column of values selected from urban/rural/mixed, with one row for each customer.

Once data has been identified as a useful analytical entity, it is often referred to as a ‘variable’. A data item such as income has a different value for each person and is called a variable because it varies across the sample of people being investigated. Note that being referred to as a ‘variable’ does not imply that the income of a particular person is uncertain, just that income varies across different people.

If a categorical variable has only two levels, for example ‘Male’ or ‘Female’, then the data is referred to as ‘binary’. Note that sex and gender refer to different concepts, with sex being biological and gender referring to the way the person sees themselves. Datasets can have several categories for gender. For example, one of the public datasets made available for data mining for the Knowledge Discovery and Data Mining (KDD) Cup lists people who have lapsed from making donations to US veterans (see http://www.kdnuggets.com/meetings/kdd98/kdd-cup-98.html). The pivot table for gender has entries for ‘Male’, ‘Female’, ‘Missing’ and ‘Not known’, the last arising where the donation was from a joint account. In addition, some entries are blank and there is one case with the letter C, which does not have a defined meaning. There are six categories, some of which are only sparsely filled. If gender is used as a variable in analysis this sparseness may cause problems and the data should be pre‐processed before analysis. Note that there may also be additional accidental categories for ‘M’, ‘m’, ‘man’, and other erroneous entries.
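The kind of pre‐processing described here might be sketched as follows. The raw values and the mapping are illustrative assumptions and are not taken from the KDD Cup dataset; in practice the mapping should be agreed with whoever understands how the data was collected.

```python
from collections import Counter

# Assumed mapping from accidental spellings to agreed category labels.
CANONICAL = {
    "m": "Male", "male": "Male", "man": "Male",
    "f": "Female", "female": "Female", "woman": "Female",
}


def clean_gender(raw):
    """Map a raw entry to 'Male', 'Female', 'Missing' or 'Other/unknown'."""
    if raw is None or not raw.strip():
        return "Missing"
    return CANONICAL.get(raw.strip().lower(), "Other/unknown")


# Illustrative raw column containing the kinds of entry mentioned in the text.
raw_values = ["Male", "M", "m", "man", "Female", "F", "", None, "C"]
counts = Counter(clean_gender(value) for value in raw_values)
print(counts)
```

Collapsing the accidental spellings leaves a small number of well‐filled categories, with undefined codes such as ‘C’ kept visible rather than silently dropped.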

If there is any order associated with the categories, then they are referred to as ‘ordinal’ data. Opinions can be captured as ordinal variables using questions, such as:

How was your experience today? Dreadful, poor, OK, good or very good

The responses usually need to be quantified if any meaningful analysis is to be carried out. In this example, it makes sense to code ‘Dreadful’ as −2, ‘Poor’ as −1, ‘OK’ as zero, ‘Good’ as +1 and ‘Very good’ as +2. The words can be replaced by pictures or emoticons as a more effective way of extracting opinion. Researchers have also investigated physical ways of gathering opinions; the engagement of a person can be evaluated by the length of time they keep eye contact and their certainty can be evaluated by the time they take to answer the question.
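The coding scheme just described can be written down directly. In this sketch the list of responses is invented for illustration; the scores are those given in the text.

```python
from statistics import mean, median

# Scores as described in the text.
SCORES = {"Dreadful": -2, "Poor": -1, "OK": 0, "Good": 1, "Very good": 2}


def code_responses(responses):
    """Replace each verbal response with its ordinal score."""
    return [SCORES[response] for response in responses]


# Illustrative responses, not real survey data.
responses = ["Good", "OK", "Very good", "Poor", "Good"]
coded = code_responses(responses)
print(coded, median(coded), mean(coded))
```

The median only relies on the ordering of the categories, whereas the mean also assumes that the steps between adjacent categories are equal in size, which is a stronger assumption for opinion data.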

Variables that represent size are referred to as measures, measurements, scales or metrics. In data mining, the term ‘metric’ includes continuous measurements such as time spent, and counts such as the number of page views. Some statistical software packages, such as WEKA and SPSS, distinguish between scale and string variables, and will only allow certain actions with certain types of data. A string variable, such as ‘Male’ or ‘Female’, often needs to be recoded as a binary scale variable, taking values such as 1 or 2, as an additional alternative form, to ensure flexibility in the subsequent analysis. MINITAB distinguishes between quantitative variables and text variables and will not perform actions unless the appropriate data type is presented. Excel distinguishes between numbers and text. In R software, variables have to be specified as either numeric (numbers with decimal places), integers (whole numbers, positive or negative), characters (string variables) or logical (true or false).

Many data items are measured on a continuous scale, for example the distance travelled to make a purchase. Continuous data does not need to be whole numbers like 4 km, but can be fractions of whole numbers, say 5.68 km. Continuous data may be of the interval type or the ratio type. Interval data has equal intervals between units but an arbitrary zero point, for example shoe or hat sizes. Ratio data is interval‐type data with the additional feature that zero is meaningful, for example a person’s salary. The fixed zero means that ratios are constant: €20,000 is twice as much as €10,000, and €6 is twice as much as €3.

Dates and times are interval data that have special treatment in statistical software because of their specific role in giving the timeline in any analysis. Usually a variety of formats are allowed. A numerical value can be extracted from the date as the number of days since a specified start date. The day of the week and the day of the month can be identified and both are useful depending on the analysis being carried out.
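As a small sketch of these extractions, the function below converts a date in the day/month/year format used in Table 2.2 into the number of days since a start date, the day of the week and the day of the month. The start date of 1 January 2016 and the function name are assumptions chosen for illustration.

```python
from datetime import date, datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def date_features(date_text, start=date(2016, 1, 1)):
    """Extract numerical and calendar features from a dd/mm/yyyy date string."""
    parsed = datetime.strptime(date_text, "%d/%m/%Y").date()
    return {
        "days_since_start": (parsed - start).days,
        "weekday": WEEKDAYS[parsed.weekday()],  # weekday() counts Monday as 0
        "day_of_month": parsed.day,
    }


features = date_features("14/01/2016")
print(features)
```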

The different numbers of days in a month can sometimes cause problems (see Box).

Box Example of problems with days of the month.

Wet weather ‘behind drop in mortgages’

Metro newspaper, Tuesday 1 April 2014

The article states that:

The number of mortgages granted to home‐buyers fell to a four‐month low in February, Bank of England figures show. The drop to 70,309 from 76,753 in January was likely because of wet weather, analysts said. Ed Stansfield of Capital Economics said the temporary fall ‘should go some way towards calming fears the housing market recovery is rapidly spiralling out of control’.

76,753 mortgages in January equates to 2476 per day. At the same rate, February, with 28 days, should have 69,325 mortgages. The ‘drop’ is therefore actually an increase of 984.

Any comments?
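The arithmetic in the box can be checked with a few lines of code. The mortgage totals are those quoted in the article; the variable and function names are our own.

```python
import calendar


def per_day(total, year, month):
    """Average count per calendar day in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return total / days_in_month


january_total = 76_753
february_total = 70_309

january_rate = per_day(january_total, 2014, 1)    # January has 31 days
february_rate = per_day(february_total, 2014, 2)  # February 2014 has 28 days

# Mortgages expected in February if the January daily rate had continued.
expected_february = round(january_rate * 28)
difference = february_total - expected_february

print(round(january_rate), expected_february, difference)
```

On a per‐day basis the February rate is higher than the January rate, which is the point of the example: monthly totals should be compared only after allowing for the number of days in each month.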

The time variable can be represented by the number of minutes, hours, and so on since a start time. Time calculations can cause problems in practice, as some days start at 00:00, while others start at 06:00 or 07:00, say in Central European Time. These small discrepancies can have big implications in data analysis. For example, analysing the pattern of temperatures recorded across a geographical area quickly illustrated that some records were of mean temperature for the 24 hours from 00:00 and some were from 06:00.
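The effect of different day boundaries can be seen in a small sketch. The hourly readings below are synthetic values generated purely for illustration; the function averages each complete 24‐hour window starting at the chosen hour.

```python
from statistics import mean


def daily_means(hourly, day_start_hour):
    """Mean of each complete 24-hour window beginning at day_start_hour."""
    return [
        mean(hourly[start:start + 24])
        for start in range(day_start_hour, len(hourly) - 23, 24)
    ]


# SYNTHETIC data: 72 hourly readings, with index 0 taken as 00:00 on the first day.
hourly = [10 + (hour % 24) / 2 + hour // 24 for hour in range(72)]

from_midnight = daily_means(hourly, 0)
from_six = daily_means(hourly, 6)
print(from_midnight, from_six)
```

The same readings give different daily means, and a different number of complete days, depending on where the day is taken to start, so records built on different conventions should not be mixed without adjustment.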

Nominal and ordinal variables, referred to as categorical or classification variables, often represent dimensions, factors or custom variables that allow you to break down a metric by a particular value, for example screen views by screen name.
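Breaking a metric down by a dimension amounts to grouping and counting. In the sketch below the event log, screen names and user identifiers are all invented for illustration.

```python
from collections import Counter

# HYPOTHETICAL event log: one (screen_name, user_id) pair per screen view.
events = [
    ("home", "u1"), ("search", "u1"), ("home", "u2"),
    ("basket", "u2"), ("home", "u3"), ("search", "u3"),
]


def views_by_screen(event_log):
    """Count screen views (the metric) for each screen name (the dimension)."""
    return Counter(screen for screen, _user in event_log)


views = views_by_screen(events)
print(views.most_common())
```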