Big Data for Insurance Companies (E-Book)

Description

This book will be a "must" for people who want a good knowledge of big data concepts and their applications in the real world, particularly in the field of insurance. It will be useful to people working in finance and to master's students using big data tools. The authors present the bases of big data: data analysis methods, learning processes, application to insurance and position within the insurance market. Individual chapters are written by well-known authors in this field.


Page count: 273

Publication year: 2018




Table of Contents

Cover

Title

Copyright

Foreword

Introduction

1 Introduction to Big Data and Its Applications in Insurance

1.1. The explosion of data: a typical day in the 2010s

1.2. How is big data defined?

1.3. Characterizing big data with the five Vs

1.4. Architecture

1.5. Challenges and opportunities for the world of insurance

1.6. Conclusion

1.7. Bibliography

2 From Conventional Data Analysis Methods to Big Data Analytics

2.1. From data analysis to data mining: exploring and predicting

2.2. Obsolete approaches

2.3. Understanding or predicting?

2.4. Validation of predictive models

2.5. Combination of models

2.6. The high dimension case

2.7. The end of science?

2.8. Bibliography

3 Statistical Learning Methods

3.1. Introduction

3.2. Decision trees

3.3. Neural networks

3.4. Support vector machines (SVM)

3.5. Model aggregation methods

3.6. Kohonen unsupervised classification algorithm

3.7. Bibliography

4 Current Vision and Market Prospective

4.1. The insurance market: structured, regulated and long-term perspective

4.2. Big data context: new uses, new behaviors and new economic models

4.3. Opportunities: new methods, new offers, new insurable risks, new management tools

4.4. Risks of weakening the business: competition from new actors, “uberization”, contraction of market volume

4.5. Ethical and trust issues

4.6. Mobilization of insurers in view of big data

4.7. Strategy avenues for the future

4.8. Bibliography

5 Using Big Data in Insurance

5.1. Insurance, an industry particularly suited to the development of big data

5.2. Examples of application in different insurance activities

5.3. New professions and evolution of induced organizations for insurance companies

5.4. Development constraints

5.5. Bibliography

List of Authors

Index

End User License Agreement

List of Tables

1 Introduction to Big Data and Its Applications in Insurance

Table 1.1. Annual Google statistics [STA 16b]

Table 1.2. Usage statistics for big data tools according to a survey of 2,895 respondents from the data analytics community and vendors. The respondents were from US/Canada (40%), Europe (39%), Asia (9.4%), Latin America (5.8%), Africa/Middle East (2.9%) and Australia/NZ (2.2%). They were asked about 102 different tools, including the “Hadoop/big data tools” shown here [PIA 16].

5 Using Big Data in Insurance

Table 5.1. Solvency II requirements regarding the quality of data used for the purpose of calculating technical provisions

Table 5.2. Existing pay “X” you drive guarantees

Table 5.3. Main data professions (the list is not necessarily exhaustive and job titles may vary according to the organization)

Table 5.4. Practical sheets falling within the CNIL insurance package

Table 5.5. New definitions distinguishing the data sensitivity level

Table 5.6. Data security requirements

Table 5.7. Skills expected of actuaries and data scientists

List of Illustrations

1 Introduction to Big Data and Its Applications in Insurance

Figure 1.1. Evolution of the interest in the term big data for Google searches (source: Google Trends, 27th September 2016)

Figure 1.2. The three Vs of big data

Figure 1.3. Development of data volumes and their units of measure

Figure 1.4. Hadoop and its ecosystem (non-exhaustive)

2 From Conventional Data Analysis Methods to Big Data Analytics

Figure 2.1. From underfitting to overfitting

Figure 2.2. A linear and nonlinear classifier (according to [HAS 09]). For a color version of the figure, see www.iste.co.uk/corlosquet-habart/insurance.zip

Figure 2.3. In a plane, some configurations of four points are not linearly separable

Figure 2.4. Optimal VC dimension. For a color version of the figure, see www.iste.co.uk/corlosquet-habart/insurance.zip

5 Using Big Data in Insurance

Figure 5.1. Classical data science process

Figure 5.2. Type of matched data for the study of behaviors in life insurance

Figure 5.3. Stakes in the fight against fraud


Big Data, Artificial Intelligence and Data Analysis Set

coordinated by Jacques Janssen

Volume 1

Big Data for Insurance Companies

Edited by

Marine Corlosquet-Habart

Jacques Janssen

First published 2018 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK

www.iste.co.uk

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

www.wiley.com

© ISTE Ltd 2018

The rights of Marine Corlosquet-Habart and Jacques Janssen to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2017959466

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library.
ISBN 978-1-78630-073-7

Foreword

Big data is not just a slogan but a reality, as this book shows. Many companies and organizations in the fields of banking, insurance and marketing accumulate data but have not yet reaped the full benefits. Until now, statisticians could make these data more meaningful through correlations and the search for principal components. These methods provided interesting, sometimes important, but aggregated information.

The major innovation is that the power of computers now enables us to do two things that are completely different from what was done before:

– accumulate individual data on thousands or even millions of clients of a bank or insurance company, and even those who are not yet clients, and process them separately;

– make massive use of unsupervised learning algorithms.

These algorithms have been known in principle for about 40 years, but they require computing power that was not available at that time, and they have since improved significantly. They are unsupervised, which means that, from a broad set of behavioral data, they predict with amazing accuracy the subsequent decisions of an individual without knowing the determinants of his/her action.

In the first three chapters of this book, key experts in applied statistics and big data explain where the data come from and how they are used. The second and third chapters, in particular, provide details on the functioning of learning algorithms which are the basis of the spectacular results when using massive data. The fourth and fifth chapters are devoted to applications in the insurance sector. They are absolutely fascinating because they are written by highly skilled professionals who show that tomorrow's world is already here.

It is unnecessary to emphasize the economic impact of this work; the results obtained in detecting fraudsters are a tremendous return on the investments made in massive data.

To the best of my knowledge, this is the first book that illustrates so well, in a professional context, the impact and real stakes of what some call the “big data revolution”. Thus, I believe that this book will be a great success in companies.

Jean-Charles POMEROL
Chairman of the Scientific Board of ISTE Editions

Introduction

This book presents an overview of big data methods applied to insurance problems. Specifically, it is a multi-author book that gives a fairly complete view of five important aspects, each presented by authors well known in the fields covered and with complementary profiles and expertise (data scientists, actuaries, statisticians, engineers). These aspects range from classical data analysis methods (including machine learning) to the impact of big data on the present and future insurance market.

Big data, megadata or massive data are terms applied to datasets so vast that not only common data management methods but also the classical methods of statistics (inference, for example) lose their meaning or can no longer be applied.

The exponential growth in computing power, combined with the convergence of data analysis and artificial intelligence, makes it possible to develop new analysis methods for the gigantic databases found, in particular, in the insurance sector, as presented in this book.

The first chapter, written by Romain Billot, Cécile Bothorel and Philippe Lenca (IMT Atlantique, Brest), presents a sound introduction to big data and its application to insurance. This chapter focuses on the impact of megadata, showing that hundreds of millions of people generate billions of bytes of data each day. The classical characterization of big data by the five Vs is well illustrated and enriched with other Vs such as variability and validity.

In order to remedy the shortcomings of classical data management techniques, the authors show how both the data and, where possible, the tasks can be parallelized across several computers.

The main IT tools, including Hadoop, are presented, as well as their relationship with platforms specialized in decision-support solutions and the question of migrating to a given data-oriented strategy. Application to insurance is tackled through three examples.
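To make this data-parallelism idea more concrete before Chapter 1 develops it, here is a minimal, purely illustrative sketch: the records, field layout and function names are invented, it runs on a single machine with Python's multiprocessing, and a real deployment would rely on a Hadoop or Spark cluster spreading the same map and reduce steps over many computers.

```python
# Illustrative map/reduce-style data parallelism on toy insurance records.
# Assumption: each record is a string "policy_id;claim_type;amount".
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    """Map step: count claim types within one chunk of records."""
    counts = Counter()
    for line in lines:
        claim_type = line.split(";")[1]
        counts[claim_type] += 1
    return counts

def reduce_counts(partial_counts):
    """Reduce step: merge the partial counts produced by each chunk."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

if __name__ == "__main__":
    records = [
        "P001;auto;1200", "P002;home;800", "P003;auto;300",
        "P004;health;950", "P005;auto;400", "P006;home;2500",
    ]
    # Data parallelism: split the records into chunks processed independently.
    chunks = [records[i::2] for i in range(2)]
    with Pool(processes=2) as pool:
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials))  # Counter({'auto': 3, 'home': 2, 'health': 1})
```

The point is simply that the map step runs independently on each chunk and a cheap reduce step merges the partial results, which is exactly what lets frameworks such as Hadoop spread the workload over many machines.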

The second chapter, written by Gilbert Saporta (CNAM, Paris), reviews the transition from classical data analysis methods to big data analytics, showing how much big data owes to data analysis and artificial intelligence, notably through the use of supervised and unsupervised learning methods. The author also emphasizes methods for validating predictive models, since the ultimate goal of big data is not only to build gigantic, structured databases, but also, and above all, to describe and predict from a given set of parameters.

The third chapter, written by Franck Vermet (EURIA, Brest), presents the statistical learning methods most commonly used in actuarial work, applicable to many areas of life and non-life insurance. It explains the distinction between supervised and unsupervised learning and gives a rigorous and clear presentation of the most widely used methods (decision trees, neural networks trained by gradient backpropagation, support vector machines, boosting, stacking, etc.).

The last two chapters are written by insurance professionals. In Chapter 4, Florence Picard (Institute of Actuaries, Paris) describes the present and future insurance market in the light of the development of big data. She illustrates its deployment in the insurance sector, detailing in particular the impact of big data on management methods, marketing, new insurable risks and data security. She pertinently highlights the emergence of new managerial practices that reinforce the importance of continuous training.

The fifth and last chapter is written by Emmanuel Berthelé (Optimind Winter, Paris), who is also an actuary. He presents the main uses of big data in insurance, particularly pricing and product offerings, automobile and telematics insurance, index-based insurance, combating fraud and reinsurance. He also emphasizes the regulatory constraints specific to the sector (Solvency II, ORSA, etc.) and the current restrictions on the use of certain algorithms due to auditability requirements, restrictions which will undoubtedly be lifted in the future.

Finally, a fundamental observation emerges from these last two chapters, urging insurers to preserve the mutualization principle, the founding principle of insurance, because, as Emmanuel Berthelé puts it:

“Even if the volume of data available and the capacities induced in the refinement of prices increase considerably, the personalization of price is neither fully feasible nor desirable for insurers, insured persons and society at large.”

In conclusion, this book shows that big data is essential for the development of insurance as long as the necessary safeguards are put in place. It is thus clearly addressed to insurance and bank managers, to master's students in actuarial science, computer science, finance and statistics, and, of course, to students in the rapidly growing new master's programs in big data.

Introduction written by Marine CORLOSQUET-HABART and Jacques JANSSEN.

1Introduction to Big Data and Its Applications in Insurance

1.1. The explosion of data: a typical day in the 2010s

At 7 am on a Monday like any other, a young employee of a large French company wakes up to start her week at work. As for many of us, technology has appeared everywhere in her daily life. As soon as she wakes up, her connected watch, which also works as a sports coach when she goes jogging or cycling, gives her a synopsis of her sleep quality and a score and assessment of the last few months. Data on her heartbeat measured by her watch are transmitted by WiFi to an app installed on her latest generation mobile, before her sleep cycles are analyzed to produce easy-to-handle quality indicators, like an overall score, and thus encourage fun and regular monitoring of her sleep. It is her best night’s sleep for a while and she hurries to share her results by text with her best friend, and then on social media via Facebook and Twitter. In this world of connected health, congratulatory messages flood in hailing her “performance”!

During her shower, online music streaming services such as Spotify or Deezer suggest a “wake-up” playlist, put together from the preferences and comments of thousands of users. She can give feedback on any of the songs for the software to adapt the upcoming songs in real time, with the help of a powerful recommendation system based on historical data. She enjoys her breakfast and is getting ready to go to work when the public transport Twitter account she subscribes to warns her of an incident causing serious disruption on the transport network. Hence, she decides to tackle the morning traffic by car, hoping to avoid arriving at work too late. To help her plan her route, she connects to a traffic information and community navigation app that obtains traffic information from GPS records generated by other drivers’ devices throughout their journeys to update a real-time traffic information map. Users can flag up specific incidents on the transport network themselves, and our heroine marks slow traffic caused by an accident. She decides to take the alternative route suggested by the app.

Having arrived at work, she vents her frustration at a difficult day’s commute on social media. During her day at work, on top of her professional activity, she will be connected online to check her bank account balance and go shopping on a supermarket’s “drive” app that lets her do her shop online and pick it up later in her car. Her consumer profile on the online shopping app gives her a historical overview of the last few months, as well as suggesting products that are likely to interest her.

On her way home, the trunk full of food, some street art painted on a wall immediately attracts her attention. She stops to take a photo, edits it with a color filter and shares it on a social network similar to Instagram. The photo immediately receives about 10 “likes”. That evening, a friend comments on the photo. Having recognized the artist, he gives her a link to an online video site like YouTube. The link is for a video of the street art being painted, put online by the artist to increase their visibility. She quickly watches it. Tired, she eats, plugs in her sleep app and goes to bed.

Between waking up and going to sleep, our heroine has generated a significant amount of data, a volume that it would have been difficult to imagine a few years earlier. With or without her knowledge, there have been hundreds of megabytes of data flow and digital records of her tastes, moods, desires, searches, location, etc. This homo sapiens, now homo numericus, is not alone – billions of us do the same. The figures are revealing and their growth astonishing: we have entered the era of big data. In 2016, one million links were shared, two million friend requests were made and three million messages were sent every 20 minutes on Facebook [STA 16a]. The figures are breathtaking:

– 1,540,000,000 users active at least once a month;

– 974,000,000 smartphone users;

– 12% growth in users between 2014 and 2015;

– 81 million Facebook profiles;

– 20 million applications installed on Facebook every day.

Since the start of computing, engineers and researchers have certainly been confronted with strong growth in data volumes, stored in larger and larger databases that have come to be known as data warehouses, and with ever improving architectures to guarantee high quality service. However, since the 2000s, mobile Internet and the Internet of Things, among other things, have brought about an explosion in data. This has been more or less well managed, requiring classical schemes to be reconsidered, both in terms of architecture and data processing. Internet traffic, computer backups on the cloud, shares on social networks, open data, purchase transactions, sensors and records from connected objects make up an assembly of markers in space and/or time of human activity, in all its dimensions. We produce enormous quantities of data and can produce it continuously wherever we are (the Internet is accessible from the office, home, airports, trains, cars, restaurants, etc.). In just a few clicks, you can, for example, describe and review a meal and send a photo of your dish. This great wealth of data certainly poses some questions, about ethics and security among other things, and also presents a great opportunity for society [BOY 12]. Uses of data that were previously hidden or reserved for an elite are becoming accessible to more and more people.

The same is true for the open data phenomenon establishing itself at all administrative scales. For big companies, and insurance companies in particular, there are multiple opportunities [CHE 12]. For example, data revealing driving styles are of interest to non-life insurance, and data concerning health and lifestyle are useful for life insurance. In both cases, knowing more about the person being insured allows better estimation of future risks. Storing this data requires a flexible and tailored architecture [ZIK 11] to allow parallel and dynamic processing of “voluminous”, “varied” data at “velocity” while evaluating its “veracity” in order to derive the great “value” of these new data flows [WU 14]. Big data, or megadata, is often presented in terms of these five Vs.

After initial reflection on the origin of the term and with a view to giving a reliable definition (section 1.2), we will return to the framework of these five Vs, which has the advantage of giving a pragmatic overview of the characteristics of big data (section 1.3). Section 1.4 will describe current architecture models capable of real-time processing of high-volume and varied data, using parallel and distributed processing. Finally, we will finish with a succinct presentation of some examples from the world of insurance.

1.2. How is big data defined?

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”

Dan Ariely

It is difficult to define a term as generic, widely used and even clichéd as big data. According to Wikipedia:

“Big data is a term for datasets that are so large or complex that traditional data processing application software is inadequate to deal with them.”

This definition of the big data phenomenon presents an interesting point of view. It focuses on the loss of capability of classical tools to process such high volumes of data. This point of view was put forward in a report from the consulting firm McKinsey & Company that describes big data as data whose scale, distribution, diversity and transience require new architectures and analysis techniques in order to unlock new sources of added value [MAN 11]. Of course, this point of view prevails today (in 2016, as these lines are being written), but a universal definition must use more generic characteristics that will stand the test of time. However, like many new concepts, there are as many definitions as there are authors on the subject. We refer the reader to [WAR 13] for an interesting discussion on this theme.

To date the genesis of big data, why not turn to one of its greatest suppliers, the tech giant Google? Using the Google Trends tool, we have extracted the growth in the number of searches for the term “big data” on the famous search engine. Figure 1.1 shows an almost exponential growth in the interest of people using the search engine from 2010 onwards, a sign of the youth of the term and perhaps of a certain degree of surprise at a suddenly uncontrollable volume of data, as the Wikipedia definition, still relevant in 2016, suggests. However, articles have been using this concept since 1998 to describe the anticipated growth of data quantities and databases towards larger and larger scales [FAN 13, DIE 12]. The reference article, widely cited by the scientific community, dates from 2001 and is attributed to Doug Laney of the consultancy firm Gartner [LAN 01]. Curiously, the document never mentions the term big data, although it features the reference characterization by three Vs: volume, velocity and variety. “Volume” describes the size of the data, “velocity” captures the speed at which they are generated, communicated and must be processed, while “variety” refers to the heterogeneous nature of these new data flows. Most articles agree on the basic three Vs (see [FAN 13, FAN 14, CHE 14]), to which a fourth V, veracity (attributed to IBM [IBM 16]), and a fifth V, value, are added. The term “veracity” refers to the reliability of the data: data can be erroneous, incomplete or too old for the intended analysis. The fifth V conveys the fact that data must above all create value for the companies involved, or for society in general. In this respect, just as certain authors remind us that small volumes can also create value (“small data also may lead to big value”, see [GU 14]), we should not forget that companies, in adopting practices suited to big data, must above all store, process and create intelligent data. Perhaps we should be talking about smart data rather than big data?

1.3. Characterizing big data with the five Vs

In our initial assessment of the big data phenomenon, it should be noted that the 3 Vs framework of volume, velocity and variety, popularized by the research firm Gartner [LAN 01], is now standard. We will thus start with this classical scheme, shown in Figure 1.2, before considering other Vs, which will soon prove to be useful for developing this initial description.

Figure 1.1. Evolution of the interest in the term big data for Google searches (source: Google Trends, 27th September 2016)

Figure 1.2. The three Vs of big data

1.3.1. Variety

In a break with tradition, we will start by focusing on the variety, rather than volume, of data. We refer here to the different types of data available today. As we illustrated in the introduction, data originates everywhere, for example:

– texts, photos and videos (Internet, etc.);

– spatio-temporal information (mobile devices, smart sensors, etc.);

– metadata on telephone messages and calls (mobile devices, etc.);

– medical information (patient databases, smart objects, etc.);

– astronomical and geographical data (satellites, ground-based observatories, etc.);

– client data (client databases, sensors and networked objects, etc.).

The handful of examples listed above illustrate the heterogeneity of sources and data – “classical” data like that seen before the era of big data, evidently, and also video signals, audio signals, metadata, etc.

This diversity of content has brought about an initial paradigm shift from structured to unstructured data. In the past, much of the data could be considered structured in the sense that it could be stored in relational databases. This was how client or commercial data was stored. Today, a large proportion of data is not structured (photos, video sequences, account updates, social network statuses, conversations, sensor data, recordings, etc.).
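As a small, hypothetical illustration of this shift (the client row and the social media post below are invented), structured data fits a fixed schema and can be queried directly, whereas unstructured content has to be processed before it yields usable features:

```python
# Structured versus unstructured data, illustrated with toy examples.
import sqlite3

# Structured: a client record with a fixed schema, directly queryable with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, name TEXT, birth_year INTEGER, premium REAL)")
conn.execute("INSERT INTO clients VALUES (1, 'A. Martin', 1985, 420.50)")
print(conn.execute("SELECT name, premium FROM clients WHERE birth_year > 1980").fetchall())

# Unstructured: a free-text post about a claim, which needs text mining
# (tokenization, sentiment analysis, entity extraction) before it can be used.
post = "Stuck on the ring road after a fender bender, insurer still hasn't called back :("
tokens = post.lower().split()
print("negative tone?", any(word in tokens for word in ("stuck", "hasn't", ":(")))
```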

1.3.2. Volume

If you ask a range of different people to define big data, most of them will bring up the concept of size, volume or quantity. Just close your eyes and imagine the amount of messages, photos and videos exchanged per second globally. In parallel to the developing interest for the concept of big data on the search engine Google (Figure 1.1), Internet usage has also exploded in just a few years, as the annual number of Google searches bears witness (Table 1.1).

The explosion in Internet usage, and in particular mobile Internet as made possible by smartphones and high-speed standards, has led to an unstoppable growth in data volumes, towards units that our oldest readers have surely recently discovered: gigabytes, terabytes, petabytes, exabytes and even zettabytes (a zettabyte is 10²¹ bytes!), as shown in Figure 1.3.

Table 1.1. Annual Google statistics [STA 16b]

Year | Annual number of searches | Average searches per day
2014 | 2,095,100,000,000 | 5,740,000,000
2013 | 2,161,530,000,000 | 5,922,000,000
2012 | 1,873,910,000,000 | 5,134,000,000
2011 | 1,722,071,000,000 | 4,717,000,000
2010 | 1,324,670,000,000 | 3,627,000,000
2009 | 953,700,000,000 | 2,610,000,000
2008 | 637,200,000,000 | 1,745,000,000
2007 | 438,000,000,000 | 1,200,000,000
2000 | 22,000,000,000 | 60,000,000
1998 | 3,600,000 | 9,800

Figure 1.3. Development of data volumes and their units of measure
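To keep these orders of magnitude in mind, the short helper below (written for this illustration, using the decimal convention in which a zettabyte is 10²¹ bytes) converts a raw byte count into the units of Figure 1.3:

```python
# Express a byte count in the decimal units of Figure 1.3 (kB ... ZB).
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_readable(n_bytes: float) -> str:
    """Return the byte count in the largest convenient decimal unit."""
    for unit in UNITS:
        if n_bytes < 1000 or unit == UNITS[-1]:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000

print(human_readable(5 * 10**21))    # 5.0 ZB
print(human_readable(2.5 * 10**12))  # 2.5 TB
```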

According to an annual report on the Internet of Things [GSM 15], by the end of 2015, there were 7.2 billion mobile connections, with projections for smartphones alone reaching more than 7 billion in 2019. This expansive volume of data is what brought forth the big data phenomenon. With current data stores unable to absorb such growth in data volumes, companies, engineers and researchers have had to create new solutions, notably offering distributed storage and processing of these masses of data (see section 1.4). The places that store this data, the famous data centers, also raise significant questions in terms of energy consumption. One report highlights the fact that data centers handling American data consumed 91 billion kWh of electricity in 2013, equivalent to the annual output of 34 large coal-fired power plants [DEL 14]. This figure is likely to reach 140 billion kWh by 2020, equivalent to the annual output of 50 power plants, costing the American population $13 billion per year in electricity bills. If we add to this the emission of 100 million metric tons of CO2 per year, it is easy to see why large organizations have very quickly started taking this problem seriously, as demonstrated by the frequent installation of data centers in cold regions around the world, with ingenious systems for recycling natural energy [EUD 16].
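A quick back-of-the-envelope check shows that these figures are mutually consistent (the per-plant output below is simply implied by the report's own 91 billion kWh to 34 plants ratio, and the electricity price is derived, not an official statistic):

```python
# Back-of-the-envelope consistency check of the data center figures above.
consumption_2013_kwh = 91e9
plants_2013 = 34
kwh_per_plant = consumption_2013_kwh / plants_2013    # ~2.7 billion kWh per plant

consumption_2020_kwh = 140e9
plants_2020 = consumption_2020_kwh / kwh_per_plant    # ~52 plants, close to the ~50 quoted
implied_price = 13e9 / consumption_2020_kwh           # ~0.09 $/kWh, a plausible retail price

print(f"{kwh_per_plant:.2e} kWh/plant, ~{plants_2020:.0f} plants in 2020, {implied_price:.3f} $/kWh")
```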

1.3.3. Velocity

The last of the three historic Vs, the V for velocity, represents what would probably more naturally be called speed. It also covers multiple components, and it is intrinsic to the big data phenomenon. This is clear from the figures above regarding the development of the concept and volume of data, like a film in fast-forward. Speed can refer to the speed at which the data are generated, the speed at which they are transmitted and processed, and also the speed at which they can change form, provide value and, of course, disappear. Today, we must confront large waves of masses of data that must be processed in real time. This online-processed data allow decision makers to make strategic choices that they would not have even been aware of in the past.

1.3.4. Towards the five Vs: veracity and value

An enriched definition of big data quickly took shape with the appearance of a fourth element, the V of veracity, attributed to IBM [IBM 16]. The word veracity brings us back to the quality of the data, a vital property for all data search processes. Again, this concept covers different aspects, such as imprecision, incompleteness, inconsistency and uncertainty. According to IBM, poor data quality costs on average $3.1 trillion per year. The firm adds that 27% of questionnaire respondents are not sure of the information that they input and that one in three decision makers have doubts concerning the data they base their decision on. Indeed, the variety of data flows, which are often unstructured, complicates the process of certifying data. This brings to mind, for example, the quality of data on the social network Twitter, whose imposed 140 character format does not lend itself to precise prose that can be easily identified by automatic natural language processing tools. Certifying data is a prerequisite for creating value, which constitutes the fifth V that is well established in modern practices. The capacity to store, understand and analyze these new waves of high-volume, high-velocity, varied data, and to ensure reliability while integrating them into a business intelligence ecosystem, will undoubtedly allow all companies to put in place new decision advice modules (for example, predictive analysis) with high added value. One striking example concerns American sport and ticket sales that are currently based on dynamic pricing methods enhanced by historical and real-time data. Like many other American sports teams, the San Francisco Giants baseball team has thus adapted its match ticketing system to make use of big data. They engaged the services of the company QCUE to set up algorithmic trading techniques inspired by airline companies. The ticket prices are updated in real time as a function of supply and demand. In particular, historical data on the quality of matches and attendances are used to adjust ticket prices to optimize seat/stadium occupation and the company’s profits. On their website, QCUE report potential profit growth of up to 46% compared to the previous system.
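To give a feel for how such dynamic pricing can work, here is a deliberately simplified pricing rule: it is not Qcue's actual algorithm, and every coefficient and figure in it is invented for illustration.

```python
# Toy dynamic pricing rule: the price reacts to remaining inventory (scarcity)
# and to short-term demand, capped to stay within +/- 50% of the base price.
def ticket_price(base_price: float, seats_left: int, capacity: int,
                 sales_last_hour: int, expected_sales_per_hour: float) -> float:
    scarcity = 1 - seats_left / capacity                      # 0 = plenty left, 1 = sold out
    demand_ratio = sales_last_hour / expected_sales_per_hour  # >1 means selling faster than expected
    multiplier = 1 + 0.3 * scarcity + 0.2 * (demand_ratio - 1)
    return round(base_price * max(0.5, min(1.5, multiplier)), 2)

# A quiet midweek game versus a derby selling fast:
print(ticket_price(40.0, seats_left=30_000, capacity=40_000,
                   sales_last_hour=50, expected_sales_per_hour=100))   # 39.0, slightly below base
print(ticket_price(40.0, seats_left=4_000, capacity=40_000,
                   sales_last_hour=400, expected_sales_per_hour=100))  # 60.0, capped 50% above base
```

In a real system, the historical data mentioned above (opponent quality, past attendance, day of the week) would feed the expected demand estimate rather than it being a hand-set constant.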

Globally, big data represents a lucrative business. The McKinsey Global Institute has suggested that even the simple use of client location data could yield a potential annual consumer surplus of $600 billion [MAN 11]. The consulting group Wikibon estimates that the big data market, encompassing hardware, software and related services, will grow from $19.6 billion in 2013 to $84 billion in 2026 [KEL 15].

1.3.5. Other possible Vs