Unstructured Data Analytics - Jean Paul Isson - E-Book

Unstructured Data Analytics E-Book

Jean-Paul Isson

Description

Turn unstructured data into valuable business insight

Unstructured Data Analytics provides an accessible, non-technical introduction to the analysis of unstructured data. Written by global experts in the analytics space, this book presents unstructured data analysis (UDA) concepts in a practical way, highlighting the broad scope of applications across industries, companies, and business functions. The discussion covers key aspects of UDA implementation, beginning with an explanation of the data and the information it provides, then moving into a holistic framework for implementation. Case studies show how real-world companies are leveraging UDA in security and customer management, and provide clear examples of both traditional business applications and newer, more innovative practices.

Roughly 80 percent of today's data is unstructured, in the form of emails, chats, social media, audio, and video. These data assets contain a wealth of valuable information that can be used to great advantage, but accessing that data in a meaningful way remains a challenge for many companies. This book provides the baseline knowledge and the practical understanding companies need to put this data to work. Supported by research with several industry leaders and packed with frontline stories from leading organizations such as Google, Amazon, Spotify, LinkedIn, Pfizer, Manulife, AXA, Monster Worldwide, Under Armour, the Houston Rockets, DELL, IBM, and SAS Institute, this book provides a framework for building and implementing a successful UDA center of excellence.

You will learn:

* How to increase Customer Acquisition and Customer Retention with UDA
* The Power of UDA for Fraud Detection and Prevention
* The Power of UDA in Human Capital Management & Human Resources
* The Power of UDA in Health Care and Medical Research
* The Power of UDA in National Security
* The Power of UDA in Legal Services
* The Power of UDA for product development
* The Power of UDA in Sports
* The future of UDA

From small businesses to large multinational organizations, unstructured data provides the opportunity to gain consumer information straight from the source. Data is only as valuable as it is useful, and a robust, effective UDA strategy is the first step toward gaining the full advantage. Unstructured Data Analytics lays this space open for examination, and provides a solid framework for beginning meaningful analysis.


Page count: 563

Year of publication: 2018




Table of Contents

Cover

Title Page

Foreword

Preface

Acknowledgments

CHAPTER 1: The Age of Advanced Business Analytics

INTRODUCTION

WHY THE ANALYTICS HYPE TODAY?

A SHORT HISTORY OF DATA ANALYTICS

WHAT IS THE ANALYTICS AGE?

NOTES

FURTHER READING

CHAPTER 2: Unstructured Data Analytics

INTRODUCTION

WHAT IS UDA?

WHY UDA TODAY?

THE UDA INDUSTRY

USES OF UDA

HOW UDA WORKS

WHY UDA IS THE NEXT ANALYTICAL FRONTIER?

UDA SUCCESS STORIES

THE GOLDEN AGE OF UDA

NOTES

FURTHER READING

CHAPTER 3: The Framework to Put UDA to Work

INTRODUCTION

WHY HAVE A FRAMEWORK TO ANALYZE UNSTRUCTURED DATA?

THE IMPACT CYCLE APPLIED TO UNSTRUCTURED DATA

TEXT PARSING EXAMPLE

CASE STUDY

NOTES

FURTHER READING

CHAPTER 4: How to Increase Customer Acquisition and Retention with UDA

THE VOICE OF THE CUSTOMER: A GOLDMINE FOR UNDERSTANDING CUSTOMERS

WHY SHOULD YOU CARE ABOUT UDA FOR CUSTOMER ACQUISITION AND RETENTION?

PREDICTIVE MODELS AND ONLINE MARKETING

HOW DOES UDA APPLIED TO CUSTOMER ACQUISITION WORK?

THE POWER OF UDA FOR E-MAIL RESPONSE AND AD OPTIMIZATION

HOW TO DRIVE MORE CONVERSION AND ENGAGEMENT WITH UDA APPLIED TO CONTENT

HOW UDA APPLIED TO CUSTOMER RETENTION (CHURN) WORKS

WHAT IS UDA APPLIED TO CUSTOMER ACQUISITION?

WHAT IS UDA APPLIED TO CUSTOMER RETENTION (CHURN)?

THE POWER OF UDA POWERED BY VIRTUAL AGENT

BENEFITS OF A VIRTUAL AGENT OR AI ASSISTANT FOR CUSTOMER EXPERIENCE

BENEFITS AND CASE STUDIES

APPLYING UDA TO YOUR SOCIAL MEDIA PRESENCE AND NATIVE ADS TO INCREASE ACQUISITIONS

NOTES

CHAPTER 5: The Power of UDA to Improve Fraud Detection and Prevention

INTRODUCTION

WHY SHOULD YOU CARE ABOUT UDA FOR FRAUD DETECTION AND PREVENTION?

BENEFITS OF UDA

WHAT IS UDA FOR FRAUD?

HOW UDA WORKS IN FRAUD DETECTION AND PREVENTION

UDA FRAMEWORK FOR FRAUD DETECTION AND PREVENTION: INSURANCE

MAJOR FRAUD DETECTION AND PREVENTION TECHNIQUES

BEST PRACTICES USING UDA FOR FRAUD DETECTION AND PREVENTION

NOTES

FURTHER READING

CHAPTER 6: The Power of UDA in Human Capital Management

WHY SHOULD YOU CARE ABOUT UDA IN HUMAN RESOURCES?

WHAT IS UDA IN HR?

WHAT IS UDA IN HR REALLY ABOUT?

THE POWER OF UDA IN ONLINE RECRUITMENT: SUPPLY AND DEMAND EQUATION

THE POWER OF UDA IN TALENT SOURCING ANALYTICS

THE POWER OF UDA IN TALENT ACQUISITION ANALYTICS

ARTIFICIAL INTELLIGENCE AS A HIRING ASSISTANT

THE POWER OF UDA IN TALENT RETENTION

EMPLOYEE PERFORMANCE APPRAISAL DATA REVIEW FEEDBACK

HOW UDA WORKS

BENEFITS OF UDA IN HR

CASE STUDIES

FURTHER READING

CHAPTER 7: The Power of UDA in the Legal Industry

WHY SHOULD YOU CARE ABOUT UDA IN LEGAL SERVICES?

WHAT IS UDA APPLIED TO LEGAL SERVICES?

HOW DOES IT WORK?

BENEFITS AND CHALLENGES

NOTES

FURTHER READING

CHAPTER 8: The Power of UDA in Healthcare and Medical Research

WHY SHOULD YOU CARE ABOUT UDA IN HEALTHCARE?

WHAT'S UDA IN HEALTHCARE?

HOW UDA WORKS

BENEFITS

CASE STUDY

NOTES

FURTHER READING

CHAPTER 9: The Power of UDA in Product and Service Development

WHY SHOULD YOU CARE ABOUT UDA FOR PRODUCT AND SERVICE DEVELOPMENT?

UDA AND BIG DATA ANALYTICS

WHAT IS UDA APPLIED TO PRODUCT DEVELOPMENT?

HOW IS UDA APPLIED TO PRODUCT DEVELOPMENT?

HOW UDA APPLIED TO PRODUCT DEVELOPMENT WORKS

NOTES

CHAPTER 10: The Power of UDA in National Security

NATIONAL SECURITY: PLAYGROUND FOR UDA OR CIVIL LIBERTY THREAT?

WHAT IS UDA FOR NATIONAL SECURITY?

DATA SOURCES OF THE NSA

WHY UDA FOR NATIONAL SECURITY?

CASE STUDIES

HOW UDA WORKS

NOTES

FURTHER READING

CHAPTER 11: The Power of UDA in Sports

THE SHORT HISTORY OF SPORTS ANALYTICS: MONEYBALL

WHY SHOULD YOU CARE ABOUT UDA IN SPORTS?

WHAT IS UDA IN SPORTS?

HOW IT WORKS

NOTES

FURTHER READING

CHAPTER 12: The Future of Analytics

HARNESSING THESE EVOLVING TECHNOLOGIES WILL GENERATE BENEFITS

DATA BECOMES LESS VALUABLE AND ANALYTICS BECOMES MAINSTREAM

PREDICTIVE ANALYTICS, AI, MACHINE LEARNING, AND DEEP LEARNING BECOME THE NEW STANDARD

PEOPLE ANALYTICS BECOMES A STANDARD DEPARTMENT IN BUSINESSES

UDA BECOMES MORE PREVALENT IN CORPORATIONS AND BUSINESSES

COGNITIVE ANALYTICS EXPANSION

THE INTERNET OF THINGS EVOLVES TO THE ANALYTICS OF THINGS

MOOCS AND OPEN SOURCE SOFTWARE AND APPLICATIONS WILL CONTINUE TO EXPLODE

BLOCKCHAIN AND ANALYTICS WILL SOLVE SOCIAL PROBLEMS

HUMAN-CENTERED COMPUTING WILL BE NORMALIZED

DATA GOVERNANCE AND DATA SECURITY WILL REMAIN THE NUMBER-ONE RISK AND THREAT

NOTES

FURTHER READING

APPENDIX A: Tech Corner Details

SINGULAR VALUE DECOMPOSITION (SVD) ALGORITHM AND APPLICATIONS

PRINCIPAL COMPONENT ANALYSIS (PCA) AND APPLICATIONS

PCA APPLICATION TO FACIAL RECOGNITION: EIGENFACES

QR FACTORIZATION ALGORITHM AND APPLICATIONS

NOTE

FURTHER READING

About The Author

Index

End User License Agreement

List of Exhibits

Chapter 1

Exhibit 1.01 The Exponential Growth of New Information: The Five Layers

Exhibit 1.02 Data Production Evolution

Exhibit 1.04 Analytics Evolution

Chapter 2

Exhibit 2.02 AI, Machine Learning, Deep Learning, and Cognitive Computing/Analytics Overview

Exhibit 2.03 AI, Machine Learning, and Deep Learning Explained

Exhibit 2.04 Venn Diagram of AI, Machine Learning, Representation Learning, and Deep Learning

Exhibit 2.06 UDA Process

Exhibit 2.07 Text Analytics Applied to Resume Search: Boolean vs. Semantic Search Results of “Software Developer” Job Title (Output from a resumes database in medium Designated Market Area in Texas)

Exhibit 2.08 The Race for AI: Google, Twitter, Intel, Apple In A Rush To Grab Artificial Intelligence Start-ups

Chapter 3

Exhibit 3.01 The IMPACT Cycle

Exhibit 3.03 The Five Steps of Text Analysis

Exhibit 3.07 Singular Value Decomposition

Exhibit 3.15 Passenger Feedback Mining (Topic Mining Output)

Exhibit 3.17 Term, Document, and Corpus

Chapter 4

Exhibit 4.01 The Voice of the Customer (VoC)

Exhibit 4.02 The VoC for Customer Acquisition and Retention

Exhibit 4.03 Leveraging UDA to Increase Customer Acquisition

Exhibit 4.04 Customer Acquisition Predictive Model/Conversion Model

Exhibit 4.06 Segmentation Grid Prospects

Exhibit 4.07 Customer Acquisition Insights Summary: Success Plays for Prospects

Exhibit 4.10 Leveraging UDA to Increase Customer Retention

Exhibit 4.11 Customer Retention Predictive Model/Churn Model

Exhibit 4.13 Segmentation Grid Insights Retention

Exhibit 4.14 Customer Retention Insights Summary: Success Plays

Exhibit 4.15 Consumer Decision Journey

Exhibit 4.16 From Consumer Decision Journey to Employer Decision Journey

Exhibit 4.18 Business-to-Business Scoring and Segmentation Using Different Types of Data

Exhibit 4.19 Churn Predictive Model Overview

Exhibit 4.20 Churn Predictive Model Process

Exhibit 4.21 Churn Predictive Model Behind the Scenes

Chapter 5

Exhibit 5.01 U.S. Fraud Victims and Fraud Losses

Exhibit 5.02 UDA Process for Fraud Detection and Prevention

Exhibit 5.04 Fraud Detection and Prediction Framework Using UDA

Exhibit 5.07 Forrester: Enterprise Fraud Solutions Providers

Chapter 7

Exhibit 7.01 Unstructured Legal Data Analytics Bridge

Exhibit 7.02 Text Analytics Process

Chapter 8

Exhibit 8.01 Computer Vision Error Rate

Chapter 9

Exhibit 9.01 Analytics and Innovation Intersection

Exhibit 9.02 The VoC Strategy, Based on Customer Experience

Exhibit 9.03 Architecture of Text Mining in Voices

Exhibit 9.04 VoC Program Paradigm

Chapter 10

Exhibit 10.01 Human Data for National Security Analytics

Exhibit 10.02 National Security Predictive Model

Chapter 11

Exhibit 11.02 Percentage of Teams That Employ Analytics Professionals

Exhibit 11.03 UDA Applied to Sports Data

Appendix A

Exhibit A.01 Normalized Data Representation with Eigenvectors V1, V2

Unstructured Data Analytics

How to Improve Customer Acquisition, Customer Retention, and Fraud Detection and Prevention

Jean Paul Isson

Copyright © 2018 by Jean Paul Isson. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750–8400, fax (978) 646–8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748–6011, fax (201) 748–6008, or online at www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762–2974, outside the United States at (317) 572–3993, or fax (317) 572–4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Cataloging-in-Publication Data is Available:

ISBN 9781119129752 (Hardcover)

ISBN 9781119325505 (ePDF)

ISBN 9781119325499 (ePub)

Cover Design: Wiley

Cover Images: Polygonal background © Hamster3d/iStockphoto; Abstract Background © imagy/iStockphoto

I dedicate this book to my daughters, Roxane and Sofia, who are my inspiring muses for writing books and for so many great things I do. Daddy is finished writing and is no longer hiding in the basement sitting in front of his computer. I hope that when you read this book, you will be proud of your patience and understanding.

A special thanks to my wife, Marjolaine, for her love and support in adjusting our family's activities to accommodate my writing schedule.

Foreword

Undoubtedly, we've all seen some infographic showcasing how the amount of data in our digitally connected world, if represented as books, would circle the globe a number of times. We've all heard the mind-boggling amount of data created every minute of every day. Each minute generates ∼456,000 tweets, ∼46,000 Uber trips, ∼4,150,000 YouTube videos watched, and more. How did we get here? It's simple: in 1964, 1TB of memory would have cost about $3.5 billion. Today? $27…and it fits into your shirt pocket. (Keep in mind all of these numbers were out of date the minute they were stated.)

Now consider the impact of technologies such as blockchain and the Internet of Things (IoT) on the velocity of data that's currently being collected. Data collection rates are set to go into turbo mode (if they aren't already there). If I were to grade the world on data collection, I'd give it an A+. 24/7, the world stores more and more data. Nice job, world!

What about grading the world on how much of that data it understands? C− (at best). Why? Most of the world's data is unstructured—in other words, it doesn't fit nicely into the rows and columns upon which most analytics is performed. Add to this the fact that most of the world's data can't even be “Googled” (which means your company is stuck with it)…and opportunity has been missed.

The growing importance of unstructured data is evidenced by the recognition of Big Data industry leaders and academics that unstructured data has become a critical aspect of developing and using all the types of data intelligences that organizations, governments, and businesses have at their disposal. Efficiently harnessing unstructured data helps them to better manage their entire data asset and create additional value to compete and win.

So, regardless of the industry you are in, if you are struggling to move beyond unstructured data collection or basic reporting and want to create actionable intelligence from your internal and external unstructured data, this book will help you achieve that goal. It will help you capture the value hidden in the abyss and open up new opportunities for top-line growth, better customer acquisition and retention, better fraud detection and prevention, better safety, better product and service, better healthcare services (saving and improving lives), and so on.

Since unstructured data dominates the data collection for which I assigned that A+ grade, this book serves as a magnificent guide, helping you fully leverage your data and create actionable value for your organization.

Unstructured Data Analytics takes you through a framework for putting unstructured data to work in your organization (with examples across a multitude of industries and cross-industry use cases that inspire “the art of the possible” thinking), and it's done in an easy-to-read manner. I found a number of embedded “Aha” moments throughout the book that even seasoned analytic professionals will benefit from, a tribute to the talent and experience that Jean Paul Isson brings to all of his work.

Unstructured data is the lift, shift, rift, or cliff for any business today. Those who take the right path will find themselves with top-line growth, as disruptors with new business models to profit from, and with a sound defense mechanism (data) against would-be disruptors. Unstructured Data Analytics is a must-read not just for data professionals, but for businesspeople in general, across all industries. I hope you enjoy the read as much as I did.

Paul Zikopoulos, IBM Vice President Cognitive Systems, @BigData_paulz

Preface

Over the past few years, we've witnessed a lot of hype and blandishment around Big Data, the Internet of Things, and artificial intelligence such as machine learning, deep learning, and cognitive analytics. The news constantly showcases how technological and analytical advancements change our lives and the way we do business. Transforming Big Data and the Internet of Things into actionable intelligence is the next frontier of innovation in this augmented age of analytics and is a top priority for leading data-driven executives.

Have you ever wondered how job boards like Monster match resumes to job descriptions without relying on an army of resources? How Google understands text in queries to provide pertinent articles, videos, images, and the like? How Spotify recommends a personalized Weekly Playlist that matches your musical tastes? How Twitter or Facebook derive user sentiment from millions of daily tweets and updates? How companies leverage the voices of their customers to improve client acquisition and retention or to develop new products? How IBM Watson assists physicians in diagnosing some diseases and recommends the best treatment? How national security organizations tap into billions of online communications to detect and prevent threats? How machines can find legal information to assist lawyers to defend cases? If your answer is “yes,” then unstructured data analytics (UDA) is the door to open for clues.

This book is about the analysis of unstructured data. Unstructured data does not fit a conventional relational database. It flows in a variety of formats—text, voice, images, bitmap, audio, and video—at an unprecedented speed and volume. It is propelled by the underpinning growth in technology and advances in computing power. Unstructured data makes up more than 80 percent of Big Data.

Today, companies can capture and store a lot of data, structured and unstructured, thanks to the low cost of storage. Many companies have done a great job in analyzing traditional structured data. However, they struggle to find actionable knowledge buried in texts, voice recordings, audio, video, images, and pictures. They consider themselves data rich but insight poor. In a worldwide competitive environment, executives are under extreme pressure to analyze this new type of data to create actionable business value and differentiate their company from the competition.

From strengthening strategies to acquire or retain customers and talent to improving fraud detection and prevention, and from human resources and human capital management, legal services, sports performance, healthcare and medical research, national security, and social media management to product and service development, UDA will make a tangible impact across all industries and fields.

I wrote this book to be different from available books on text analytics, and I am glad you have chosen to read it. The motivation for writing this book comes, in part, from key lessons from my previous books on data analytics and people analytics. This book fills a gap in knowledge on how to deal with unstructured data coming from Big Data. During my interviews for my previous books, Win with Advanced Business Analytics and People Analytics in the Era of Big Data, I learned from the leaders with whom I spoke that the biggest hurdle in the era of Big Data was taming unstructured data. Most leaders were convinced this new type of data had huge potential, but they didn't know where to start. Since most books do not really provide business-oriented insights and frontline stories while covering this topic, I decided to address the gap the executives felt by providing a how-to framework that any leader could use.

The concepts and approaches presented here are a combination of my own practical analytical leadership experiences in more than 50 countries and insights I gathered throughout my research. As such, the book is not written in an ivory tower. I have been through what I write in this book, have overcome the pitfalls I outline, and have created successful analytical solutions for a wide variety of global business challenges.

This book focuses on practical approaches that can help businesses create value from their data and make the most of the organization's analytical assets. Also, unlike other books, I outline how unstructured data analytics can be applied in several industries, along with an actionable framework and some frontline case studies from industry leaders in different organizations. What is working and what should you avoid? This book will help you evaluate the effectiveness of your UDA strategy and implementation, and will provide you with a framework to take your efforts to the next level, creating business value for your organization in the process.

This book is for leaders who want to learn about the power of UDA to optimize their data analytics strategy and practices. It provides an easy-to-implement framework and real-world case studies and interviews from leading organizations that are harnessing their structured and unstructured data to address their key business challenges. Managers and graduate students will also find the book useful, as it provides tactical information to support their roles. Although this is not a technical book, it is written to be relevant to someone with no analytical experience as well as to the person with a great deal of analytical experience. I have included an appendix section called Tech Corner Details to provide some fundamental techniques from linear algebra that are commonly used in UDA, such as text analysis and image analysis. Of course, there is a lot of software that performs text analytics and natural language processing, but I believe it is worth looking under the hood to better understand how things work.

I recommend reading the book from cover to cover; however, every chapter can stand on its own. If you are interested in using UDA to detect and prevent fraud, you can jump right to Chapter 5 (The Power of UDA to Improve Fraud Detection and Prevention). Likewise, if you are interested in understanding how NBA teams like the Houston Rockets use analytics to drive team performance, you can go to Chapter 11 (The Power of UDA in Sports). And if you are interested in understanding how you can improve your product development, you can go to Chapter 9 (The Power of UDA in Product and Service Development). However, you will get the most from the book if you first read Chapters 1 (The Age of Advanced Business Analytics) through 3 (The Framework to Put UDA to Work). Readers not interested in the technical details can skip Appendix A.

To support the UDA framework proposed, I have included interviews and frontline stories from leading UDA organizations including: IBM, SAS Institute, Manulife, AXA, Monster Worldwide, LinkedIn, Google, Amazon, Facebook, Spotify, Pfizer, Under Armour, The Container Store, Dell, JPMorgan Chase & Co, Citi, Toronto Raptors, and the Houston Rockets.

Over a 22-month period of research for this book, I spoke to 253 business leaders across several industries and regions. Many of these interviews are found in the book, and for the majority of them, I gathered my interviewees' input to validate the framework I propose and to underscore what is needed to leverage the untapped data. I engaged with leaders from businesses in several industries that are using UDA solutions, as well as with leading solutions providers to understand why they invested resources in developing UDA tools and solutions and, more important, what they foresee as the Next Big Thing.

The accuracy of text analysis to identify emotion or sentiment still sparks heated debate. Don't wait to get the perfect solution. With this framework, you can start your UDA journey today to stay current and remain competitive while you learn and improve based on the data you have. You will be able to create business value from both your structured and your unstructured data to ultimately compete and win.

Acknowledgments

I engaged with hundreds of leaders to write this book, whether through interviews, formal contribution, or informal collaboration. I am indebted to many for helping me to complete this book.

I would like to thank Elise Amyot, who reviewed all chapters and case studies contained in this book. Elise's insightful, constructive, and invaluable feedback helped make the book easier to understand for all readers.

I would also like to thank Kim Lascelles, who reviewed my two previous books and helped by reviewing the first proposal of this book as well as some early chapters. Kim is a key pillar for the writing of my books.

Writing a book with a full-time job is a big undertaking. It would not have been possible without the support and input of other professionals, the hundreds of industry leaders and experts who generously participated in my research and shared their unstructured data analytics (UDA) journeys. You will see contributions and great insights throughout the book, in the form of quotes and concrete examples of how they make UDA work. I would like to thank all of them. The following list identifies many to whom I am indebted:

Foreword to the Book

Paul Zikopoulos, Vice President Big Data and Cognitive Systems—IBM

Interviews and Case Studies

Wayne Thompson, Chief Data Scientist, SAS Data Science Technologies—SAS

Fiona McNeill, Global Technology Product Marketing Manager—SAS

Suzanne Sprajcar Beldyck, Head of Content and Communications—SAS Canada

Heather Johnson, Manager, Analytics, Assets, Market Development and Insights—IBM

Eugene Wen, Vice President, Group Advanced Analytics—Manulife. Thanks a lot for your continuing support.

Stephani E. Kingsmill, Executive Vice President, Human Resources—Manulife Financial

Cindy Forbes, Executive Vice President and Chief Analytics Officer—Manulife Financial

Vishwa Kolla, Assistant Vice President, Head of Advanced Analytics—John Hancock Financial Services

Winston Lin, Director, Strategy and Analytics—Houston Rockets

Tommy Balcetis, Director of Basketball Strategy and Analytics— Denver Nuggets

Keith Boyarsky, Director of Analytics—Toronto Raptors

Yongzheng (Tiger) Zhang, Senior Manager, Data Analytics/Data Mining—LinkedIn

François Laviolette, Director, Laval University Big Data Research Centre—Université Laval

Seth Grimes, President and Principal Consultant—Alta Plana Corporation; Founding Chair—Sentiment Analysis Symposium

Greta Roberts, Chief Executive Officer and Cofounder—Talent Analytics, Corp.

Pasha Roberts, Chief Scientist and Cofounder—Talent Analytics, Corp.

Diane Deperrois, General Manager South-East & Oversea Regions—AXA France

Mateo Cavasotto, Cofounder and Chief Executive Officer—Emi Labs

Doug Klinger, Chief Executive Officer and Member, Board of Directors—Zelis Healthcare

Mark Stoever, Chief Executive Officer—2020 Inc.; former CEO, Monster Worldwide Inc.

Arun Chidambaram, Global Head Talent Analytics—Pfizer

Troy Barnette, Customer Engagement Executive SAP; former Sr Director Corporate Service Business Relationship Lead at Under Armour

Rell B. Robinson, Senior HRIS and HR Analytics Professional—Bloomberg

Ian Bailie, former: Senior Director, People Planning, Analytics, and Tools—CISCO; current: Cofounder—270 degrees

Dawn Klinghoffer, General Manager, HR Business Insights—Microsoft

Joanie Courtney, President and Chief Operating Officer, Professional Division—EmployBridge

Bruno Aziza, Chief Marketing Officer—AtScale

Louis Gagnon, Chief Executive Officer—MyBrainSolutions

Art Papas, Chief Executive Officer and Cofounder—Bullhorn

Gregory Piatetsky-Shapiro, President and Editor— KDnuggets

Ian O'Keefe, Managing Director, Global Head of Workforce Analytics— JPMorgan Chase & Co.

Vasu Nagalingam, Director, Personal Wealth & Retirement—Merrill Lynch

Hugues Bertin, Chief Financial and Risk Officer, Head of Digital Transformation—BNP Paribas Cardif

Stéphane Brutus, Ph.D., RBC Professor in Motivation and Employee Performance—Concordia University

Jennifer Priestley, Ph.D., Associate Dean, Graduate College—Kennesaw State University

Paul Mason, Senior Manager, Corporate Education—University of California, Berkeley

Chantal Czerednikow, Dentist—Centre for Innovation, Montreal Children's Hospital, McGill University

John Houston, Principal and US Data Science Offering Leader—Deloitte

Amel Arhab, Senior Business Analytics Leader and Strategist, Senior Manager—Deloitte Consulting

Abby Mehta, Ph.D., Senior Vice President, Marketing Insights and Media Analytics—Bank of America

James Gallman, People Analytics Director—Boeing

Fanta Berete, Responsable Pôle Devp. Ressources Humaines et Formation—CCI France

Bryan Richardson, Global Practice Manager, Risk Advanced Analytics—McKinsey & Company

Paul Fletcher, Senior Vice President Marketing—Aviva Canada

Edmund Balogun, Project Director—IQPC

Eric Siegel, Founder—Predictive Analytics World Conference Series

All members of APM (Association pour le Progrès du Management), a French CEO association with more than 6,500 members. Thanks for your useful insights, your contributions through interviews, and the business challenges you shared during our executive workshops, which inspired some adjustments to the book.

Thanks to the members of the Monster Worldwide team:

Matt O'Connor, John McLaughlin, Claude Yama, Meredith Hanrahan, Pat Manzo, Javid Muhammedali, Steve LeClair, and the Monster product team for their support and assistance with some content for this book.

I am also grateful for the support of the Manulife executive team, who found time for my interviews despite their busy schedules. Special thanks to Eugene Wen, whose continuing support and insightful input were invaluable and helped to reshape some parts of this book, and to Angela Costa for helping to set up the interviews with executives at Manulife.

I would like to thank my friends for their support: Nathalie de Repentigny for the continuing support for all of my major projects, Ryan Jung, Alejandro Del Moral, Eugene Wen, Eric de Larminat (for your support in finding creative solutions to successfully address unplanned challenges), Kim Vu, Oumar Mbaye, Alfonso Troisi, Karima Ahrab, Marc Bienstock, Ezana Raswork, Sean Dalton, Karim Salabi, Diawo Diallo, Eugene Robitaille, Mario Bottone, Mr and Mrs Guy, Siroux, and Guy François (my personal abs coach at the YMCA) and his colleague Rick Padmore.

I would also like to thank my family for their support: my parents, my mom Martha and my dad Samuel, for nurturing my passion for mathematics and teaching me the value of hard work at an early age; my brother Faustin (Moise), my sister Betty, my cousin David, my niece Marthe, my aunt Marie Christine, and my uncle Jackson (Isaac).

Special thanks to my two daughters, Roxane and Sofia, and my wife Marjolaine for their patience and understanding while I was robbing family time to research and write this book, hiding in the basement. Your love and support are key pillars to this achievement.

CHAPTER 1 The Age of Advanced Business Analytics

In God we trust; all others must bring data.

—Attributed to W. Edwards Deming

INTRODUCTION

If you believe that the data analytics revolution has already happened, think again. After the steam engine, mass production, and Internet technology, the Internet of Things and artificial intelligence make up the fourth industrial revolution. However, the motor of this fourth revolution is analytics. The impact of analytics on our societies, our communities, and the business world has just begun. In fact, knowledge gained from analytics in recent years has already reshaped the marketplace, changing the way we shop, buy, think, vote, hire, play, choose, date, and live. With more than 20 billion connected devices by 2020¹ and more than 5 billion people with IP addresses sharing information and intelligence by 2025, the pace of upcoming changes resulting from analytics is mind-boggling. The Internet of Things was the genesis of the Analytics of Things and the Analytics of Apps. In the Analytics of Things era, new knowledge resulting from artificial intelligence (such as machine learning, deep learning, and cognitive analytics) and blockchain will be gathered from complex information networks flowing from billions of connected devices and from humans interfacing with their intelligent devices, such as machines, apps, and wearables of all kinds.

Today, companies are under extreme pressure to dig deeper into and connect all information and pieces of data at their disposal to find new differentiators from their competition. They need to better understand their market, customers, competitors, and talent pool. They also need to think laterally and look for creative ideas and innovations in other fields. Think of all the new information layers that are at their disposal to help them achieve this goal. Exhibit 1.01 showcases the five layers of new information that are being generated and helping organizations to create business value.

Exhibit 1.01 The Exponential Growth of New Information: The Five Layers

Exhibit 1.02 showcases the exponential growth of digital information. According to International Data Corporation (IDC) estimates, by 2020, data production will balloon to 44 times what it was in 2009.

Exhibit 1.02 Data Production Evolution

Source: CSC IDC Estimate

Customers and users have become more technology-savvy and expect more from their service providers and product manufacturers. They are very fickle and can switch to the competition with a single click. They have options and a lot of choices. They expect their service provider to know their habits, wants, and needs and even to anticipate their next move or transaction. Therefore, for any leading organization poised to play in this digitized data economy, where consumers and users are empowered, selective, and highly informed, analytics is key.

Analytics enables organizations of all sizes to meet and exceed customers' expectations by becoming data-driven and competing in the new digital economy, where deep knowledge of customers and markets is front and center. However, to harness the complex user- and machine-generated data and create actionable insights from it, organizations need to apply science to that data. Yes, science. In data science, specialists play with unorganized or “messy” data² to distill the intelligence differentiator.

Data Scientist: The Sexiest Job of the Twenty-First Century

You probably came across this concept if you read Harvard Business Review in October 2012. In their article,³ Thomas H. Davenport and D. J. Patil provided a comprehensive overview of the prominent job occupation for Big Data tamers. They underscored the sudden rise of data science in the business scene.

For those of us with a math and computer science background, being labeled as sexy was a shock. When I was a graduate in mathematics and statistics, most of my class was made up of men. We were referred to as “those guys” or “those geeks,” and we were mostly left alone by others. Our program was perceived as difficult to understand or too theoretical. This made our field of study very hermetic and unappealing to many. At that time, none of us could see a real-life application resulting from linear algebra. How did all the theorems and concepts we were learning apply to the business world? Our ideal career pathways in those days were math teacher or researcher, not data scientist. The term was not sexy at that time.

Studying the theory of vector spaces along with the concept of singular value decomposition (SVD) in algebra, we manually decomposed a matrix into a product of three matrices. We understood SVD was useful in dimension reduction: It helped to transform any matrix with a universe of n vectors into a smaller number of vectors that retained most of the information of the original complex matrix. For high-dimension matrices, the manual process was tedious, starting with finding eigenvalues and then building eigenvectors. While we became adept in SVD, we had no clue as to where this skill set could be applied in real life. At that time, we didn't know that SVD would be used to analyze text. We didn't know that computing term weights and applying SVD dimension reduction would provide clean separation of customer opinions for text categorization and sentiment analysis. Now computers are proficient in running complex computations like SVD (see Chapter 3). It was when I entered the workforce that I finally came across the first application of SVD, embedded in a software program designed for text analytics. There it was, hidden in the technical section of the software: The dimension-reduction power of SVD, grounded in vector space theory, was being leveraged. The text content was placed into a matrix, and a quantitative representation of the content matrix was then created. Eigenvalues and eigenvectors were found to reduce the content into a smaller number of vectors while retaining the essence of the original content. It turns out that the vector space reduction and SVD I had learned in algebra were the foundations of most text analysis.
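For readers who want to see the idea in action, the following is a minimal sketch in Python with NumPy. It is not the method of the software mentioned above. The four customer comments are a hypothetical toy corpus, raw word counts stand in for term weights, and the names docs, vocab, doc_coords, and cosine are illustrative assumptions. The sketch places the text into a term-document matrix, decomposes it into three matrices, and keeps only the two largest singular values.

```python
import numpy as np

# Hypothetical toy corpus: four short customer comments (illustrative only).
docs = [
    "great service great price",
    "great price fast delivery",
    "terrible service slow delivery",
    "slow delivery terrible support",
]

# Term-document matrix: one row per term, one column per document.
# Raw counts are used as term weights (an assumption made for simplicity).
vocab = sorted({word for doc in docs for word in doc.split()})
A = np.array(
    [[doc.split().count(term) for doc in docs] for term in vocab],
    dtype=float,
)

# Decompose A into a product of three matrices: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Dimension reduction: keep only the k largest singular values.
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Compare the first comment with every comment in the reduced space.
for j in range(len(docs)):
    print(j, round(cosine(doc_coords[0], doc_coords[j]), 3))
```

In this toy example, each comment is described by two coordinates instead of eight word counts, and the two positive comments end up closer to each other than to the negative ones. Real text analytics tools typically use weighted counts and far larger matrices, but the principle is the same.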

It is uncertain whether our professors had envisioned the real-life applications for the math theories they were teaching then. Only visionaries could have foreseen such an explosion of applications in our lives, today and in the future, given the new information layers I mentioned. Such foresight would surely have changed the evolution of data science. However, it is the explosion of technologies, the new information layers, and the empowerment of consumers and users that have shed light on analytics and propelled it into a new science.

WHY THE ANALYTICS HYPE TODAY?

Analytics is now part of the bottom line of leading organizations and industries of all sizes and types. Some companies have used analytics to power growth and to move into new sectors. In today's global competitive landscape, where data never stops flowing and challenges and opportunities keep changing, a lot of hype surrounds advanced business analytics. Your company is probably constantly exhorted to build and implement strategies to collect, process, manage, and analyze data (big and small) and to create business value from it. And you are warned about the potential risk of falling behind your competition by not doing so.

The data scientist job title has been around for a few years; it was coined in 2008 by D. J. Patil and Jeff Hammerbacher, data analytics leaders at (respectively) LinkedIn and Facebook. In fact, a lot of data scientists were working in leading companies long before the recent traction around data science. The rapid appearance of data analytics in the business arena came from Silicon Valley's online giants, such as Google, eBay, LinkedIn, Amazon, Facebook, and Yahoo!, all of which surged to prominence during the dot-com boom around Y2K. These online firms started to amass and analyze the gigantic volume of clickstream data generated through user searches. They pioneered the data economy by building data products, and Big Data invaded the business world. A new data economy was created; companies were inundated with information flowing from different sources, in varieties, volumes, velocities, and degrees of veracity that they wanted to harness to create actionable business value. The emergence of Big Data has triggered the overall traction around advanced business analytics, which was propelled by the combination of the following pillars:

Costs to store and process information have reduced

Interactive devices and sensors have increased

Data analytics infrastructures and software have increased

User-friendly and invisible data analytics tools have emerged

Data analytics has become mainstream, and it means a lot to our economy and world

Major leading tech companies have pioneered the data economy

Big Data analytics has become a big market opportunity

The number of data science university programs and MOOCs has intensified

1. Costs to Store and Process Information Have Reduced

The cost of storing information has dropped significantly: $600 will buy a disk drive that can store all of the world's music.⁴ In 1990, it cost $11,000 to buy 1 GB of disk space; today, 1 GB of disk space costs less than 1 cent.⁵ Data storage and data processing capacity have grown tremendously. The smartphones we all carry have more storage and computing power than a mainframe did in the 1980s. And large amounts of complex and unorganized data, such as audio, video, graphic, e-mail, chat, and text data, are widely available across vast networks of computing environments at very low cost. In parallel, the cost of sensors, photovoltaic cells, and all kinds of high-technology data-driven products has also decreased, enabling more people to use them and generating more data to store and process.

2. Interactive Devices and Sensors Have Increased

There are countless interactive devices, such as smartphones, tablets, sensors, surveillance cameras, and wearable devices, that can capture, store, and send information, including everything we see, touch, hear, feel, and smell. This information no longer resides on an offline siloed desktop. It is instantly sent online via Internet Protocol (IP) and shared, transformed, and enriched by millions of users. Data is pumping in from a growing universe of connected sensors and machines.

3. Data Analytics Infrastructures and Software Have Increased

Over the past few years, there has been a rapid increase of data analytics tools and software that process and analyze data with unprecedented speed. Technologies such as artificial intelligence, machine learning, and deep learning are advancing more quickly than anyone anticipated.

The recent enthusiasm was propelled by the emergence of technology that makes taming Big Data possible, including Hadoop, the widely used framework for distributed storage and processing of large data sets. Nowadays data scientists turn to Spark (and other tools) as a more efficient data processing framework. Open source cloud computing and data visualization capabilities are commonplace. A self-driving car, for example, can collect nearly 1 GB of data every second; it needs that granularity to find patterns to make reliable decisions. It also needs the computing power to make those decisions in a timely manner. As far as computing power goes, we have moved from the central processing unit (CPU) to the graphics processing unit (GPU), and more recently, Google has developed the Tensor Processing Unit (TPU). During Google's 2017 I/O Conference, Google chief executive officer (CEO) Sundar Pichai discussed how Google is rethinking its computational architecture and building data centers designed first for artificial intelligence and machine learning. He introduced Google's TPU, which supports Google's new artificial intelligence computational architecture, because, according to Google, TPUs are not only 15 to 30 times faster than contemporary CPUs and GPUs but also 30 to 80 times more efficient in performance per watt. TPUs are now used across all Google products. In 2016, the TPU powered DeepMind AlphaGo during its historic 4–1 victory against Lee Sedol, the Go grandmaster. Machines can diagnose diseases, navigate roads, and understand and respond in natural language.

4. User-Friendly and Invisible Data Analytics Tools Have Emerged

Easy-to-use software has emerged that enables the processing of raw data and the creation of actionable business value from it. These user-friendly tools do not necessarily require a technical background from the end user. Data analytics is becoming pervasive as the volume of data generated by embedded systems increases, and vast pools of structured and unstructured data inside and outside the enterprise are now being analyzed. According to David Cearley, vice president and Gartner fellow, “Every app now needs to be an analytics app.”⁶ Organizations need to manage how best to filter the huge amounts of data coming from the Internet of Things, social media, and wearable devices, and then deliver exactly the right information to the right person at the right time. He continues, “Analytics will become deeply, but invisibly embedded everywhere.” Big Data is the enabler and catalyst for this trend, but the focus needs to shift from Big Data to Big Analytics, because the value is in the answers, not the data.

5. Data Analytics Is Becoming Mainstream, and It Means a Lot to Our Economy and World

There has been a significant proliferation of analytics benefits across a variety of sectors and industries. Cases demonstrating how organizations and businesses are winning with analytics are increasingly available and accessible to everyone. Marketing has been extremely powerful in sharing those success stories everywhere, from business magazines to traditional media. Depending on their level of analytical maturity and their most pressing business objectives, companies and organizations across different industries and business functions have embraced analytics to create business value and develop competitive differentiators. Following are examples where advanced analytics is impacting today's world:

Politics

Politicians have been leveraging data analytics to be more successful at winning elections.

Politicians use social media analytics to determine where they must campaign the hardest to win states, provinces, and cities in the next election.

During the 2012 presidential election, President Obama's team tactically leveraged Big Data analytics to target fundraising messaging to voters by location, age group, occupation, and interest. For instance, according to their models, women from 35 to 49 years old were more likely to respond to messages such as “Dinner with the president and George Clooney.” Data-driven targeted fundraising activities helped the Obama team raise more than $1 billion (an all-time record in any presidential campaign in modern history). The team optimized their campaign ad spend and, more importantly, targeted undecided voters in swing states to register and persuaded them to vote for Obama.

Following the unexpected results of the 2016 election between Clinton and Trump, analysts attributed the discrepancy to five factors that traditional polling data could not capture: (1) a nontraditional candidate (with no previous political history) who resonated with traditional nonvoters; (2) an unprecedented investigation by the FBI during the final few days of the election; (3) the alleged influence of a foreign state; (4) the miscalculation in the distribution of voters (turnout) among different demographic groups; and (5) the impact of the “digital disruption of media,” which allowed so many new channels of conflicting information to spring up. The 2016 election is likely going to change how political analytics is conducted in the future.

It is important to note that the Trump campaign also leveraged Big Data insights from a U.K.-based Big Data analytics company called Cambridge Analytica, the same company that helped the Brexit leaders to win. Cambridge Analytica crunched data for Donald Trump, building psychological profiles of American voters. They leveraged more than 5,000 pieces of data about every adult American from sources such as personality tests on social media, voting history, personality type, shopping history, and television and cable viewing history. Insights from this analysis helped the Trump campaign tailor its message and strategy in battleground states and win.

Sport & Entertainment

Forward-looking companies use machine learning to analyze video content and identify key information from it. Machine-learning techniques are also helping them to uncover actionable insights from video content. These insights are helping their digital marketers to align their advertisements with the right videos and drive customer engagement.

Companies such as New York-based Dataminr, which provides a market monitoring platform for brands, help businesses monitor and track their brand and immediately take appropriate actions. For instance, the platform can identify tweets related to stock trading and monitor discussion threads and communications to identify relevance and urgency. News organizations such as CNN and BBC use similar unstructured data analysis (UDA) technology to quickly identify news stories and have reporters deployed or engaged to cover them in a timely manner.

Video analytics and sensor data of tennis, baseball, and football games are used to improve the performances of players and teams. You can now buy a baseball with more than 200 sensors in it that will give you detailed feedback on how to improve your game.

Augmented reality technology is used to overlay supplemental information, such as scores and other statistics, over a live video field in sporting events to enhance the viewers' experience.

The Oakland Athletics (an American professional baseball team based in Oakland, California) pioneered analytics in sports, and now all teams use analytics. Today information and statistics about players are a prerequisite for recruitment.

Spotify uses unstructured data analytics to provide its popular Discover Weekly playlists, which are personalized to our musical tastes. The company leverages collaborative filtering, convolutional neural networks, and natural language processing to scan music blogs, build microgenres, analyze the contents of playlists, and detect and eliminate outliers to find songs that fit our profile, but that we haven't yet listened to.

Artists such as Lady Gaga are using data about our listening preferences and sequences to determine the most popular playlists for live performances.

Business

Amazon uses analytics to recommend what books to buy; more than 35 percent of their sales are generated from these recommendations: “People who bought this book also bought …”

Netflix leverages analytics to recommend movies you are more likely to watch. More than 70 percent of Netflix movie choices arise from its online recommendations.

Pinterest leverages UDA to provide a personalized experience on its home feed by surfacing content each user would be more interested in seeing.

Companies use sentiment analysis from Facebook and Twitter posts to determine and predict sales volume and brand equity.

Target (a large retailer) predicts when a pregnant woman is due based on products she purchases. The company simply combines her loyalty card data with social media information, hence detecting and leveraging changing buying patterns. As a result, the company can target pregnant women with promotions for baby-related products. The company increased revenue 15 to 20 percent by targeting direct mail with product choice models.

Google's self-driving car analyzes a gigantic amount of data from sensors and cameras in real time to stay safe on the road.

The global positioning system (GPS) in our phones provides location and speed information for live traffic updates. Location-based data has generated billions of dollars for companies and even more value for users.

Healthcare

Pediatric hospitals apply data analytics to livestreamed heartbeats to identify patterns. Based on the analysis, the system can now detect infections 24 hours before the baby would normally begin to show any symptoms, which enables early intervention and treatment.

After winning the TV show Jeopardy!, Watson, the IBM supercomputer, now assists physicians to better diagnose and treat aggressive diseases such as cancers. Watson is being enlisted to help doctors predict cancer in patients. Watson has greatly sped up the diagnosis of acute myeloid leukemia, a type of blood cancer, as well as glioblastoma multiforme, an aggressive brain cancer.

DNA sequencing: It once took 10 years to sequence the DNA of one person, at a cost of more than $100 million. Today, DNA sequencing is done very quickly for less than $99. This availability leads to personalized medicine and drug prescriptions. Genomic precision medicine is becoming a game changer.

Natera can predict Trisomy 21 disease without the risk of miscarriage from an amniocentesis test just by testing blood from the mother-to-be. The results of that blood test have a 99 percent accuracy rate, similar to those from the traditional amniocentesis test.

Government, Security, and Police

The FBI combines data from social media, closed-circuit television (CCTV) cameras, phone calls, and texts to track down criminals and predict terrorist attacks.

JPMorgan Chase & Co. invested $837 million in people analytics to anticipate rogue employee behavior that has cost the bank more than $36 billion since the end of the 2008 financial crisis.

Human Capital Management

HR analytics companies are now using people analytics to optimize talent management. Big Data analytics help to attract, acquire, develop, and retain talent. Google, Cisco, Microsoft, GE, Xerox, Bloomberg, Deloitte, and Pfizer have been accumulating success stories and benefits from people analytics to optimize their talent equation.

6. Major Leading Tech Companies Have Pioneered the Data Economy

Major tech companies such as Google, Facebook, eBay, Amazon, LinkedIn, and Yahoo began to monetize their data. As a result, their entire businesses rest on their ability to harness the data at their disposal. Most recently, some of these companies shared with the public, as open source, Big Data analytics technologies they pioneered, such as Hadoop and Spark. As a result, entire businesses that stem from data analytics and raw data have been developed. A new digital economy was born, in which data is the currency and analytics sets the trading rules. Data analytics has become pervasive and omnipresent. Social media, cloud, mobile, and Big Data provide disruptive ways to capture, process, analyze, and communicate data analytics findings and recommendations.

7. Big Data Analytics Has Become a Big Market Opportunity

Big Data analytics represents a big market opportunity: The research firm IDC forecast that the market for Big Data and business analytics software, hardware, and services would grow by 50 percent from 2015 to 2019, reaching $187 billion by 2019. Services account for the largest portion of the revenue, with manufacturing and banking poised to lead the spend.⁷

8. The Number of Data Science University Programs and MOOCs Has Intensified

Before Big Data, there were no university data science programs or degrees. The need to harness the complex volume of so-called “messy” data such as images, text, audio, video, chat, graphics, and pictures forced universities to adapt. The article “Big Data: The Next Frontier for Innovation, Competition, and Productivity” from McKinsey & Company⁸ has played a key role in raising awareness of the imminent data scientist labor shortage. According to the article, by 2018 the United States alone would face a shortage of 140,000 to 190,000 data scientists and 1.5 million data-savvy managers. This headline sounded the alarm, creating awareness among organizations eager to join the new data economy.

A variety of business schools and universities now offer data mining and data science programs and degrees. In the United States alone, there are more than 20 universities offering data science programs,⁹ as well as a panoply of executive programs covering data analytics, delivered either locally in university classrooms or via the Internet and videos. This has led to the multiplication of massive open online courses (MOOCs). MOOC distance education was first introduced in 2008 and emerged as a popular mode of learning in 2012;¹⁰ it now offers a lot of data science modules and programs.

The leading MOOCs are listed in Exhibit 1.03, organized by date of inception.¹¹

| MOOC Name | Inception Date | Founders | MOOC Summary/Value Proposition |
| --- | --- | --- | --- |
| Khan Academy | 2006 | Salman Khan | Khan Academy is a nonprofit educational organization that produces YouTube videos of short lectures on specific topics within many subject areas to supplement classroom learning. It uses online tools to provide meaningful educational content to students. |
| Udemy | 2010 | Eren Bali, Oktay Caglar, and Gagan Biyani | Udemy is an online learning platform that does not focus on traditional college courses. Instead, it enables instructors to build online courses aimed at professional adults and offer those courses to the public. |
| Udacity | 2011 | Sebastian Thrun, David Stavens, and Mike Sokolsky | Udacity is a for-profit educational organization that originally offered traditional university courses, but in 2013, it shifted focus to vocational courses for adult professionals. |
| edX | 2012 | Harvard University and the Massachusetts Institute of Technology (MIT) | edX provides interactive online classes in a wide range of subjects, from biology, chemistry, and engineering to law, literature, and philosophy, from partner organizations, including universities. |
| Coursera | 2012 | Andrew Ng and Daphne Koller (Stanford University computer science professors) | Coursera partners with universities to offer online courses in a wide range of subjects. Its student users can even pursue specializations and degrees, as well as a comprehensive list of data science programs. |

Exhibit 1.03 MOOCs

Other MOOCs include FutureLearn, Open Education Europe, and The Open University Classroom.

At Harvard University, more people signed on in one year to the online courses than had graduated from the university in its 377 years of existence.

In addition, niche sites for analytics communities and groups have been proliferating. The ever-growing data volume has put more pressure on companies to hire data scientists. The position has been hard to fill, a situation that is not poised to change. How did we get here? What is the timeline that led to the Analytics Age?

A SHORT HISTORY OF DATA ANALYTICS

Modern UDA has been 58 years in the making. In 1958, IBM engineer H. P. Luhn wrote an article in which he indicated that business intelligence is the analysis of structured and unstructured text documents. In that article, he defined business intelligence as a system that will:

utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points.¹²

However, the first representation of data in rows and columns can be traced back to the second century in Egypt! With today's fast evolution and revolution of technology (software and hardware), data mining, business intelligence, and UDA have evolved rapidly since 1958. Past unresolved analytics problems are now being addressed, thanks to the sophistication of tools and software. Consider Siri, Apple's voice recognition technology for the iPhone. Siri's origins go back to a Pentagon research project that was spun off as a Silicon Valley start-up. Apple bought Siri in 2010 and has been feeding it data ever since. Now, with people supplying millions of questions, Siri is becoming an increasingly adept personal assistant, offering reminders, weather reports, restaurant suggestions, and answers to an expanding universe of questions.

From a historical perspective, consider the following timeline, which outlines key milestones leading to the current state of UDA:

Second century: In Egypt, the first table with data represented in rows and columns.

Seventeenth century: The two-dimensional graph was invented by René Descartes (French philosopher and mathematician).

1756–1806: The line graph, bar chart, and pie chart were invented by William Playfair.

1856–1915: Frederick Winslow Taylor is credited with applying engineering principles to factory work, which was instrumental in the creation and development of industrial engineering.

Eighteenth and nineteenth centuries: Modern statistics emerged, with R. A. Fisher, Karl Pearson, and Charles Spearman introducing methods such as factor analysis and principal components analysis (PCA). Pearson introduced PCA in 1901.

1958: The first definition of

business intelligence