Predictive Analytics For Dummies - Anasse Bari - E-Book

Predictive Analytics For Dummies E-Book

Anasse Bari

Description

Use Big Data and technology to uncover real-world insights

You don't need a time machine to predict the future. All it takes is a little knowledge and know-how, and Predictive Analytics For Dummies gets you there fast. With the help of this friendly guide, you'll discover the core of predictive analytics and get started putting it to use with readily available tools to collect and analyze data. In no time, you'll learn how to incorporate algorithms through data models, identify similarities and relationships in your data, and predict the future through data classification. Along the way, you'll develop a roadmap by preparing your data, creating goals, processing your data, and building a predictive model that will get you stakeholder buy-in.

Big Data has taken the marketplace by storm, and companies are seeking qualified talent to quickly fill positions to analyze the massive amount of data that are being collected each day. If you want to get in on the action and either learn or deepen your understanding of how to use predictive analytics to find real relationships between what you know and what you want to know, everything you need is a page away!

* Offers common use cases to help you get started
* Covers details on modeling, k-means clustering, and more
* Includes information on structuring your data
* Provides tips on outlining business goals and approaches

The future starts today with the help of Predictive Analytics For Dummies.

Page count: 665

Year of publication: 2016




Predictive Analytics For Dummies®, 2nd Edition

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2017 by John Wiley & Sons, Inc., Hoboken, New Jersey

Media and software compilation copyright © 2017 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2016951998

ISBN 978-1-119-26700-3 (pbk); 978-1-119-26701-0 (epub); 978-1-119-26702-7 (epdf)

Predictive Analytics For Dummies®

To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Predictive Analytics For Dummies Cheat Sheet” in the Search box.

Table of Contents

Cover

Introduction

About This Book

Foolish Assumptions

Icons Used in This Book

Beyond the Book

Where to Go from Here

Part 1: Getting Started with Predictive Analytics

Chapter 1: Entering the Arena

Exploring Predictive Analytics

Adding Business Value

Starting a Predictive Analytic Project

Ongoing Predictive Analytics

Forming Your Predictive Analytics Team

Surveying the Marketplace

Chapter 2: Predictive Analytics in the Wild

Online Marketing and Retail

Implementing a Recommender System

Target Marketing

Personalization

Content and Text Analytics

Chapter 3: Exploring Your Data Types and Associated Techniques

Recognizing Your Data Types

Identifying Data Categories

Generating Predictive Analytics

Connecting to Related Disciplines

Chapter 4: Complexities of Data

Finding Value in Your Data

Constantly Changing Data

Complexities in Searching Your Data

Differentiating Business Intelligence from Big-Data Analytics

Exploration of Raw Data

Part 2: Incorporating Algorithms in Your Models

Chapter 5: Applying Models

Modeling Data

Healthcare Analytics Case Studies

Social and Marketing Analytics Case Studies

Prognostics and its Relation to Predictive Analytics

The Rise of Open Data

Chapter 6: Identifying Similarities in Data

Explaining Data Clustering

Converting Raw Data into a Matrix

Identifying Groups in Your Data

Finding Associations in Data Items

Applying Biologically Inspired Clustering Techniques

Chapter 7: Predicting the Future Using Data Classification

Explaining Data Classification

Introducing Data Classification to Your Business

Exploring the Data-Classification Process

Using Data Classification to Predict the Future

Ensemble Methods to Boost Prediction Accuracy

Deep Learning

Part 3: Developing a Roadmap

Chapter 8: Convincing Your Management to Adopt Predictive Analytics

Making the Business Case

Gathering Support from Stakeholders

Presenting Your Proposal

Chapter 9: Preparing Data

Listing the Business Objectives

Processing Your Data

Working with Features

Structuring Your Data

Chapter 10: Building a Predictive Model

Getting Started

Developing and Testing the Model

Going Live with the Model

Chapter 11: Visualization of Analytical Results

Visualization as a Predictive Tool

Evaluating Your Visualization

Visualizing Your Model’s Analytical Results

Novel Visualization in Predictive Analytics

Big Data Visualization Tools

Part 4: Programming Predictive Analytics

Chapter 12: Creating Basic Prediction Examples

Installing the Software Packages

Preparing the Data

Making Predictions Using Classification Algorithms

Chapter 13: Creating Basic Examples of Unsupervised Predictions

Getting the Sample Dataset

Using Clustering Algorithms to Make Predictions

Chapter 14: Predictive Modeling with R

Programming in R

Making Predictions Using R

Chapter 15: Avoiding Analysis Traps

Data Challenges

Analysis Challenges

Part 5: Executing Big Data

Chapter 16: Targeting Big Data

Major Technological Trends in Predictive Analytics

Applying Open-Source Tools to Big Data

Chapter 17: Getting Ready for Enterprise Analytics

Analytics as a Service

Preparing for a Proof-of-Value of Predictive Analytics Prototype

Part 6: The Part of Tens

Chapter 18: Ten Reasons to Implement Predictive Analytics

Identifying Business Goals

Knowing Your Data

Organizing Your Data

Satisfying Your Customers

Reducing Operational Costs

Increasing Returns on Investments (ROI)

Gaining Rapid Access to Information

Making Informed Decisions

Gaining Competitive Edge

Improving the Business

Chapter 19: Ten Steps to Build a Predictive Analytic Model

Building a Predictive Analytics Team

Setting the Business Objectives

Preparing Your Data

Sampling Your Data

Avoiding “Garbage In, Garbage Out”

Creating Quick Victories

Fostering Change in Your Organization

Building Deployable Models

Evaluating Your Model

Updating Your Model

About the Authors

Connect with Dummies

End User License Agreement


Introduction

Predictive analytics is the art and science of using data to make better-informed decisions. Predictive analytics helps you uncover hidden patterns and relationships in your data that can help you predict with greater confidence what may happen in the future, and provide you with valuable, actionable insights for your organization.

About This Book

Our goal was to make this complex subject as practical as possible, in a way that appeals to everyone from technical experts to non-technical business strategists.

The subject is complex because it is not really just one subject. It is the combination of at least a few multifaceted fields: data mining, statistics, and mathematics.

Data mining requires an understanding of machine learning and information retrieval. On top of this, mathematics and statistics must be applied to your business domain, be it marketing, actuarial services, fraud, crime, or banking.

Most of the current materials on predictive analytics are difficult to read if you don't already have a background in some of these subjects. Some are filled with complex mathematical equations and modeling techniques. Others stay at a high level, presenting specific use cases with little guidance on implementation. We include both, while trying to keep a wide spectrum of readers engaged.

The focus of this book is developing a roadmap for implementing predictive analytics within your organization. Its intended audience is the larger community of business managers, business analysts, data scientists, and information technology professionals.

Maybe you are a business manager and you have heard the buzz about predictive analytics. Maybe you've been working with data mining and you want to add predictive analytics to your skill set. Maybe you know R or Python, but you're totally new to predictive analytics. If this sounds like you, then this book will be a good fit. Even if you have no experience analyzing data, but want or need to derive greater value from your organization’s data, you can also find something of value in this book.

Foolish Assumptions

Without oversimplifying, we have tried to explain technical concepts in non-technical terms, tackling each topic from the ground up.

Even if you are an experienced practitioner, you should find something new, and at the very least, you will gain validation for what you already know, and guidance for establishing best practices.

We also hope to have contributed a few concepts and ideas for the very first time in a major publication like this. For example, we explain how you can apply biologically inspired algorithms to predictive analytics.

We assume that the reader will not be a programmer. The code presented in this book is very brief and easy to follow. Readers of all programming levels will benefit from this book, because it is more about learning the process of predictive analytics than about learning a programming language.

Icons Used in This Book

The following icons in the margins indicate highlighted material that we think could be of interest to you. Next, we describe the meaning of each icon that is used in this book.

The tips are ideas we would like you to take note of. This is usually practical advice you can apply for that given topic.

This icon is rarely used in this book. We may have used it only once or twice in the entire book. The intent is to save you time by bringing to your attention some common pitfalls that you are better off avoiding.

We have made sincere efforts to steer away from the technical stuff. But when we have no choice we make sure to let you know. So if you don’t care too much about the technical stuff you can easily skip this part and you won’t miss much. If the technical stuff is your thing, then you may find these sections fascinating.

This is something we would like you to take special note of. This is a concept or idea we think is important for you to know and remember. An example of this would be a best practice we think is noteworthy.

Beyond the Book

A lot of extra content that is not in this book is available at www.dummies.com. Go online to find the following:

The Cheat Sheet for this book is at

www.dummies.com/cheatsheet/predictiveanalytics

Here you’ll find the steps needed to build a predictive analytics model and some case studies of predictive analytics.

Updates to this book, if we have any, are also available at

www.dummies.com/extras/predictiveanalytics

Where to Go from Here

Let’s start making some predictions! You can apply predictive analytics to virtually every business domain. Right now there is explosive growth in the predictive analytics market, and this is just the beginning. The arena is wide open, and the possibilities are endless.

Part 1

Getting Started with Predictive Analytics

IN THIS PART …

Exploring predictive analytics

Identifying uses

Classifying data

Presenting information

Chapter 1

Entering the Arena

IN THIS CHAPTER

Explaining the building blocks

Probing capabilities

Surveying the market

Predictive analytics is a bright light bulb powered by your data.

You can never have too much insight. The more you see, the better the decisions you make — and you never want to be in the dark. You want to see what lies ahead, preferably before others do. It's like playing the game “Let's Make a Deal” where you have to choose the door with the hidden prize. Which door do you choose? Door 1, Door 2, or Door 3? They all look the same, so it's just your best guess — your choice depends on you and your luck. But what if you had an edge — the ability to see through the keyhole? Predictive analytics can give you that edge.

Starting a Predictive Analytic Project

For the moment, let's forget about algorithms and higher math; predictions are used in every aspect of our lives. Consider how many times you have said (or heard people say), “I told you that was going to happen.”

When you want to predict a future event with any accuracy, however, you'll need to know the past and understand the current situation. Doing so entails several processes:

Extract the facts that are currently happening.

Distinguish present facts from those that just happened.

Derive possible scenarios that could happen.

Rank the scenarios according to how likely they are to happen.

Predictive analytics can help you with each of these processes, so that you know as much as you can about what has happened and can make better-informed decisions about the future.
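To make the last step concrete, here is a minimal sketch of ranking scenarios by how likely they are, assuming the only evidence is a short history of past outcomes. The outcome labels and the rank_scenarios helper are hypothetical, and a simple frequency count stands in for a real predictive model.

```python
from collections import Counter

def rank_scenarios(past_outcomes):
    """Return (scenario, estimated probability) pairs, most likely first."""
    counts = Counter(past_outcomes)
    total = sum(counts.values())
    return [(scenario, count / total) for scenario, count in counts.most_common()]

# Hypothetical history: what customers did after receiving past marketing emails.
history = ["ignored", "opened", "ignored", "purchased", "opened", "ignored"]

ranking = rank_scenarios(history)
for scenario, probability in ranking:
    print(f"{scenario}: {probability:.2f}")
```

Relative frequency is the crudest possible estimate; its only purpose here is to show what "ranking scenarios" produces: an ordered list with a likelihood attached to each entry.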

Companies typically create predictive analytics solutions by combining three ingredients:

Business knowledge

Data-science team and technology

The data

Though the proportion of the three ingredients will vary from one business to the next, all are required for a successful predictive analytic solution that yields actionable insights.

Business knowledge

Because any predictive analytics project is started to fulfill a business need, business-specific knowledge and a clear business objective are critical to its success. Ideas for a project can come from anyone within the organization, but it's up to the leadership team to set the business goals and get buy-in from the needed departments across the whole organization.

Be sure the decision-makers in your team are prepared to act. When you present a prototype of your project, it needs an in-house champion — someone who's going to push for its adoption.

The leadership team or domain experts must also set clear metrics — ways to quantify and measure the outcome of the project. Appropriate metrics keep the departments involved clear about what they need to do, how much they need to do, and whether what they're doing is helping the company achieve its business goals.

The business stakeholders are those who are most familiar with the domain of the business. They'll have ideas about which correlations — relationships between features — of data work and which don't, which variables are important to the model, and whether you should create new variables — as in derived features or attributes — to improve the model.

Business analysts and other domain experts can analyze and interpret the patterns discovered by the machines, making useful meaning out of the data patterns and deriving actionable insights.

This is an iterative process between business and science: you build a model, interpret its findings, and repeat. In the course of building a predictive model, you have to try successive versions of the model to improve how it works (which is what data experts mean when they say iterate the model over its lifecycle). You might go through a lot of revisions and repetitions before you can prove that your model is bringing real value to the business. Even after the predictive models are deployed, the business must monitor the results, validate the accuracy of the models, and improve upon the models as more data is collected.

Data-science team and technology

The technology used in predictive analytics will include at least some (if not all) of these capabilities:

Data mining

Statistics

Machine-learning algorithms

Software tools to build the model

The business people needn't understand the details of all the technology used or the math involved, but they should have a good handle on the process that the model represents, and on how it integrates with the overall infrastructure of your organization. Remember, this is a collaborative process; the data scientists and business people must work closely together to build the model.

By the same token, providing a good general grasp of business knowledge to the data scientists gives them a better chance at creating an accurate predictive model, and helps them deploy the model much more quickly. After the model is deployed, the business can start evaluating the results right away — and the teams can start working on improving the model. Through testing, the teams will learn together what works and what doesn't.

The combination of business knowledge, data exploration, and technology leads to a successful deployment of the predictive model. So the overall approach is to develop the model through successive versions and make sure the team members have enough knowledge of both the business and the data science that everyone is on the same page.

Some analytical tools (specialized software products) are advanced enough that they require people with scientific backgrounds to use them; others are simple enough that any business person within the organization can use them. Selecting the right tool or tools is a decision that must be made very carefully. Every company has different needs, and no single tool can address all of them. But one thing is certain: every company will have to use some sort of tool to do predictive analytics.

Selecting the right software product for the job depends on such factors as

The cost of the product

The complexity of the business problem

The complexity of data

The source(s) of the data

The velocity of the data (the speed by which the data changes)

The people within the organization who will use the product

The data

All else being equal, you'd expect a person who has more experience to be better at doing a job, playing a game, or whatever than someone who has less experience. That same thinking can be applied to an organization. If you imagine an organization as a person, you can view the organization's data as its equivalent of experience. By using that experience, you can make more insightful business decisions and operate with greater efficiency. Such is the process of turning data into business value with predictive analytics.

It's increasingly clear that data is a vital asset for driving the decision-making process with quick, realistic answers and insights. Predictive analytics empowers business decisions by uncovering opportunities such as emerging trends, markets, or customers before the competition does.

Data can also present a few challenges in its raw form. It can be distributed across multiple sources, mix your own data with third-party data, and otherwise make the quality of incoming data too messy to use right away. Thus you should expect your data scientists to spend considerable time exploring your data and preparing it for analysis. This process of data cleansing and data preparation involves spotting missing values, duplicate records, and outliers, generating derived values, and normalization. (For more about these processes, see Chapters 9 and 15.)
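The preparation steps named above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a recommended pipeline: the customer records, the field names, the "five times the median" outlier rule, and the prepare helper are all hypothetical.

```python
from statistics import median

# Hypothetical raw customer records; None marks a missing value.
raw = [
    {"id": 1, "age": 34, "spend": 120.0, "visits": 4},
    {"id": 2, "age": None, "spend": 80.0, "visits": 2},
    {"id": 2, "age": None, "spend": 80.0, "visits": 2},   # duplicate record
    {"id": 3, "age": 29, "spend": 5000.0, "visits": 5},   # suspicious outlier
    {"id": 4, "age": 41, "spend": 100.0, "visits": 5},
    {"id": 5, "age": 38, "spend": 90.0, "visits": 3},
]

def prepare(records):
    # 1. Remove duplicate records, keeping the first occurrence of each id.
    seen, rows = set(), []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            rows.append(dict(record))

    # 2. Fill missing ages with the median of the known ages.
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    fill_age = median(known_ages)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill_age

    # 3. Flag outliers with a deliberately simple rule (an assumption):
    #    spend greater than five times the median spend.
    median_spend = median([r["spend"] for r in rows])
    for r in rows:
        r["spend_outlier"] = r["spend"] > 5 * median_spend

    # 4. Generate a derived value: average spend per visit.
    for r in rows:
        r["spend_per_visit"] = r["spend"] / r["visits"]

    # 5. Normalize age to the range 0..1 (min-max scaling).
    ages = [r["age"] for r in rows]
    low, high = min(ages), max(ages)
    for r in rows:
        r["age_scaled"] = (r["age"] - low) / (high - low) if high > low else 0.0
    return rows

clean = prepare(raw)
for row in clean:
    print(row)
```

In practice each rule (how to fill a gap, what counts as an outlier) is a judgment call made together with the people who know the business.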

Big data has its own challenging properties that include volume, velocity, and variety: In effect, too much of it comes in too fast, from too many places, in too many different forms. Then the main problem becomes separating the relevant data from the noise surrounding it.

In such a case, your team has to evaluate the state of the data and its type, and choose the most suitable algorithm to run on that data. Such decisions are part of an exploration phase in which the data scientists gain intimate knowledge of your data while they're selecting which attributes have the most predictive power.
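One simple way to get a first impression of which attributes carry predictive power is to rank them by how strongly each one correlates with the value you want to predict. The sketch below assumes numeric attributes and uses Pearson correlation as a rough proxy; the attribute names and numbers are made up for illustration, and real projects typically combine several such measures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no information
    return covariance / (spread_x * spread_y)

# Hypothetical attributes for six customers, plus the target to predict.
features = {
    "visits_per_month": [1, 2, 3, 4, 5, 6],
    "account_age_years": [3, 1, 4, 1, 5, 2],
    "support_tickets": [5, 5, 5, 5, 5, 5],   # constant, so uninformative
}
target = [10, 20, 30, 40, 50, 60]            # monthly spend

# Rank attributes by the strength (absolute value) of their correlation.
ranked = sorted(
    features,
    key=lambda name: abs(pearson(features[name], target)),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {pearson(features[name], target):+.2f}")
```

Correlation only captures straight-line relationships between one attribute and the target, so a low score does not prove an attribute is useless.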

Ongoing Predictive Analytics

Predictive analytics should never be about implementing one project or two, even if those two projects are very successful. It should be an ongoing process that feeds into, and is enforced by, the governing body overseeing strategy and operational planning at your organization.

You should put data at the forefront of the decision-making process at your organization. Data must support any major initiatives. After collecting and acquiring all relevant data, have your data-science team make sense of it, and propose a way forward based on their findings. The outcomes of these efforts should reach the entire organization by fostering a cultural change that embraces the analytical work as an accepted way to make informed decisions.

Your work on predictive models doesn't stop at the moment you deploy them. That only gets your foot in the door. You should actually be constantly looking for ways to improve that model. Models tend to decay over time. So refreshing the model is a necessary step in building predictive analytics solutions. The model should be undergoing continuous improvement.
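A minimal sketch of watching for model decay might compare the accuracy measured at deployment with the accuracy on recent data, and flag the model for a refresh when the gap grows too large. The prediction lists, the 0.05 tolerance, and the helper names are assumptions chosen for illustration.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that matched what actually happened."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

def needs_refresh(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """True when accuracy has dropped by more than the tolerated amount."""
    return (baseline_accuracy - recent_accuracy) > tolerance

# Hypothetical yes/no predictions (1 = customer buys, 0 = does not).
predicted = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
actual_at_launch = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # outcomes at deployment
actual_recently = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]    # outcomes months later

baseline = accuracy(predicted, actual_at_launch)
recent = accuracy(predicted, actual_recently)

print(f"baseline accuracy: {baseline:.2f}")
print(f"recent accuracy:   {recent:.2f}")
print("refresh the model" if needs_refresh(baseline, recent) else "model still holding up")
```

How much of a drop is acceptable is a business decision; the tolerance here is a placeholder.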

Additionally, you may have several models deployed, and each one of them may have undergone several revisions. In such a case, it’s imperative to have processes in place to manage the models’ lifecycle, overseeing the creation, updating, and retiring of each model. Depending on the line of business you’re in, you may need to audit all changes and be very granular in your documentation of all steps involved in this process.

Your belief in the promise of predictive analytics should never stop you from questioning the results of a predictive analytics project. You can’t just go ahead and implement blindly. You should make sure that the results make sense businesswise. Also, when the results are too good to be true, they probably are. Verify the correctness and accuracy of all steps followed to generate those models. Scrutinizing the models’ results and asking the hard questions will only further your confidence in the decisions you will finally make based on those findings.

Sometimes the results of a predictive analytics project can be so obvious that business stakeholders may dismiss them altogether on the grounds that “we already knew that.” Keep in mind, however, that making the effort to thoroughly understand the outputs of a model can be rewarding, no matter how obvious the results may seem at first.

When (for example) a model shows you that 90 percent of your customers are urban and between the ages of 25 and 45, the results may seem obvious. You may feel you wasted time and resources only to find out what you already knew. It may be far more important, however, to ask what the other 10 percent are made of. How can you increase their share? You may need to build a new model to find out more about that segment of your customers. Or you may want to learn more about what attracts 90 percent of your customers to your product.
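As a rough sketch of that follow-up question, the snippet below splits a customer list into the "obvious" segment and everyone else, then profiles the remainder. The customer records are invented so that the split comes out at 90 percent, and the field names are hypothetical.

```python
from collections import Counter

# Hypothetical customers: 18 urban customers aged 25 to 42, plus two others.
customers = [{"area": "urban", "age": 25 + i} for i in range(18)]
customers += [
    {"area": "rural", "age": 52},
    {"area": "urban", "age": 19},
]

def in_main_segment(customer):
    """The 'obvious' segment: urban and between 25 and 45 years old."""
    return customer["area"] == "urban" and 25 <= customer["age"] <= 45

main = [c for c in customers if in_main_segment(c)]
rest = [c for c in customers if not in_main_segment(c)]

share = len(main) / len(customers)
rest_profile = Counter(c["area"] for c in rest)

print(f"main segment share: {share:.0%}")
print("who is in the remainder:", dict(rest_profile))
```

Even this small breakdown shows that the remainder is not one group: it mixes a rural customer with a younger urban one, and each may call for a different approach.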

Building predictive analytics models should be an ongoing process and the results should be shared across the organization. You should always be looking to improve your models; never shy away from both experimenting and asking the hard questions. With relevant data, a talented data-science team, and the buy-in from the business stakeholders, the possibilities are endless.

Forming Your Predictive Analytics Team

A successful predictive analytics team blends the necessary skills and attitude. We'll bet you can find them in your organization.

Hiring experienced practitioners

The data-science team should be composed of experienced practitioners. Experienced data scientists know their way around data. They know what models work best for which business problems and data types.

Your data-science team should include members with professional knowledge and proven experience in statistics, data mining, and machine learning. These three disciplines are mandatory for any data-science team; the skills must exist within the team, but not every team member needs all three. Hiring team members from diverse backgrounds can also enrich your team. Experience and knowledge from other disciplines can make the overall team more rounded and can broaden its horizons.

Your data-science team should also include data scientists who know your specific business domain. That business knowledge could come from past experience working on projects in your business domain or in fields related to it. The more the team members know about your line of business, the easier it will be for them to work with your data and build analytical solutions.

Many vendors provide powerful tools, and great open-source tools are available to you as well. Your team members should have working knowledge of these tools. This will facilitate the life cycle of building analytical solutions. That knowledge will also facilitate collaboration among the team members and between business analysts and data scientists.

Demonstrating commitment and curiosity

Senior management should show their commitment to the analytical efforts. They should meet with the team members and follow the progress of their projects. They should allocate time to be briefed about the projects, their progress, and their final findings.

Your data-science team members should believe in the mission and be committed to finding answers to the business questions they are after. Keeping the team members motivated and engaged will help them thrive and deliver the best solutions. Team members should be curious and excited to achieve the business goals.

Your team members should be able to communicate their findings in a language understood by your business stakeholders. When the team members are able to communicate with the business stakeholders, they will be able to gather support for the new solutions and get the necessary buy-in. This is especially important when the business users will need to change the way they have been doing their work when they start applying the new findings.

The team members should be curious, always asking questions and trying to learn as much as they can about their projects. By not shying away from asking the toughest questions about the data, the methods used, and the model outputs, and by not shying away from trying even the wackiest scenarios, the team members will deliver optimal solutions.

Collaboration among team members and across the rest of the organization is important to the success of these projects. Team members should be able to help each other and answer each other’s questions. They should also be able to share the results and get immediate feedback.

Surveying the Marketplace

Big data and predictive analytics are bringing equally big changes to academia, the job market, and virtually every competitive company out there. Everybody will feel the impact. The survivors will treat it as an opportunity.

Responding to big data

Numerous universities offer certificates and master's degrees in predictive analytics or big-data analytics; some of these degree programs have emerged within the past year or two. This reflects the amazing growth and popularity of this field. The occupation of “data scientist” is now being labeled as one of the sexiest jobs in America by popular job journals and websites.

Demand for these jobs is expected to keep growing; the projection is that open positions will outnumber qualified applicants. Some universities are shifting their program offerings to take advantage of this growth and attract more students. Some offer analytics programs in their business schools, while others provide similar offerings in their science and engineering schools. Like the real-world applications that handle big data and predictive analytics, the discipline that makes use of them spans departments: you can find relevant course offerings in business, mathematics, statistics, and computer science. The result is the same: more attractive and relevant degree programs for today's economy, and more students looking for a growing occupational field.

Working with big data

We read stories every day about how a hot new company is springing up using predictive analytics to solve specific problems — from predicting what you will do at every turn throughout the day to scoring how suitable you are as a boyfriend. Pretty wild. No matter how outrageous the concept, someone seems to be doing it. People and companies do it for a straightforward reason: There is a market for it. There is a huge demand for social analytics, people analytics, everything data analytics.

Statisticians and mathematicians, whose work once consisted primarily of sitting at desks and crunching numbers for drug and finance companies, are now at the forefront of a data revolution that promises to predict nearly everything about nearly everyone, including you.

So why are we witnessing this sudden shift in analytics? After all, mathematics, statistics and their derivatives, computer science, machine learning, and data mining have been here for decades. In fact, most of the algorithms in use today to develop predictive models were created decades ago. The answer has to be “data” — lots of it.

We gather and generate huge amounts of data every day. Only recently have we been able to mine this data effectively. Processing power and data storage have increased exponentially while getting faster and cheaper. We've figured out how to use computer hardware to store and process large amounts of data.

The field that comprises computers, software development, programming, and making profitable use of the Internet has opened up an environment where everyone can be creatively involved. Most people on earth are now connected via the World Wide Web, social networking, smartphones, tablets, apps, you name it. We spend countless hours on the Internet daily and generate data every minute while we're at it. With that much online data, it was only natural that companies would start seeing it as a resource to be mined and refined, seeking patterns in our online behavior and exploiting what they find in hopes of capitalizing on this new opportunity. Amazon (see the accompanying sidebar) is a famous example.

In short, this is only the beginning.

AMAZING GROWTH

Throughout this book, we highlight several case studies that illustrate the successful use of predictive analytics. In this section, we'd like to highlight the crème de la crème of predictive analytics: Amazon.

As one of the largest online stores, Amazon is probably one of the best-known businesses associated with predictive analytics. Amazon analyzes endless streams of customer transactions in the quest to discover hidden purchasing patterns, as well as associations among products, customers, and purchases. When you want to see an effective recommender system in action, you'll find it working away on Amazon. Predictive analytics enables Amazon to recommend the exact product you always wanted, and even that elusive “holy grail,” the product you didn't realize you wanted. This is the power of analytics and predictive modeling: seeing patterns in enormous amounts of data.

To create its recommender system, Amazon uses collaborative filtering — an algorithm that looks at information on its users and on its products. By looking at the items currently in a user's shopping cart, as well as at items they've purchased, rated, and liked in the past — and then linking them to what other customers have purchased — Amazon cross-sells customers with those one-line recommendations we're all familiar with, such as

Frequently Bought Together

Customers Who Bought This Item Also Bought
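The idea behind such one-line recommendations can be sketched by counting how often items show up in the same order. This is only a minimal illustration, not Amazon's actual algorithm: the order history, the item names, and the function names `build_co_counts` and `also_bought` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each set is one customer's basket.
orders = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"sleeping bag", "camp stove"},
]

def build_co_counts(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def also_bought(item, counts, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    related = [(other, n) for (first, other), n in counts.items() if first == item]
    # Sort by count (descending), then by name, so ties are deterministic.
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]

co_counts = build_co_counts(orders)
print(also_bought("tent", co_counts))  # ['lantern', 'sleeping bag']
```

A production system would also weigh ratings, browsing history, and item popularity; raw co-occurrence counts tend to favor best sellers.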

Amazon goes even farther in its use of data: Besides generating more money by cross-selling and making marketing recommendations to its customers, Amazon uses the data to build a relationship with its customers — customized results, customized web pages, and personalized customer service. Data fuels every level of the company's interaction with its customers. And customers respond positively to it; Amazon revenues continue to soar every quarter.

Chapter 2

Predictive Analytics in the Wild

IN THIS CHAPTER

Identifying some common use cases

Implementing recommender systems

Improving targeted marketing

Optimizing customer experience by personalization

Predictive analytics sounds like a fancy name, but we use much the same process naturally in our daily decision-making. Sometimes it happens so fast that most of us don't even recognize when we’re doing it. We call that process “intuition” or “gut instinct”: In essence, it’s quickly analyzing a situation to predict an outcome — and then making a decision.

When a new problem calls for decision-making, natural gut instinct works most like predictive analytics when you’ve already had some experience in solving a similar problem. Everyone relies on individual experience, and so solves the problem or handles the situation with different degrees of success.

You’d expect the person with the most experience to make the best decisions, on average, over the long run. In fact, that is the most likely outcome for simple problems with relatively few influencing factors. For more complex problems, complex external factors influence the final result.

A hypothetical example is getting to work on time on Friday morning: You wake up in the morning 15 minutes later than you normally do. You predict — using data gathered from experience — that traffic is lighter on Friday morning than during the rest of the week. You know some general factors that influence traffic congestion:

How many commuters are going to work at the same time

Whether popular events (such as baseball games) are scheduled in the area you’re driving through

Emerging events like car accidents and bad weather

Of course, you may have considered the unusual events (outliers) but disregarded them as part of your normal decision-making. Over the long run, you’ll make a better decision about local traffic conditions than a person who just moved to the area. The net effect of that better decision mounts up: Congratulations — you’ve gained an extra hour of sleep every month.

But such competitive advantages don’t last forever. As other commuters realize this pattern, they’ll begin to take advantage of it as well — and also sleep in for an extra 15 minutes. Your returns from analyzing the Friday traffic eventually start to diminish if you don't continually optimize your get-to-work-on-Fridays model.

A model built with predictive analytics could handle far more than the few variables (influencing factors) that a human can process. A predictive model built with decision trees can find patterns across as many independent variables as it can access, and may lead to the discovery that a certain variable is more influential than you initially thought. If you're a robot and can follow the rules of the decision tree, you can probably shave more time from the commute.
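To make the idea of a tree discovering an influential variable concrete, here is a minimal sketch of the first step a decision-tree learner takes: try each yes/no variable as a split and keep the one that leaves the least unexplained variation in commute time. The commute log, the feature names, and the function names are all hypothetical, invented purely for illustration.

```python
from statistics import mean

# Hypothetical commute log: (is_friday, event_nearby, bad_weather, minutes)
log = [
    (True,  False, False, 24),
    (True,  False, True,  33),
    (True,  True,  False, 38),
    (False, False, False, 35),
    (False, False, True,  44),
    (False, True,  False, 50),
    (False, False, False, 36),
    (True,  False, False, 26),
]
FEATURES = ["is_friday", "event_nearby", "bad_weather"]

def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows):
    """Return the feature whose yes/no split leaves the least error, plus all scores."""
    scores = {}
    for i, name in enumerate(FEATURES):
        yes = [r[-1] for r in rows if r[i]]
        no = [r[-1] for r in rows if not r[i]]
        scores[name] = sse(yes) + sse(no)
    return min(scores, key=scores.get), scores

feature, scores = best_split(log)
print(feature)  # is_friday: in this made-up log, the day matters most
```

A full decision tree repeats this step inside each branch until the groups are small or uniform enough; the resulting nested yes/no questions are the rules the robot would follow.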

More complex problems lead, of course, to more complex analysis. Many factors contribute to the final decision, besides (and beyond) what the specific, immediate problem is asking for. A good example is predicting whether a stock will go up or down. At the core of the problem is a simple question: Will the stock go up or down? A simple answer is hard to get because the stock market is so fluid and dynamic. The influencers that affect a particular stock price are potentially unlimited in number.

Some influencers are logical; some are illogical. Some can't be predicted with any accuracy. Regardless, Nassim Taleb operates a hedge fund that bets on black swans: events that are very unlikely to happen, but when they do happen, the rewards can be tremendous. In his book The Black Swan, he says that he only has to be right once in a decade. For most of us, that investment strategy probably wouldn’t work; the amount of capital required to start would have to be substantially more than most of us make, because it would diminish while waiting for the major event to happen.

After the market closes, news reporters and analysts will try to explain the move with one reason or another. Was it a macro event (say, the whole stock market going up or down) or a smaller, company-specific event (say, the company released some bad news or someone tweeted negatively about its products)? Either way, be careful not to read too much into such factors; they can also be used to explain when the exact opposite result happened. Building an accurate model to predict a stock movement is still very challenging.

Predicting the correct direction of a stock with consistency has a rigid outcome: Either you make money or lose money. But the market isn't rigid: What holds true one day may not hold true the very next day. Fortunately, most such predictive modeling tasks aren't quite as complicated as predicting a stock's move upward or downward on a given trading day. Predictive analytics are more commonly used to find insights into nearly everything from marketing to law enforcement:

People’s buying patterns

Pricing of goods and services

Large-scale future events such as weather patterns

Unusual and suspicious activities

These are just a few (highly publicized) examples of predictive analytics. The potential applications are endless.

Online Marketing and Retail

Companies that have successfully used predictive analytics to improve their sales and marketing include Target Corporation, Amazon, and Netflix. Recent reports by Gartner, IBM, Sloan, and Accenture all suggest that many executives use data and predictive analytics to drive sales.

Recommender systems

You’ve probably already encountered one of the major outgrowths of predictive analytics: recommender systems. These systems try to predict your interests (for example, what you want to buy or watch) and give you recommendations. They do this by matching your preferences with items or other like-minded people, using statistics and machine learning algorithms.

If you're an online cruiser, you often see prompts like these on web pages:

People You May Know …

People Who Viewed This Item Also Viewed …

People Who Viewed This Item Bought …

Recommended Based on Your Browsing History …

Customers Who Bought This Item Also Bought …

These are examples of recommendation systems that were made mainstream by companies like Amazon, Netflix, and LinkedIn.

Obviously, these systems weren't created only for the user’s convenience — although that reason is definitely one part of the picture. No, recommender systems were created to maximize company profits. They attempt to personalize shopping on the Internet, with an algorithm serving as the salesperson. They were designed to sell, up-sell, cross-sell, keep you engaged, and keep you coming back. The goal is to turn each personalized shopper into a repeat customer. (The sidebar “The personal touch” explores one of the successful techniques.)

THE PERSONAL TOUCH

One of the authors used to work for a speech-recognition company that made order-handling systems for the top Wall Street firms. Every day the company would have to analyze a huge number of trade messages for accuracy and speed. The company came up with a system that was extremely accurate and fast. Using millions of trade messages, they constantly trained and fine-tuned the speech engine to adapt to each user’s unique speech profile. The key concept was the use of text analytics and machine learning to predict what the user (in this case, a trader) was going to do (trade) based on what the user was saying:

How the grammar was formed

Quantifiable attributes such as the size of the trade

Whether the trader was buying or selling

The predictive model, created with an ensemble of machine-learning algorithms, would spot patterns in the user’s orders — and assign weights to each word that could potentially come next. Then, after the speech engine parsed each word, the system would start predicting which word would come next. The model worked much like an auto-complete feature, using a recommender system.
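The auto-complete idea can be sketched with a simple bigram model that weights each candidate next word by how often it followed the current word in past messages. This is a deliberately simplified stand-in for the ensemble described above, not the company's actual system; the sample messages and the function names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical trade messages, invented for illustration.
messages = [
    "buy 100 shares of acme",
    "buy 200 shares of acme",
    "sell 100 shares of globex",
    "buy 100 contracts of acme",
]

def train_bigrams(lines):
    """Count, for every word, which words followed it and how often."""
    following = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def predict_next(word, following, top_n=2):
    """Return candidate next words with weights (relative frequencies)."""
    counts = following.get(word)
    if not counts:
        return []
    total = sum(counts.values())
    # Highest count first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, n / total) for w, n in ranked[:top_n]]

model = train_bigrams(messages)
print(predict_next("of", model))  # [('acme', 0.75), ('globex', 0.25)]
```

Training one such model per trader is one way to picture adapting to each user's profile: the weights then reflect that trader's own habits.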

The company also made noise-cancelling microphones and headsets to compensate for high-noise environments such as trade shows where the products were demonstrated. We would consistently be a convention favorite; our booths would be packed with attendees waiting to participate in our demos. We started selling the products directly at the booth, and we’d have lines of buyers throughout the day.

We had a lot of fun interacting with customers instead of the normal daily routine in front of the computer, programming or analyzing data. We cross-sold accessories and up-sold more expensive microphones and headsets. But the demos and direct selling at the trade shows taught us important lessons: We were so successful not only because we gave great product demos, but also because we were recommending products of ours that would best suit the customers’ needs — based on the information they gave us. We weren't only presenters but also salespeople; we were the “live-action” recommender system.

Personalized shopping on the Internet

A software recommender system is like an online salesperson who tries to replicate the personal process we experienced at the trade shows. What’s different about a recommender system is that it’s data-driven. It makes recommendations in volume, with some subtlety (even stealth), with a dash of unconventional wisdom and without a feeling of bias. When a customer buys a product or shows interest in one (say, by viewing it), the system automatically recommends a product or service that it considers highly relevant to that customer. The goal is to generate more sales: sales that wouldn’t happen if the recommendations weren’t given.

Amazon is a very successful example of implementing a recommender system; their success story highlights its importance. When you browse for an item on the Amazon website, you always find some variation on the theme of related items — “Customers who viewed this also viewed” or “Customers who bought items in your recent history also bought.”

This highly effective technique is considered one of Amazon’s “killer” features — and a big reason for their huge success as the dominant online marketplace. Amazon brilliantly adapted a successful offline technique practiced by salespeople — and perfected it for the online world.

Amazon popularized recommender systems for e-commerce, and its success has made them so important that other companies are following suit.

Implementing a Recommender System

There are three main approaches to creating a recommender system: collaborative filtering, content-based filtering, and a combination of both called the hybrid approach. The collaborative filtering approach uses the collective actions of many users to predict an individual user’s future behavior. The content-based approach attempts to match a particular user’s preferences to an item without regard to other users’ opinions. There are challenges to both the collaborative and content-based filtering approaches, which the hybrid approach attempts to solve.

Collaborative filtering

Collaborative filtering focuses on user and item characteristics based on the actions of the community. It can group users with similar interests or tastes, using classification algorithms such as k-nearest neighbor — k-NN for short (see Chapter 6 for more on k-NN). It can compute the similarity between items or users, using similarity measures such as cosine similarity (discussed in the next section).

The general concept is to find groups of people who like the same things: If person A likes X, and person B has tastes similar to A's, then person B will probably also like X. For example: If Tiffany likes watching Frozen, then her neighbor (a person with similar taste) Victoria will probably also like watching Frozen.

Collaborative filtering algorithms generally require

A community of users to generate data

A database of users’ interests in items

Formulas that can compute the similarity between items or users

Algorithms that can match users with similar interests
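These four ingredients can be sketched together in a few lines: a small ratings table stands in for the community and its database, cosine similarity is the formula, and a one-nearest-neighbor lookup is the matching algorithm. Treat this as a minimal sketch under stated assumptions rather than a production design: the ratings, the extra movie titles and users, and the function names are hypothetical, and unrated items are simply treated as zeros.

```python
from math import sqrt

# Hypothetical ratings (1-5); a missing title means the user has not rated it.
ratings = {
    "Tiffany":  {"Frozen": 5, "Moana": 4, "Alien": 1},
    "Victoria": {"Moana": 5, "Alien": 1, "Tangled": 4},
    "Sam":      {"Frozen": 1, "Alien": 5, "Moana": 2},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (unrated items count as 0)."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_neighbor(user, data):
    """Return the other user whose ratings are most similar (k-NN with k = 1)."""
    others = [u for u in data if u != user]
    return max(others, key=lambda u: cosine_similarity(data[user], data[u]))

def recommend(user, data, min_rating=4):
    """Suggest items the nearest neighbor rated highly that `user` has not rated."""
    neighbor = nearest_neighbor(user, data)
    return sorted(
        item for item, r in data[neighbor].items()
        if r >= min_rating and item not in data[user]
    )

print(nearest_neighbor("Victoria", ratings))  # Tiffany
print(recommend("Victoria", ratings))         # ['Frozen']
```

With these made-up numbers, Victoria's closest neighbor is Tiffany, so the sketch recommends Frozen to Victoria, mirroring the example above.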

Collaborative filtering uses two approaches: item-based and user-based.

Item-based collaborative filtering

One of Amazon’s recommender systems uses item-based collaborative filtering