Use Big Data and technology to uncover real-world insights

You don't need a time machine to predict the future. All it takes is a little knowledge and know-how, and Predictive Analytics For Dummies gets you there fast. With the help of this friendly guide, you'll discover the core of predictive analytics and get started putting it to use with readily available tools to collect and analyze data. In no time, you'll learn how to incorporate algorithms through data models, identify similarities and relationships in your data, and predict the future through data classification. Along the way, you'll develop a roadmap by preparing your data, creating goals, processing your data, and building a predictive model that will get you stakeholder buy-in.

Big Data has taken the marketplace by storm, and companies are seeking qualified talent to quickly fill positions to analyze the massive amounts of data that are being collected each day. If you want to get in on the action and either learn or deepen your understanding of how to use predictive analytics to find real relationships between what you know and what you want to know, everything you need is a page away!

* Offers common use cases to help you get started
* Covers details on modeling, k-means clustering, and more
* Includes information on structuring your data
* Provides tips on outlining business goals and approaches

The future starts today with the help of Predictive Analytics For Dummies.
Page count: 665
Year of publication: 2016
Predictive Analytics For Dummies®, 2nd Edition
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com
Copyright © 2017 by John Wiley & Sons, Inc., Hoboken, New Jersey
Media and software compilation copyright © 2017 by John Wiley & Sons, Inc. All rights reserved.
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2016951998
ISBN 978-1-119-26700-3 (pbk); 978-1-119-26701-0 (epub); 978-1-119-26702-7 (epdf)
Table of Contents
Cover
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Part 1: Getting Started with Predictive Analytics
Chapter 1: Entering the Arena
Exploring Predictive Analytics
Adding Business Value
Starting a Predictive Analytic Project
Ongoing Predictive Analytics
Forming Your Predictive Analytics Team
Surveying the Marketplace
Chapter 2: Predictive Analytics in the Wild
Online Marketing and Retail
Implementing a Recommender System
Target Marketing
Personalization
Content and Text Analytics
Chapter 3: Exploring Your Data Types and Associated Techniques
Recognizing Your Data Types
Identifying Data Categories
Generating Predictive Analytics
Connecting to Related Disciplines
Chapter 4: Complexities of Data
Finding Value in Your Data
Constantly Changing Data
Complexities in Searching Your Data
Differentiating Business Intelligence from Big-Data Analytics
Exploration of Raw Data
Part 2: Incorporating Algorithms in Your Models
Chapter 5: Applying Models
Modeling Data
Healthcare Analytics Case Studies
Social and Marketing Analytics Case Studies
Prognostics and its Relation to Predictive Analytics
The Rise of Open Data
Chapter 6: Identifying Similarities in Data
Explaining Data Clustering
Converting Raw Data into a Matrix
Identifying Groups in Your Data
Finding Associations in Data Items
Applying Biologically Inspired Clustering Techniques
Chapter 7: Predicting the Future Using Data Classification
Explaining Data Classification
Introducing Data Classification to Your Business
Exploring the Data-Classification Process
Using Data Classification to Predict the Future
Ensemble Methods to Boost Prediction Accuracy
Deep Learning
Part 3: Developing a Roadmap
Chapter 8: Convincing Your Management to Adopt Predictive Analytics
Making the Business Case
Gathering Support from Stakeholders
Presenting Your Proposal
Chapter 9: Preparing Data
Listing the Business Objectives
Processing Your Data
Working with Features
Structuring Your Data
Chapter 10: Building a Predictive Model
Getting Started
Developing and Testing the Model
Going Live with the Model
Chapter 11: Visualization of Analytical Results
Visualization as a Predictive Tool
Evaluating Your Visualization
Visualizing Your Model’s Analytical Results
Novel Visualization in Predictive Analytics
Big Data Visualization Tools
Part 4: Programming Predictive Analytics
Chapter 12: Creating Basic Prediction Examples
Installing the Software Packages
Preparing the Data
Making Predictions Using Classification Algorithms
Chapter 13: Creating Basic Examples of Unsupervised Predictions
Getting the Sample Dataset
Using Clustering Algorithms to Make Predictions
Chapter 14: Predictive Modeling with R
Programming in R
Making Predictions Using R
Chapter 15: Avoiding Analysis Traps
Data Challenges
Analysis Challenges
Part 5: Executing Big Data
Chapter 16: Targeting Big Data
Major Technological Trends in Predictive Analytics
Applying Open-Source Tools to Big Data
Chapter 17: Getting Ready for Enterprise Analytics
Analytics as a Service
Preparing for a Proof-of-Value of Predictive Analytics Prototype
Part 6: The Part of Tens
Chapter 18: Ten Reasons to Implement Predictive Analytics
Identifying Business Goals
Knowing Your Data
Organizing Your Data
Satisfying Your Customers
Reducing Operational Costs
Increasing Returns on Investments (ROI)
Gaining Rapid Access to Information
Making Informed Decisions
Gaining Competitive Edge
Improving the Business
Chapter 19: Ten Steps to Build a Predictive Analytic Model
Building a Predictive Analytics Team
Setting the Business Objectives
Preparing Your Data
Sampling Your Data
Avoiding “Garbage In, Garbage Out”
Creating Quick Victories
Fostering Change in Your Organization
Building Deployable Models
Evaluating Your Model
Updating Your Model
About the Authors
Connect with Dummies
End User License Agreement
Predictive Analytics is the art and science of using data to make better informed decisions. Predictive analytics helps you uncover hidden patterns and relationships in your data that can help you predict with greater confidence what may happen in the future, and provide you with valuable, actionable insights for your organization.
Our goal was to make this complex subject as practical as possible, in a way that appeals to everyone from technical experts to non-technical business strategists.
The subject is complex because it is not really just one subject. It is the combination of at least a few multifaceted fields: data mining, statistics, and mathematics.
Data mining requires an understanding of machine learning and information retrieval. On top of this, mathematics and statistics must be applied to your business domain, be it marketing, actuarial services, fraud, crime, or banking.
Most of the current materials on predictive analytics are pretty difficult to read if you don't already have a background in some of the aforementioned subjects. They are filled with complex mathematical equations and modeling techniques. Or, they are at a high level with specific use cases but with little guidance regarding implementation. We include both, while trying to keep a wide spectrum of readers engaged.
The focus of this book is developing a roadmap for implementing predictive analytics within your organization. Its intended audience is the larger community of business managers, business analysts, data scientists, and information technology professionals.
Maybe you are a business manager and you have heard the buzz about predictive analytics. Maybe you've been working with data mining and you want to add predictive analytics to your skill set. Maybe you know R or Python, but you're totally new to predictive analytics. If this sounds like you, then this book will be a good fit. Even if you have no experience analyzing data, but want or need to derive greater value from your organization’s data, you can also find something of value in this book.
Without oversimplifying, we have tried to explain technical concepts in non-technical terms, tackling each topic from the ground up.
Even if you are an experienced practitioner, you should find something new, and at the very least, you will gain validation for what you already know, and guidance for establishing best practices.
We also hope to have contributed a few concepts and ideas for the very first time in a major publication like this. For example, we explain how you can apply biologically inspired algorithms to predictive analytics.
We assume that the reader will not be a programmer. The code presented in this book is very brief and easy to follow. Readers of all programming levels will benefit from this book, because it is more about learning the process of predictive analytics than about learning a programming language.
The following icons in the margins indicate highlighted material that we think could be of interest to you. Next, we describe the meaning of each icon that is used in this book.
The tips are ideas we would like you to take note of. This is usually practical advice you can apply for that given topic.
This icon is rarely used in this book. We may have used it only once or twice in the entire book. The intent is to save you time by bringing to your attention some common pitfalls that you are better off avoiding.
We have made sincere efforts to steer away from the technical stuff. But when we have no choice we make sure to let you know. So if you don’t care too much about the technical stuff you can easily skip this part and you won’t miss much. If the technical stuff is your thing, then you may find these sections fascinating.
This is something we would like you to take special note of. This is a concept or idea we think is important for you to know and remember. An example of this would be a best practice we think is noteworthy.
A lot of extra content that is not in this book is available at www.dummies.com. Go online to find the following:
The Cheat Sheet for this book is at
www.dummies.com/cheatsheet/predictiveanalytics
Here you’ll find the steps needed to build a predictive analytics model and some case studies of predictive analytics.
Updates to this book, if we have any, are also available at
www.dummies.com/extras/predictiveanalytics
Let’s start making some predictions! You can apply predictive analytics to virtually every business domain. Right now there is explosive growth in the predictive analytics market, and this is just the beginning. The arena is wide open, and the possibilities are endless.
Part 1
IN THIS PART …
Exploring predictive analytics
Identifying uses
Classifying data
Presenting information
Chapter 1
IN THIS CHAPTER
Explaining the building blocks
Probing capabilities
Surveying the market
Predictive analytics is a bright light bulb powered by your data.
You can never have too much insight. The more you see, the better the decisions you make — and you never want to be in the dark. You want to see what lies ahead, preferably before others do. It's like playing the game “Let's Make a Deal” where you have to choose the door with the hidden prize. Which door do you choose? Door 1, Door 2, or Door 3? They all look the same, so it's just your best guess — your choice depends on you and your luck. But what if you had an edge — the ability to see through the keyhole? Predictive analytics can give you that edge.
For the moment, let's forget about algorithms and higher math; predictions are used in every aspect of our lives. Consider how many times you have said (or heard people say), “I told you that was going to happen.”
When you want to predict a future event with any accuracy, however, you'll need to know the past and understand the current situation. Doing so entails several processes:
Extract the facts that are currently happening.
Distinguish present facts from those that just happened.
Derive possible scenarios that could happen.
Rank the scenarios according to how likely they are to happen.
Predictive analytics can help you with each of these processes, so that you know as much as you can about what has happened and can make better-informed decisions about the future.
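The last two steps above, deriving possible scenarios and ranking them by likelihood, can be sketched in a few lines of Python. This is a minimal illustration rather than a real predictive model: it assumes the only evidence available is a log of past outcomes, and it estimates each scenario's likelihood as its historical frequency. The outcome names and the rank_scenarios function are hypothetical.

```python
from collections import Counter

# Hypothetical log of what happened to six past customers (illustrative only).
PAST_OUTCOMES = ["renewed", "renewed", "churned", "upgraded", "renewed", "churned"]


def rank_scenarios(past_outcomes):
    """Return (scenario, estimated likelihood) pairs, most likely first.

    Assumption: the future resembles the past, so the likelihood of a
    scenario is estimated as its share of the historical outcomes.
    """
    counts = Counter(past_outcomes)
    total = sum(counts.values())
    return [(scenario, count / total) for scenario, count in counts.most_common()]


ranked = rank_scenarios(PAST_OUTCOMES)
for scenario, likelihood in ranked:
    print(f"{scenario}: {likelihood:.2f}")
```

A real project would replace the simple frequency count with a model that also uses what is known about the current situation, but the output has the same shape: a list of scenarios ordered by how likely they are.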
Companies typically create predictive analytics solutions by combining three ingredients:
Business knowledge
Data-science team and technology
The data
Though the proportion of the three ingredients will vary from one business to the next, all are required for a successful predictive analytic solution that yields actionable insights.
Because any predictive analytics project is started to fulfill a business need, business-specific knowledge and a clear business objective are critical to its success. Ideas for a project can come from anyone within the organization, but it's up to the leadership team to set the business goals and get buy-in from the needed departments across the whole organization.
Be sure the decision-makers in your team are prepared to act. When you present a prototype of your project, it needs an in-house champion — someone who's going to push for its adoption.
The leadership team or domain experts must also set clear metrics — ways to quantify and measure the outcome of the project. Appropriate metrics keep the departments involved clear about what they need to do, how much they need to do, and whether what they're doing is helping the company achieve its business goals.
The business stakeholders are those who are most familiar with the domain of the business. They'll have ideas about which correlations — relationships between features — of data work and which don't, which variables are important to the model, and whether you should create new variables — as in derived features or attributes — to improve the model.
Business analysts and other domain experts can analyze and interpret the patterns discovered by the machines, making useful meaning out of the data patterns and deriving actionable insights.
This is an iterative process between business and science: you build a model, interpret its findings, and repeat. In the course of building a predictive model, you have to try successive versions of the model to improve how it works (which is what data experts mean when they say they iterate the model over its lifecycle). You might go through a lot of revisions and repetitions before you can prove that your model is bringing real value to the business. Even after the predictive models are deployed, the business must monitor the results, validate the accuracy of the models, and improve upon the models as more data is collected.
The technology used in predictive analytics will include at least some (if not all) of these capabilities:
Data mining
Statistics
Machine-learning algorithms
Software tools to build the model
The business people needn't understand the details of all the technology used or the math involved — but they should have a good handle on the process that model represents, and on how it integrates with the overall infrastructure of your organization. Remember, this is a collaborative process; the data scientists and business people must work closely together to build the model.
By the same token, providing a good general grasp of business knowledge to the data scientists gives them a better chance at creating an accurate predictive model, and helps them deploy the model much more quickly. After the model is deployed, the business can start evaluating the results right away — and the teams can start working on improving the model. Through testing, the teams will learn together what works and what doesn't.
The combination of business knowledge, data exploration, and technology leads to a successful deployment of the predictive model. So the overall approach is to develop the model through successive versions and make sure the team members have enough knowledge of both the business and the data science that everyone is on the same page.
Some analytical tools (specialized software products) are advanced enough that they require people with scientific backgrounds to use them; others are simple enough that any business person within the organization can use them. Selecting the right tool or tools is a decision that must be made very carefully. Every company has different needs, and no single tool can address all of them. But one thing is certain: every company will have to use some sort of tool to do predictive analytics.
Selecting the right software product for the job depends on such factors as
The cost of the product
The complexity of the business problem
The complexity of data
The source(s) of the data
The velocity of the data (the speed by which the data changes)
The people within the organization who will use the product
All else being equal, you'd expect a person who has more experience to be better at doing a job, playing a game, or whatever than someone who has less experience. That same thinking can be applied to an organization. If you imagine an organization as a person, you can view the organization's data as its equivalent of experience. By using that experience, you can make more insightful business decisions and operate with greater efficiency. Such is the process of turning data into business value with predictive analytics.
It's increasingly clear that data is a vital asset for driving the decision-making process with quick, realistic answers and insights. Predictive analytics empowers business decisions by uncovering opportunities such as emerging trends, markets, or customers before the competition does.
Data can also present a few challenges in its raw form. It can be distributed across multiple sources, mix your own data with third-party data, and otherwise make the quality of incoming data too messy to use right away. Thus you should expect your data scientists to spend considerable time exploring your data and preparing it for analysis. This process of data cleansing and data preparation involves spotting missing values, duplicate records, and outliers, generating derived values, and normalization. (For more about these processes, see Chapters 9 and 15.)
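The cleansing and preparation steps named above can be illustrated with a small, standard-library-only Python sketch. Everything in it is an assumption made for illustration: the record layout, the choice of the median to fill missing values, the median-absolute-deviation rule for flagging outliers, and min-max scaling as the normalization. A real project would pick these rules after exploring the data.

```python
from statistics import median

# Hypothetical raw records; the field names and values are illustrative only.
RAW_RECORDS = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": None, "spend": 80.0},   # missing value
    {"id": 1, "age": 34, "spend": 120.0},    # duplicate record
    {"id": 3, "age": 41, "spend": 95.0},
    {"id": 4, "age": 29, "spend": 5000.0},   # suspicious outlier
    {"id": 5, "age": 52, "spend": 110.0},
]


def prepare(records, outlier_factor=3.0):
    """Return cleaned copies of the records; the input list is left untouched."""
    # 1. Drop exact duplicate records, keeping the first occurrence.
    seen, rows = set(), []
    for rec in records:
        key = (rec["id"], rec["age"], rec["spend"])
        if key not in seen:
            seen.add(key)
            rows.append(dict(rec))

    # 2. Fill missing ages with the median of the known ages.
    age_median = median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        if r["age"] is None:
            r["age"] = age_median

    # 3. Flag spend outliers: values whose distance from the median exceeds
    #    outlier_factor times the median absolute deviation (MAD).
    spend_median = median(r["spend"] for r in rows)
    mad = median(abs(r["spend"] - spend_median) for r in rows)
    for r in rows:
        r["is_outlier"] = abs(r["spend"] - spend_median) > outlier_factor * mad

    # 4. Derive a normalized value: min-max scale the spend of the
    #    non-outlier rows to the range [0, 1]; outliers get None.
    normal = [r["spend"] for r in rows if not r["is_outlier"]]
    low, high = min(normal), max(normal)
    for r in rows:
        if r["is_outlier"] or high == low:
            r["spend_scaled"] = None
        else:
            r["spend_scaled"] = (r["spend"] - low) / (high - low)
    return rows


cleaned = prepare(RAW_RECORDS)
for row in cleaned:
    print(row)
```

The sketch flags the outlier instead of deleting it, on the assumption that a domain expert should decide whether an unusual value is an error or a genuinely interesting customer.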
Big data has its own challenging properties that include volume, velocity, and variety: In effect, too much of it comes in too fast, from too many places, in too many different forms. Then the main problem becomes separating the relevant data from the noise surrounding it.
In such a case, your team has to evaluate the state of the data and its type, and choose the most suitable algorithm to run on that data. Such decisions are part of an exploration phase in which the data scientists gain intimate knowledge of your data while they're selecting which attributes have the most predictive power.
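One simple way to get a first impression of which attributes carry predictive power is to rank them by the strength of their correlation with the outcome you want to predict. The Python sketch below does this with a hand-written Pearson correlation. It is only one possible screening technique, chosen here for brevity; the attribute names and values are made up, and the sketch assumes every attribute is numeric and not constant.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumption: neither list is constant, so the denominator is non-zero.
    return cov / sqrt(var_x * var_y)


def rank_attributes(features, target):
    """Rank attributes by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for six customers: two attributes and a 0/1 outcome.
FEATURES = {
    "age": [30, 45, 33, 50, 28, 41],
    "site_visits": [1, 2, 8, 9, 7, 2],
}
PURCHASED = [0, 0, 1, 1, 1, 0]

ranking = rank_attributes(FEATURES, PURCHASED)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Correlation only captures linear relationships between one attribute and the outcome, so a low score does not prove an attribute is useless. It is a starting point for the exploration phase, not a replacement for it.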
Predictive analytics should never be about implementing one project or two, even if those two projects are very successful. It should be an ongoing process that feeds into, and is enforced by, the governing body overseeing strategy and operational planning at your organization.
You should put data at the forefront of the decision-making process at your organization. Data must support any major initiatives. After collecting and acquiring all relevant data, have your data-science team make sense of it, and propose a way forward based on their findings. The outcomes of these efforts should reach the entire organization by fostering a cultural change that embraces the analytical work as an accepted way to make informed decisions.
Your work on predictive models doesn't stop at the moment you deploy them. That only gets your foot in the door. You should actually be constantly looking for ways to improve that model. Models tend to decay over time. So refreshing the model is a necessary step in building predictive analytics solutions. The model should be undergoing continuous improvement.
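Model decay can be watched for with a very small check: compare the model's accuracy on recent data against the accuracy it had when it was deployed. The Python sketch below is a minimal illustration under stated assumptions; the function names, the 0.90 baseline, and the 0.05 tolerance are all hypothetical, and accuracy is only one of many metrics a team might track.

```python
def accuracy(predictions, actuals):
    """Share of predictions that matched what actually happened."""
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(actuals)


def needs_refresh(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """True when recent accuracy has dropped more than `tolerance` below baseline."""
    return recent_accuracy < baseline_accuracy - tolerance


# Hypothetical numbers: accuracy measured at deployment time, then the
# model's predictions and the real outcomes for ten recent cases.
BASELINE_ACCURACY = 0.90
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_actuals = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]

recent_accuracy = accuracy(recent_predictions, recent_actuals)
refresh = needs_refresh(BASELINE_ACCURACY, recent_accuracy)
print(f"recent accuracy: {recent_accuracy:.2f}, refresh needed: {refresh}")
```

Running a check like this on a schedule turns "models tend to decay" from a general warning into a concrete trigger for retraining.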
Additionally, you may have several models deployed, and each one of them may have undergone several revisions. In such a case, it’s imperative to have processes in place to manage the models’ lifecycle, overseeing the creation, updating, and retiring of each model. Depending on the line of business you’re in, you may need to audit all changes and be very granular in your documentation of all steps involved in this process.
Your belief in the promise of predictive analytics should never stop you from questioning the results of a predictive analytics project. You can’t just go ahead and implement blindly. You should make sure that the results make sense businesswise. Also, when the results are too good to be true, they probably are. Verify the correctness and accuracy of all steps followed to generate those models. Scrutinizing the models’ results and asking the hard questions will only further your confidence in the decisions you will finally make based on those findings.
Sometimes the results of a predictive analytics project can be so obvious that business stakeholders may dismiss them altogether on the pretense that “we already knew that.” Keep in mind, however, that making the effort to thoroughly understand the outputs of a model can be rewarding, no matter how obvious the results may seem at first.
When (for example) a model shows you that 90 percent of your customers are urban and between the ages of 25 and 45, the results may seem obvious. You may feel you wasted time and resources only to find out what you already knew. It may be far more important, however, to ask what the other 10 percent are made of. How can you increase their share? You may need to build a new model to find out more about that segment of your customers. Or you may want to learn more about what attracts 90 percent of your customers to your product.
Building predictive analytics models should be an ongoing process and the results should be shared across the organization. You should always be looking to improve your models; never shy away from both experimenting and asking the hard questions. With relevant data, a talented data-science team, and the buy-in from the business stakeholders, the possibilities are endless.
A successful predictive analytics team blends the necessary skills and attitude. We'll bet you can find them in your organization.
The data-science team should be composed of experienced practitioners. Experienced data scientists know their way around data. They know what models work best for which business problems and data types.
Your data science team should include members with professional knowledge of, and proven experience in, statistics, data mining, and machine learning. These three disciplines are mandatory for any data science team; the skills must exist within the team, though not every team member needs all three. Beyond that, hiring team members from diverse backgrounds can enrich your team. Experience in and knowledge of other disciplines can make the overall team more well-rounded and broaden its horizons.
The data scientists you hire should include some who have knowledge of your specific business domain. That business knowledge could come from past experience working on projects in your business domain or in fields related to it. The more the team members know about your line of business, the easier it will be for them to work with your data and build analytical solutions.
There are many powerful tools provided by many vendors, in addition to great open source tools available to you. Your team members should have working knowledge of these tools. This will facilitate the life cycle of building analytical solutions. Also, that knowledge will facilitate collaboration across the team members and between business analysts and data scientists.
Senior management should show their commitment to the analytical efforts. They should meet with the team members and follow the progress of their projects. They should allocate time to be briefed about the projects, their progress, and their final findings.
Your data science team members should believe in the mission and be committed to finding answers to the business questions they are after. Keeping the team members motivated and engaged will help them thrive and deliver the best solutions. Team members should be curious and excited to achieve the business goals.
Your team members should be able to communicate their findings in a language understood by your business stakeholders. When the team members are able to communicate with the business stakeholders, they will be able to gather support for the new solutions and get the necessary buy-in. This is especially important when the business users will need to change the way they have been doing their work when they start applying the new findings.
The team members should be curious, always asking questions and trying to learn as much as they can about their projects. By not shying away from asking the toughest questions about the data, the methods used, and the model outputs, and by trying even the wackiest scenarios, the team members will deliver optimal solutions.
Collaboration among team members and across the rest of the organization is important to the success of these projects. Team members should be able to help each other and answer each other’s questions. Also they should be able to share the results and get immediate feedback.
Big data and predictive analytics are bringing equally big changes to academia, the job market, and virtually every competitive company out there. Everybody will feel the impact. The survivors will treat it as an opportunity.
Numerous universities offer certificates and master's degrees in predictive analytics or big-data analytics; some of these degree programs have emerged within the past year or two. This reflects the amazing growth and popularity of this field. The occupation of “data scientist” is now being labeled as one of the sexiest jobs in America by popular job journals and websites.
This job growth is expected to continue; the projection is that job positions will outnumber qualified applicants. Some universities are shifting their program offerings to take advantage of this growth and attract more students. Some offer analytics programs in their business schools, while others provide similar offerings in their science and engineering schools. Like the real-world applications that handle big data and predictive analytics, the discipline that makes use of them spans departments: you can find relevant course offerings in business, mathematics, statistics, and computer science. The result is the same: more attractive and relevant degree programs for today's economy, and more students looking for a growing occupational field.
We read stories every day about how a hot new company is springing up using predictive analytics to solve specific problems — from predicting what you will do at every turn throughout the day to scoring how suitable you are as a boyfriend. Pretty wild. No matter how outrageous the concept, someone seems to be doing it. People and companies do it for a straightforward reason: There is a market for it. There is a huge demand for social analytics, people analytics, everything data analytics.
Statisticians and mathematicians, whose primary task once consisted of sitting at desks and crunching numbers for drug and finance companies, are now in the forefront of a data revolution that promises to predict nearly everything about nearly everyone, including you.
So why are we witnessing this sudden shift in analytics? After all, mathematics, statistics and their derivatives, computer science, machine learning, and data mining have been here for decades. In fact, most of the algorithms in use today to develop predictive models were created decades ago. The answer has to be “data” — lots of it.
We gather and generate huge amounts of data every day. Only recently have we been able to mine this data effectively. Processing power and data storage have increased exponentially while getting faster and cheaper. We've figured out how to use computer hardware to store and process large amounts of data.
The field that comprises computers, software development, programming, and making profitable use of the Internet has opened up an environment where everyone can be creatively involved. Most people on earth are now connected via the World Wide Web, social networking, smartphones, tablets, apps, you name it. We spend countless hours on the Internet daily, and generate data every minute while we're at it. With that much online data, it was only natural that companies would start seeing it as a resource to be mined and refined, seeking patterns in our online behavior and exploiting what they find in hopes of capitalizing on this new opportunity. Amazon (see the accompanying sidebar) is a famous example.
In short, this is only the beginning.
Throughout this book, we highlight several case studies that illustrate the successful use of predictive analytics. In this section, we'd like to highlight the crème de la crème of predictive analytics: Amazon.
As one of the largest online stores, Amazon is probably one of the best-known businesses associated with predictive analytics. Amazon analyzes endless streams of customer transactions in the quest to discover hidden purchasing patterns, as well as associations among products, customers, and purchases. When you want to see an effective recommender system in action, you'll find it working away on Amazon. Predictive analytics enables Amazon to recommend the exact product you always wanted, and even that elusive “holy grail”: the product you didn't realize you wanted. This is the power of analytics and predictive modeling: seeing patterns in enormous amounts of data.
To create its recommender system, Amazon uses collaborative filtering — an algorithm that looks at information on its users and on its products. By looking at the items currently in a user's shopping cart, as well as at items they've purchased, rated, and liked in the past — and then linking them to what other customers have purchased — Amazon cross-sells customers with those one-line recommendations we're all familiar with, such as
Frequently Bought Together
Customers Who Bought This Item Also Bought
Amazon goes even farther in its use of data: Besides generating more money by cross-selling and making marketing recommendations to its customers, Amazon uses the data to build a relationship with its customers through customized results, customized web pages, and personalized customer service. Data fuels every level of the company's interaction with its customers. And customers respond positively to it; Amazon revenues continue to soar every quarter.
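The cross-selling idea behind a prompt like "Customers Who Bought This Item Also Bought" can be sketched by counting which items show up in the same orders. This is only a minimal illustration, not Amazon's actual system: the `orders` data and the function names `co_purchase_counts` and `also_bought` are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history (an assumption): each set is one customer's basket.
orders = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "sleeping bag", "stove"},
    {"tent", "lantern"},
    {"sleeping bag", "pillow"},
]

def co_purchase_counts(orders):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            # Record the pair in both directions so either item can be looked up.
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_bought(item, counts, top_n=2):
    """Return the items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]

counts = co_purchase_counts(orders)
print(also_bought("tent", counts))
```

A production recommender would likely weigh far more signals (ratings, views, recency), but raw co-occurrence counts are a reasonable first approximation of "bought together."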
Chapter 2
IN THIS CHAPTER
Identifying some common use cases
Implementing recommender systems
Improving targeted marketing
Optimizing customer experience by personalization
Predictive analytics sounds like a fancy name, but we use much the same process naturally in our daily decision-making. Sometimes it happens so fast that most of us don't even recognize when we’re doing it. We call that process “intuition” or “gut instinct”: In essence, it’s quickly analyzing a situation to predict an outcome — and then making a decision.
When a new problem calls for decision-making, natural gut instinct works most like predictive analytics when you’ve already had some experience in solving a similar problem. Everyone relies on individual experience, and so solves the problem or handles the situation with different degrees of success.
You’d expect the person with the most experience to make the best decisions, on average, over the long run. In fact, that is the most likely outcome for simple problems with relatively few influencing factors. For more complex problems, complex external factors influence the final result.
A hypothetical example is getting to work on time on Friday morning: You wake up in the morning 15 minutes later than you normally do. You predict — using data gathered from experience — that traffic is lighter on Friday morning than during the rest of the week. You know some general factors that influence traffic congestion:
How many commuters are going to work at the same time
Whether popular events (such as baseball games) are scheduled in the area you’re driving through
Emerging events like car accidents and bad weather
Of course, you may have considered the unusual events (outliers) but disregarded them as part of your normal decision-making. Over the long run, you’ll make a better decision about local traffic conditions than a person who just moved to the area. The net effect of that better decision mounts up: Congratulations — you’ve gained an extra hour of sleep every month.
But such competitive advantages don’t last forever. As other commuters realize this pattern, they’ll begin to take advantage of it as well — and also sleep in for an extra 15 minutes. Your returns from analyzing the Friday traffic eventually start to diminish if you don't continually optimize your get-to-work-on-Fridays model.
A model built with predictive analytics could handle far more than the few variables (influencing factors) that a human can process. A predictive model built with decision trees can find patterns with as many independent variables as you can access, and may lead to the discovery that a certain variable is more influential than you initially thought. If you're a robot and can follow the rules of the decision tree, you can probably shave more time from the commute.
More complex problems lead, of course, to more complex analysis. Many factors contribute to the final decision, besides (and beyond) what the specific, immediate problem is asking for. A good example is predicting whether a stock will go up or down. At the core of the problem is a simple question: Will the stock go up or down? A simple answer is hard to get because the stock market is so fluid and dynamic. The influencers that affect a particular stock price are potentially unlimited in number.
Some influencers are logical; some are illogical. Some can't be predicted with any accuracy. Regardless, Nassim Taleb operates a hedge fund that bets on black swans: events that are very unlikely to happen, but when they do happen, the rewards can be tremendous. In his book The Black Swan, he says that he only has to be right once in a decade. For most of us, that investment strategy probably wouldn’t work; the amount of capital required to start would have to be substantially more than most of us make, because it would diminish while waiting for the major event to happen.
After the market closes, news reporters and analysts will try to explain the move with one reason or another. Was it a macro event (say, the whole stock market going up or down) or a smaller, company-specific event (say, the company released some bad news or someone tweeted negatively about its products)? Either way, be careful not to read too much into such factors; they can also be used to explain when the exact opposite result happened. Building an accurate model to predict a stock movement is still very challenging.
Predicting the correct direction of a stock with consistency has a rigid outcome: Either you make money or lose money. But the market isn't rigid: What holds true one day may not hold true the very next day. Fortunately, most such predictive modeling tasks aren't quite as complicated as predicting a stock's move upward or downward on a given trading day. Predictive analytics are more commonly used to find insights into nearly everything from marketing to law enforcement:
People’s buying patterns
Pricing of goods and services
Large-scale future events such as weather patterns
Unusual and suspicious activities
These are just a few (highly publicized) examples of predictive analytics. The potential applications are endless.
Companies that have successfully used predictive analytics to improve their sales and marketing include Target Corporation, Amazon, and Netflix. Recent reports by Gartner, IBM, Sloan, and Accenture all suggest that many executives use data and predictive analytics to drive sales.
You’ve probably already encountered one of the major outgrowths of predictive analytics: recommender systems. These systems try to predict your interests (for example, what you want to buy or watch) and give you recommendations. They do this by matching your preferences with items or other like-minded people, using statistics and machine learning algorithms.
If you're an online cruiser, you often see prompts like these on web pages:
People You May Know …
People Who Viewed This Item Also Viewed …
People Who Viewed This Item Bought …
Recommended Based on Your Browsing History …
Customers Who Bought This Item Also Bought …
These are examples of recommendation systems that were made mainstream by companies like Amazon, Netflix, and LinkedIn.
Obviously, these systems weren't created only for the user’s convenience — although that reason is definitely one part of the picture. No, recommender systems were created to maximize company profits. They attempt to personalize shopping on the Internet, with an algorithm serving as the salesperson. They were designed to sell, up-sell, cross-sell, keep you engaged, and keep you coming back. The goal is to turn each personalized shopper into a repeat customer. (The sidebar “The personal touch” explores one of the successful techniques.)
One of the authors used to work for a speech-recognition company that made order-handling systems for the top Wall Street firms. Every day the company would have to analyze a huge number of trade messages for accuracy and speed. The company came up with a system that was extremely accurate and fast. Using millions of trade messages, they constantly trained and fine-tuned the speech engine to adapt to each user’s unique speech profile. The key concept was the use of text analytics and machine learning to predict what the user (in this case, a trader) was going to do (trade) based on what the user was saying:
How the grammar was formed
Quantifiable attributes such as the size of the trade
Whether the trader was buying or selling
The predictive model, created with an ensemble of machine-learning algorithms, would spot patterns in the user’s orders and assign weights to each word that could potentially come next. Then, after the speech engine parsed each word, the system would start predicting which word would come next. The model worked much like an auto-complete feature, using a recommender system.
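To make the idea of weighting candidate next words concrete, here is a much simpler stand-in for the ensemble described above: a bigram model that counts which word follows which. It is an assumption made for illustration, not the company's actual system, and the `messages` data and function names are invented.

```python
from collections import Counter, defaultdict

# Hypothetical trade messages, invented for illustration.
messages = [
    "buy 100 shares of ibm",
    "buy 200 shares of ibm",
    "sell 100 shares of ge",
    "buy 100 shares of ge",
]

def train_bigrams(messages):
    """Count, for each word, how often each other word follows it."""
    following = defaultdict(Counter)
    for message in messages:
        words = message.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def next_word_weights(word, following):
    """Turn follow-up counts into weights that sum to 1."""
    counts = following.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

model = train_bigrams(messages)
print(next_word_weights("buy", model))
```

In this toy data, "100" follows "buy" twice and "200" once, so "100" gets the larger weight. A word the model has never seen yields an empty dictionary, which a real system would need to handle with some fallback.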
The company also made noise-cancelling microphones and headsets to compensate for high-noise environments such as trade shows where the products were demonstrated. We would consistently be a convention favorite; our booths would be packed with attendees waiting to participate in our demos. We started selling the products directly at the booth, and we’d have lines of buyers throughout the day.
We had a lot of fun interacting with customers instead of the normal daily routine in front of the computer, programming or analyzing data. We cross-sold accessories and up-sold more expensive microphones and headsets. But the demos and direct selling at the trade shows taught us important lessons: We were so successful not only because we gave great product demos, but also because we were recommending products of ours that would best suit the customers’ needs — based on the information they gave us. We weren't only presenters but also salespeople; we were the “live-action” recommender system.
A software recommender system is like an online salesperson who tries to replicate the personal process we experienced at the trade shows. What’s different about a recommender system is that it’s data-driven. It makes recommendations in volume, with some subtlety (even stealth), with a dash of unconventional wisdom and without a feeling of bias. When a customer buys a product — or shows interest in a product (say, by viewing it), the system recommends a product or service that it considers highly relevant to that customer — automatically. The goal is to generate more sales — sales that wouldn’t happen if the recommendation(s) weren’t given.
Amazon is a very successful example of implementing a recommender system; their success story highlights its importance. When you browse for an item on the Amazon website, you always find some variation on the theme of related items — “Customers who viewed this also viewed” or “Customers who bought items in your recent history also bought.”
This highly effective technique is considered one of Amazon’s “killer” features — and a big reason for their huge success as the dominant online marketplace. Amazon brilliantly adapted a successful offline technique practiced by salespeople — and perfected it for the online world.
Amazon popularized recommender systems for e-commerce. Their successful example has made recommender systems so popular and important in e-commerce that other companies are following suit.
There are three main approaches to creating a recommender system: collaborative filtering, content-based filtering, and a combination of both called the hybrid approach. The collaborative filtering approach uses the collective actions of the user community to predict an individual user’s future behavior. The content-based approach attempts to match a particular user’s preferences to an item without regard to other users’ opinions. There are challenges to both the collaborative and content-based filtering approaches, which the hybrid approach attempts to solve.
Collaborative filtering focuses on user and item characteristics based on the actions of the community. It can group users with similar interests or tastes, using classification algorithms such as k-nearest neighbor — k-NN for short (see Chapter 6 for more on k-NN). It can compute the similarity between items or users, using similarity measures such as cosine similarity (discussed in the next section).
The general concept is to find groups of people who like the same things: If person A and person B have similar tastes and person A likes X, then person B will probably also like X. For example: If Tiffany likes watching Frozen, then her neighbor (a person with similar taste) Victoria will probably also like watching Frozen.
Collaborative filtering algorithms generally require
A community of users to generate data
A database of users’ interests in items
Formulas that can compute the similarity between items or users
Algorithms that can match users with similar interests
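The four ingredients above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production recommender: the `ratings` data, the function names, and the "rated 4 or higher" cutoff are all invented for the example, and it uses only the single nearest neighbor rather than a full k-NN vote.

```python
from math import sqrt

# Hypothetical community ratings on a 1-5 scale (invented for illustration).
ratings = {
    "Tiffany":  {"Frozen": 5, "Moana": 4, "Cars": 1},
    "Victoria": {"Moana": 5, "Cars": 1},
    "Sam":      {"Frozen": 1, "Moana": 2, "Cars": 5},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_user(user, ratings):
    """Find the other user whose ratings point in the most similar direction."""
    others = (name for name in ratings if name != user)
    return max(others, key=lambda name: cosine_similarity(ratings[user], ratings[name]))

def recommend(user, ratings, min_score=4):
    """Suggest items the nearest neighbor rated highly that `user` has not rated."""
    neighbor = most_similar_user(user, ratings)
    return [item for item, score in ratings[neighbor].items()
            if score >= min_score and item not in ratings[user]]

print(most_similar_user("Victoria", ratings))
print(recommend("Victoria", ratings))
```

Here Victoria's ratings line up more closely with Tiffany's than with Sam's, so the sketch suggests the one film Tiffany rated highly that Victoria has not yet rated.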
Collaborative filtering uses two approaches: item-based and user-based.
One of Amazon’s recommender systems uses item-based collaborative filtering
