Detect fraud earlier to mitigate loss and prevent cascading damage

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution. Early detection is a key factor in mitigating fraud damage, but it involves more specialized techniques than detecting fraud at the more advanced stages. This invaluable guide details both the theory and technical aspects of these techniques, and provides expert insight into streamlining implementation. Coverage includes data gathering, preprocessing, model building, and post-implementation, with comprehensive guidance on various learning techniques and the data types utilized by each. These techniques are effective for fraud detection across industry boundaries, including applications in insurance fraud, credit card fraud, anti-money laundering, healthcare fraud, telecommunications fraud, click fraud, tax evasion, and more, giving you a highly practical framework for fraud prevention.

It is estimated that a typical organization loses about 5% of its revenue to fraud every year. More effective fraud detection is possible, and this book describes the various analytical techniques your organization must implement to put a stop to the revenue leak.

* Examine fraud patterns in historical data
* Utilize labeled, unlabeled, and networked data
* Detect fraud before the damage cascades
* Reduce losses, increase recovery, and tighten security

The longer fraud is allowed to go on, the more harm it causes. It expands exponentially, sending ripples of damage throughout the organization, and becomes more and more complex to track, stop, and reverse. Fraud prevention relies on early and effective fraud detection, enabled by the techniques discussed here. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques helps you stop fraud in its tracks, and eliminate the opportunities for future occurrence.
Page count: 479
Publication year: 2015
Title Page
Copyright
Dedication
List of Figures
Foreword
Preface
Acknowledgments
Chapter 1: Fraud: Detection, Prevention, and Analytics!
Introduction
Fraud!
Fraud Detection and Prevention
Big Data for Fraud Detection
Data-Driven Fraud Detection
Fraud-Detection Techniques
Fraud Cycle
The Fraud Analytics Process Model
Fraud Data Scientists
A Scientific Perspective on Fraud
References
Chapter 2: Data Collection, Sampling, and Preprocessing
Introduction
Types of Data Sources
Merging Data Sources
Sampling
Types of Data Elements
Visual Data Exploration and Exploratory Statistical Analysis
Benford's Law
Descriptive Statistics
Missing Values
Outlier Detection and Treatment
Red Flags
Standardizing Data
Categorization
Weights of Evidence Coding
Variable Selection
Principal Components Analysis
RIDITs
PRIDIT Analysis
Segmentation
References
Chapter 3: Descriptive Analytics for Fraud Detection
Introduction
Graphical Outlier Detection Procedures
Statistical Outlier Detection Procedures
Clustering
One-Class SVMs
References
Chapter 4: Predictive Analytics for Fraud Detection
Introduction
Target Definition
Linear Regression
Logistic Regression
Variable Selection for Linear and Logistic Regression
Decision Trees
Neural Networks
Support Vector Machines
Ensemble Methods
Multiclass Classification Techniques
Evaluating Predictive Models
Other Performance Measures for Predictive Analytical Models
Developing Predictive Models for Skewed Data Sets
Fraud Performance Benchmarks
References
Chapter 5: Social Network Analysis for Fraud Detection
Networks: Form, Components, Characteristics, and Their Applications
Is Fraud a Social Phenomenon? An Introduction to Homophily
Impact of the Neighborhood: Metrics
Community Mining: Finding Groups of Fraudsters
Extending the Graph: Toward a Bipartite Representation
References
Chapter 6: Fraud Analytics: Post-Processing
Introduction
The Analytical Fraud Model Life Cycle
Model Representation
Selecting the Sample to Investigate
Fraud Alert and Case Management
Visual Analytics
Backtesting Analytical Fraud Models
Model Design and Documentation
References
Chapter 7: Fraud Analytics: A Broader Perspective
Introduction
Data Quality
Privacy
Capital Calculation for Fraud Loss
An Economic Perspective on Fraud Analytics
In Versus Outsourcing
Modeling Extensions
The Internet of Things
Corporate Fraud Governance
References
About the Authors
Index
End User License Agreement
Chapter 1: Fraud: Detection, Prevention, and Analytics!
Figure 1.1 Fraud Triangle
Figure 1.2 Fire Incident Claim-Handling Process
Figure 1.3 The Fraud Cycle
Figure 1.4 Outlier Detection at the Data Item Level
Figure 1.5 Outlier Detection at the Data Set Level
Figure 1.6 The Fraud Analytics Process Model
Figure 1.7 Profile of a Fraud Data Scientist
Figure 1.8 Screenshot of Web of Science Statistics for Scientific Publications on Fraud between 1996 and 2014
Chapter 2: Data Collection, Sampling, and Preprocessing
Figure 2.1 Aggregating Normalized Data Tables into a Non-Normalized Data Table
Figure 2.2 Pie Charts for Exploratory Data Analysis
Figure 2.3 Benford's Law Describing the Frequency Distribution of the First Digit
Figure 2.4 Multivariate Outliers
Figure 2.5 Histogram for Outlier Detection
Figure 2.6 Box Plots for Outlier Detection
Figure 2.7 Using the z-Scores for Truncation
Figure 2.8 Default Risk Versus Age
Figure 2.9 Illustration of Principal Component Analysis in a Two-Dimensional Data Set
Chapter 3: Descriptive Analytics for Fraud Detection
Figure 3.1 3D Scatter Plot for Detecting Outliers
Figure 3.2 OLAP Cube for Fraud Detection
Figure 3.3 Example Pivot Table for Credit Card Fraud Detection
Figure 3.4 Break-Point Analysis
Figure 3.5 Peer-Group Analysis
Figure 3.6 Cluster Analysis for Fraud Detection
Figure 3.7 Hierarchical Versus Nonhierarchical Clustering Techniques
Figure 3.8 Euclidean Versus Manhattan Distance
Figure 3.9 Divisive Versus Agglomerative Hierarchical Clustering
Figure 3.10 Calculating Distances between Clusters
Figure 3.11 Example for Clustering Birds. The Numbers Indicate the Clustering Steps
Figure 3.12 Dendrogram for Birds Example. The Thick Black Line Indicates the Optimal Clustering
Figure 3.13 Scree Plot for Clustering
Figure 3.14 Scatter Plot of Hierarchical Clustering Data
Figure 3.15 Output of Hierarchical Clustering Procedures
Figure 3.16 k-Means Clustering: Start from Original Data
Figure 3.17 k-Means Clustering Iteration 1: Randomly Select Initial Cluster Centroids
Figure 3.18 k-Means Clustering Iteration 1: Assign Remaining Observations
Figure 3.19 k-Means Iteration Step 2: Recalculate Cluster Centroids
Figure 3.20 k-Means Clustering Iteration 2: Reassign Observations
Figure 3.21 k-Means Clustering Iteration 3: Recalculate Cluster Centroids
Figure 3.22 k-Means Clustering Iteration 3: Reassign Observations
Figure 3.23 Rectangular Versus Hexagonal SOM Grid
Figure 3.24 Clustering Countries Using SOMs
Figure 3.25 Component Plane for Literacy
Figure 3.26 Component Plane for Political Rights
Figure 3.27 Must-Link and Cannot-Link Constraints in Semi-Supervised Clustering
Figure 3.28 δ-Constraints in Semi-Supervised Clustering
Figure 3.29 ε-Constraints in Semi-Supervised Clustering
Figure 3.30 Cluster Profiling Using Histograms
Figure 3.31 Using Decision Trees for Clustering Interpretation
Figure 3.32 One-Class Support Vector Machines
Chapter 4: Predictive Analytics for Fraud Detection
Figure 4.1 A Spider Construction in Tax Evasion Fraud
Figure 4.2 Regular Versus Fraudulent Bankruptcy
Figure 4.3 OLS Regression
Figure 4.4 Bounding Function for Logistic Regression
Figure 4.5 Linear Decision Boundary of Logistic Regression
Figure 4.6 Other Transformations
Figure 4.7 Fraud Detection Scorecard
Figure 4.8 Calculating the p-Value with a Student's t-Distribution
Figure 4.9 Variable Subsets for Four Variables V1, V2, V3, and V4
Figure 4.10 Example Decision Tree
Figure 4.11 Example Data Sets for Calculating Impurity
Figure 4.12 Entropy Versus Gini
Figure 4.13 Calculating the Entropy for Age Split
Figure 4.14 Using a Validation Set to Stop Growing a Decision Tree
Figure 4.15 Decision Boundary of a Decision Tree
Figure 4.16 Example Regression Tree for Predicting the Fraud Percentage
Figure 4.17 Neural Network Representation of Logistic Regression
Figure 4.18 A Multilayer Perceptron (MLP) Neural Network
Figure 4.19 Local Versus Global Minima
Figure 4.20 Using a Validation Set for Stopping Neural Network Training
Figure 4.21 Example Hinton Diagram
Figure 4.22 Backward Variable Selection
Figure 4.23 Decompositional Approach for Neural Network Rule Extraction
Figure 4.24 Pedagogical Approach for Rule Extraction
Figure 4.25 Two-Stage Models
Figure 4.26 Multiple Separating Hyperplanes
Figure 4.27 SVM Classifier for the Perfectly Linearly Separable Case
Figure 4.28 SVM Classifier in Case of Overlapping Distributions
Figure 4.29 The Feature Space Mapping
Figure 4.30 SVMs for Regression
Figure 4.31 Representing an SVM Classifier as a Neural Network
Figure 4.32 One-Versus-One Coding for Multiclass Problems
Figure 4.33 One-Versus-All Coding for Multiclass Problems
Figure 4.34 Training Versus Test Sample Set Up for Performance Estimation
Figure 4.35 Cross-Validation for Performance Measurement
Figure 4.36 Bootstrapping
Figure 4.37 Calculating Predictions Using a Cut-Off
Figure 4.38 The Receiver Operating Characteristic Curve
Figure 4.39 Lift Curve
Figure 4.40 Cumulative Accuracy Profile
Figure 4.41 Calculating the Accuracy Ratio
Figure 4.42 The Kolmogorov-Smirnov Statistic
Figure 4.43 A Cumulative Notch Difference Graph
Figure 4.44 Scatter Plot: Predicted Fraud Versus Actual Fraud
Figure 4.45 CAP Curve for Continuous Targets
Figure 4.46 Regression Error Characteristic (REC) Curve
Figure 4.47 Varying the Time Window to Deal with Skewed Data Sets
Figure 4.48 Oversampling the Fraudsters
Figure 4.49 Undersampling the Nonfraudsters
Figure 4.50 Synthetic Minority Oversampling Technique (SMOTE)
Chapter 5: Social Network Analysis for Fraud Detection
Figure 5.1a Königsberg Bridges
Figure 5.1b Schematic Representation of the Königsberg Bridges
Figure 5.2 Identity Theft. The Frequent Contact List of a Person is Suddenly Extended with Other Contacts (Light Gray Nodes). This Might Indicate that a Fraudster (Dark Gray Node) Took Over that Customer's Account and “shares” his/her Contacts
Figure 5.3 Network Representation
Figure 5.4 Example of a (Un)Directed Graph
Figure 5.5 Follower–Followee Relationships in a Twitter Network
Figure 5.6 Edge Representation
Figure 5.7 Example of a Fraudulent Network
Figure 5.8 An Egonet. The Ego is Surrounded by Six Alters, of Whom Two are Legitimate (White Nodes) and Four are Fraudulent (Gray Nodes)
Figure 5.9 Toy Example of Credit Card Fraud
Figure 5.10 Mathematical Representation of (a) a Sample Network; (b) the Adjacency or Connectivity Matrix; (c) the Weight Matrix; (d) the Adjacency List; and (e) the Weight List
Figure 5.11 A Real-Life Example of a Homophilic Network
Figure 5.12 A Homophilic Network
Figure 5.13 Sample Network
Figure 5.14a Degree Distribution
Figure 5.14b Illustration of the Degree Distribution for a Real-Life Network of Social Security Fraud. The Degree Distribution Follows a Power Law (log-log axes)
Figure 5.15 A 4-regular Graph
Figure 5.16 Example Social Network for a Relational Neighbor Classifier
Figure 5.17 Example Social Network for a Probabilistic Relational Neighbor Classifier
Figure 5.18 Example of Social Network Features for a Relational Logistic Regression Classifier
Figure 5.19 Example of Featurization with Features Describing Intrinsic Behavior and Behavior of the Neighborhood
Figure 5.20 Illustration of Dijkstra's Algorithm
Figure 5.21 Illustration of the Number of Connecting Paths Between Two Nodes
Figure 5.22 Illustration of Betweenness Between Communities of Nodes
Figure 5.23 PageRank Algorithm
Figure 5.24 Illustration of Iterative Process of the PageRank Algorithm
Figure 5.25 Sample Network
Figure 5.26 Community Detection for Credit Card Fraud
Figure 5.27 Iterative Bisection
Figure 5.28 Dendrogram of the Clustering of Figure 5.27 by the Girvan-Newman Algorithm. The Modularity Q is Maximized When Splitting the Network into Two Communities ABC–DEFG
Figure 5.29 Complete (a) and Partial (b) Communities
Figure 5.30 Overlapping Communities
Figure 5.31 Unipartite Graph
Figure 5.32 Bipartite Graph
Figure 5.33 Connectivity Matrix of a Bipartite Graph
Figure 5.34 A Multipartite Graph
Figure 5.35 Sample Network of Gotcha!
Figure 5.36 Exposure Score of the Resources Derived by a Propagation Algorithm. The Results are Based on a Real-life Data Set in Social Security Fraud
Figure 5.37 Egonet in Social Security Fraud. A Company Is Associated with its Resources
Figure 5.38 ROC Curve of the Gotcha! Model, which Combines both Intrinsic and Relational Features
Chapter 6: Fraud Analytics: Post-Processing
Figure 6.1 The Analytical Model Life Cycle
Figure 6.2 Traffic Light Indicator Approach
Figure 6.3 SAS Social Network Analysis Dashboard
Figure 6.4 SAS Social Network Analysis Claim Detail Investigation
Figure 6.5 SAS Social Network Analysis Link Detection
Figure 6.6 Distribution of Claim Amounts and Average Claim Value
Figure 6.7 Geographical Distribution of Claims
Figure 6.8 Zooming into the Geographical Distribution of Claims
Figure 6.9 Measuring the Efficiency of the Fraud-Detection Process
Figure 6.10 Evaluating the Efficiency of Fraud Investigators
Chapter 7: Fraud Analytics: A Broader Perspective
Figure 7.1 RACI Matrix
Figure 7.2 Anonymizing a Database
Figure 7.3 Different SQL Views Defined for a Database
Figure 7.4 Aggregate Loss Distribution with Indication of Expected Loss, Value at Risk (VaR) at 99.9 Percent Confidence Level and Unexpected Loss
Figure 7.5 Snapshot of a Credit Card Fraud Time Series Data Set and Associated Histogram of the Fraud Amounts
Figure 7.6 Aggregate Loss Distribution Resulting from a Monte Carlo Simulation with Poisson Distributed Monthly Fraud Frequency and Associated Pareto Distributed Fraud Loss
Chapter 1: Fraud: Detection, Prevention, and Analytics!
Table 1.1 Nonexhaustive List of Fraud Categories and Types
Table 1.2 Call Detail Records of a Customer with Outliers Indicating Suspicious Activity (deviating behavior starting at a certain moment in time) at the Customer Subscription (Fawcett and Provost 1997)
Table 1.3 Example Credit Card Transaction Data Fields
Table 1.4 Key Characteristics of Successful Fraud Analytics Models
Chapter 2: Data Collection, Sampling, and Preprocessing
Table 2.1 Dealing with Missing Values
Table 2.2 z-Scores for Outlier Detection
Table 2.3 Coarse Classifying the Product Type Variable
Table 2.4 Pivot Table for Coarse Classifying the Product Type Variable
Table 2.5 Coarse Classifying the Product Type Variable
Table 2.6 Empirical Frequencies Option 1 for Coarse Classifying Product Type
Table 2.7 Independence Frequencies Option 1 for Coarse Classifying Product Type
Table 2.8 Calculating Weights of Evidence (WOE)
Table 2.9 Filters for Variable Selection
Table 2.10 Calculating the Information Value Filter Measure
Table 2.11 Contingency Table for Marital Status versus Good/Bad Customer
Chapter 3: Descriptive Analytics for Fraud Detection
Table 3.1 Transaction Data Set for Peer-Group Analysis
Table 3.2 Transactions Database for Insurance Fraud Detection
Table 3.3 Data Set for Hierarchical Clustering
Table 3.4 Output from a k-Means Clustering Exercise (k=4)
Chapter 4: Predictive Analytics for Fraud Detection
Table 4.1 Data Set for Linear Regression
Table 4.2 Example Classification Data Set
Table 4.3 Reference Values for Variable Significance
Table 4.4 Example Data Set for Performance Calculation
Table 4.5 Confusion Matrix
Table 4.6 Table for ROC Analysis
Table 4.7 Multiclass Confusion Matrix
Table 4.8 Data for REC Curve
Table 4.9 Values for PF_A for a Data Set with No Fraudsters
Table 4.10 Values for PF_B for a Data Set with No Fraudsters
Table 4.11 Values for PF_C for a Data Set with No Fraudsters
Table 4.12 Values for PF_A, PF_B, and PF_C for a Data Set with Fraudsters
Table 4.13 Adjusting the Posterior Probability
Table 4.14 Misclassification Costs
Table 4.15 Performance Benchmarks for Fraud Detection
Chapter 5: Social Network Analysis for Fraud Detection
Table 5.1 Example of Credit Card Transaction Data
Table 5.2 Overview of Neighborhood Metrics
Table 5.3 Summary of the Total, Fraudulent, and Legitimate Degree
Table 5.4 Summary of the Number of Legitimate, Fraudulent, and Semi-Fraudulent Triangles
Table 5.5 Summary of the Density
Table 5.6 Summary of Relational Neighbor Probabilities
Table 5.7 Summary of Relational Features by Lu and Getoor (2003)
Table 5.8 Centrality Metrics
Table 5.9 Summary of Geodesic Paths to Fraudulent Nodes
Table 5.10 Summary of Closeness and Closeness Centrality for Each Node of the Network in Figure 5.13
Table 5.11 Summary of the Betweenness Centrality for Each Node of the Network in Figure 5.13
Table 5.12 PageRank Algorithm
Table 5.13 Featurization Process. The Unstructured Network Is Mapped into Structured Data Variables
Table 5.14 Overview of the 100 Companies with the Highest Score as Output by the Detection Model
Chapter 6: Fraud Analytics: Post-Processing
Table 6.1 Fully Expanded Decision Table
Table 6.2 Contracted Decision Table
Table 6.3 Minimized Decision Table
Table 6.4 Decision Table for Rule Verification
Table 6.5 Using the Expected Fraud Amount to Decide on Further Investigation
Table 6.6 Calculating the System Stability Index (SSI)
Table 6.7 Monitoring the SSI through Time
Table 6.8 Calculating the SSI for Individual Variables
Table 6.9 Monitoring the Performance Metric of a Fraud Model
Table 6.10 Monitoring the Calibration of a Classification Model
Table 6.11 Monitoring the Calibration of a Regression Model
Chapter 7: Fraud Analytics: A Broader Perspective
Table 7.1 Example Costs for Calculating Total Cost of Ownership (TCO)
The Wiley & SAS Business Series presents books that help senior-level managers with their critical management decisions.
Titles in the Wiley & SAS Business Series include:
Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications
by Bart Baesens
Bank Fraud: Using Technology to Combat Losses
by Revathi Subramanian
Big Data Analytics: Turning Big Data into Big Money
by Frank Ohlhorst
Big Data, Big Innovation: Enabling Competitive Differentiation through Business Analytics
by Evan Stubbs
Business Analytics for Customer Intelligence
by Gert Laursen
Business Intelligence Applied: Implementing an Effective Information and Communications Technology Infrastructure
by Michael Gendron
Business Intelligence and the Cloud: Strategic Implementation Guide
by Michael S. Gendron
Business Transformation: A Roadmap for Maximizing Organizational Insights
by Aiman Zeid
Connecting Organizational Silos: Taking Knowledge Flow Management to the Next Level with Social Media
by Frank Leistner
Data-Driven Healthcare: How Analytics and BI Are Transforming the Industry
by Laura Madsen
Delivering Business Analytics: Practical Guidelines for Best Practice
by Evan Stubbs
Demand-Driven Forecasting: A Structured Approach to Forecasting, Second Edition
by Charles Chase
Demand-Driven Inventory Optimization and Replenishment: Creating a More Efficient Supply Chain
by Robert A. Davis
Developing Human Capital: Using Analytics to Plan and Optimize Your Learning and Development Investments
by Gene Pease, Barbara Beresford, and Lew Walker
The Executive's Guide to Enterprise Social Media Strategy: How Social Networks Are Radically Transforming Your Business
by David Thomas and Mike Barlow
Economic and Business Forecasting: Analyzing and Interpreting Econometric Results
by John Silvia, Azhar Iqbal, Kaylyn Swankoski, Sarah Watt, and Sam Bullard
Financial Institution Advantage and The Optimization of Information Processing
by Sean C. Keenan
Foreign Currency Financial Reporting from Euros to Yen to Yuan: A Guide to Fundamental Concepts and Practical Applications
by Robert Rowan
Harness Oil and Gas Big Data with Analytics: Optimize Exploration and Production with Data Driven Models
by Keith Holdaway
Health Analytics: Gaining the Insights to Transform Health Care
by Jason Burke
Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World
by Carlos Andre Reis Pinheiro and Fiona McNeill
Human Capital Analytics: How to Harness the Potential of Your Organization's Greatest Asset
by Gene Pease, Boyce Byerly, and Jac Fitz-enz
Implement, Improve and Expand Your Statewide Longitudinal Data System: Creating a Culture of Data in Education
by Jamie McQuiggan and Armistead Sapp
Killer Analytics: Top 20 Metrics Missing from Your Balance Sheet
by Mark Brown
Predictive Analytics for Human Resources
by Jac Fitz-enz and John Mattox II
Predictive Business Analytics: Forward-Looking Capabilities to Improve Business Performance
by Lawrence Maisel and Gary Cokins
Retail Analytics: The Secret Weapon
by Emmett Cox
Social Network Analysis in Telecommunications
by Carlos Andre Reis Pinheiro
Statistical Thinking: Improving Business Performance, Second Edition
by Roger W. Hoerl and Ronald D. Snee
Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics
by Bill Franks
Too Big to Ignore: The Business Case for Big Data
by Phil Simon
The Value of Business Analytics: Identifying the Path to Profitability
by Evan Stubbs
The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions
by Phil Simon
Understanding the Predictive Analytics Lifecycle
by Al Cordoba
Unleashing Your Inner Leader: An Executive Coach Tells All
by Vickie Bevenour
Using Big Data Analytics: Turning Big Data into Big Money
by Jared Dean
Win with Advanced Business Analytics: Creating Business Value from Your Data
by Jean Paul Isson and Jesse Harriott
For more information on any of the above titles, please visit www.wiley.com.
Bart Baesens
Véronique Van Vlasselaer
Wouter Verbeke
Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Baesens, Bart.
Fraud analytics using descriptive, predictive, and social network techniques : a guide to data science for fraud detection / Bart Baesens, Veronique Van Vlasselaer, Wouter Verbeke.
pages cm. — (Wiley & SAS business series)
Includes bibliographical references and index.
ISBN 978-1-119-13312-4 (cloth) — ISBN 978-1-119-14682-7 (epdf) — ISBN 978-1-119-14683-4 (epub)
1. Fraud— Statistical methods. 2. Fraud— Prevention. 3. Commercial crimes— Prevention. I. Title.
HV6691.B34 2015
364.16′3015195—dc23
2015017861
Cover Design: Wiley
Cover Image: ©iStock.com/aleksandarvelasevic
To my wonderful wife, Katrien, and kids, Ann-Sophie, Victor, and Hannelore.
To my parents and parents-in-law.
To my husband and soul mate, Niels, for his never-ending support.
To my parents, parents-in-law, and siblings-in-law.
To Luit and Titus.