Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques

Bart Baesens

Description

Detect fraud earlier to mitigate loss and prevent cascading damage

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution. Early detection is a key factor in mitigating fraud damage, but it involves more specialized techniques than detecting fraud at the more advanced stages. This invaluable guide details both the theory and technical aspects of these techniques, and provides expert insight into streamlining implementation. Coverage includes data gathering, preprocessing, model building, and post-implementation, with comprehensive guidance on various learning techniques and the data types utilized by each. These techniques are effective for fraud detection across industry boundaries, including applications in insurance fraud, credit card fraud, anti-money laundering, healthcare fraud, telecommunications fraud, click fraud, tax evasion, and more, giving you a highly practical framework for fraud prevention.

It is estimated that a typical organization loses about 5% of its revenue to fraud every year. More effective fraud detection is possible, and this book describes the various analytical techniques your organization must implement to put a stop to the revenue leak.

* Examine fraud patterns in historical data
* Utilize labeled, unlabeled, and networked data
* Detect fraud before the damage cascades
* Reduce losses, increase recovery, and tighten security

The longer fraud is allowed to go on, the more harm it causes. It expands exponentially, sending ripples of damage throughout the organization, and becomes more and more complex to track, stop, and reverse. Fraud prevention relies on early and effective fraud detection, enabled by the techniques discussed here. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques helps you stop fraud in its tracks, and eliminate the opportunities for future occurrence.


Page count: 479

Year of publication: 2015


Table of Contents

Title Page

Copyright

Dedication

List of Figures

Foreword

Preface

Acknowledgments

Chapter 1: Fraud: Detection, Prevention, and Analytics!

Introduction

Fraud!

Fraud Detection and Prevention

Big Data for Fraud Detection

Data-Driven Fraud Detection

Fraud-Detection Techniques

Fraud Cycle

The Fraud Analytics Process Model

Fraud Data Scientists

A Scientific Perspective on Fraud

References

Chapter 2: Data Collection, Sampling, and Preprocessing

Introduction

Types of Data Sources

Merging Data Sources

Sampling

Types of Data Elements

Visual Data Exploration and Exploratory Statistical Analysis

Benford's Law

Descriptive Statistics

Missing Values

Outlier Detection and Treatment

Red Flags

Standardizing Data

Categorization

Weights of Evidence Coding

Variable Selection

Principal Components Analysis

RIDITs

PRIDIT Analysis

Segmentation

References

Chapter 3: Descriptive Analytics for Fraud Detection

Introduction

Graphical Outlier Detection Procedures

Statistical Outlier Detection Procedures

Clustering

One-Class SVMs

References

Chapter 4: Predictive Analytics for Fraud Detection

Introduction

Target Definition

Linear Regression

Logistic Regression

Variable Selection for Linear and Logistic Regression

Decision Trees

Neural Networks

Support Vector Machines

Ensemble Methods

Multiclass Classification Techniques

Evaluating Predictive Models

Other Performance Measures for Predictive Analytical Models

Developing Predictive Models for Skewed Data Sets

Fraud Performance Benchmarks

References

Chapter 5: Social Network Analysis for Fraud Detection

Networks: Form, Components, Characteristics, and Their Applications

Is Fraud a Social Phenomenon? An Introduction to Homophily

Impact of the Neighborhood: Metrics

Community Mining: Finding Groups of Fraudsters

Extending the Graph: Toward a Bipartite Representation

References

Chapter 6: Fraud Analytics: Post-Processing

Introduction

The Analytical Fraud Model Life Cycle

Model Representation

Selecting the Sample to Investigate

Fraud Alert and Case Management

Visual Analytics

Backtesting Analytical Fraud Models

Model Design and Documentation

References

Chapter 7: Fraud Analytics: A Broader Perspective

Introduction

Data Quality

Privacy

Capital Calculation for Fraud Loss

An Economic Perspective on Fraud Analytics

In Versus Outsourcing

Modeling Extensions

The Internet of Things

Corporate Fraud Governance

References

About the Authors

Index

End User License Agreement

List of Illustrations

Chapter 1: Fraud: Detection, Prevention, and Analytics!

Figure 1.1 Fraud Triangle

Figure 1.2 Fire Incident Claim-Handling Process

Figure 1.3 The Fraud Cycle

Figure 1.4 Outlier Detection at the Data Item Level

Figure 1.5 Outlier Detection at the Data Set Level

Figure 1.6 The Fraud Analytics Process Model

Figure 1.7 Profile of a Fraud Data Scientist

Figure 1.8 Screenshot of Web of Science Statistics for Scientific Publications on Fraud between 1996 and 2014

Chapter 2: Data Collection, Sampling, and Preprocessing

Figure 2.1 Aggregating Normalized Data Tables into a Non-Normalized Data Table

Figure 2.2 Pie Charts for Exploratory Data Analysis

Figure 2.3 Benford's Law Describing the Frequency Distribution of the First Digit

Figure 2.4 Multivariate Outliers

Figure 2.5 Histogram for Outlier Detection

Figure 2.6 Box Plots for Outlier Detection

Figure 2.7 Using the z-Scores for Truncation

Figure 2.8 Default Risk Versus Age

Figure 2.9 Illustration of Principal Component Analysis in a Two-Dimensional Data Set

Chapter 3: Descriptive Analytics for Fraud Detection

Figure 3.1 3D Scatter Plot for Detecting Outliers

Figure 3.2 OLAP Cube for Fraud Detection

Figure 3.3 Example Pivot Table for Credit Card Fraud Detection

Figure 3.4 Break-Point Analysis

Figure 3.5 Peer-Group Analysis

Figure 3.6 Cluster Analysis for Fraud Detection

Figure 3.7 Hierarchical Versus Nonhierarchical Clustering Techniques

Figure 3.8 Euclidean Versus Manhattan Distance

Figure 3.9 Divisive Versus Agglomerative Hierarchical Clustering

Figure 3.10 Calculating Distances between Clusters

Figure 3.11 Example for Clustering Birds. The Numbers Indicate the Clustering Steps

Figure 3.12 Dendrogram for Birds Example. The Thick Black Line Indicates the Optimal Clustering

Figure 3.13 Scree Plot for Clustering

Figure 3.14 Scatter Plot of Hierarchical Clustering Data

Figure 3.15 Output of Hierarchical Clustering Procedures

Figure 3.16 k-Means Clustering: Start from Original Data

Figure 3.17 k-Means Clustering Iteration 1: Randomly Select Initial Cluster Centroids

Figure 3.18 k-Means Clustering Iteration 1: Assign Remaining Observations

Figure 3.19 k-Means Iteration Step 2: Recalculate Cluster Centroids

Figure 3.20 k-Means Clustering Iteration 2: Reassign Observations

Figure 3.21 k-Means Clustering Iteration 3: Recalculate Cluster Centroids

Figure 3.22 k-Means Clustering Iteration 3: Reassign Observations

Figure 3.23 Rectangular Versus Hexagonal SOM Grid

Figure 3.24 Clustering Countries Using SOMs

Figure 3.25 Component Plane for Literacy

Figure 3.26 Component Plane for Political Rights

Figure 3.27 Must-Link and Cannot-Link Constraints in Semi-Supervised Clustering

Figure 3.28 δ-Constraints in Semi-Supervised Clustering

Figure 3.29 ε-Constraints in Semi-Supervised Clustering

Figure 3.30 Cluster Profiling Using Histograms

Figure 3.31 Using Decision Trees for Clustering Interpretation

Figure 3.32 One-Class Support Vector Machines

Chapter 4: Predictive Analytics for Fraud Detection

Figure 4.1 A Spider Construction in Tax Evasion Fraud

Figure 4.2 Regular Versus Fraudulent Bankruptcy

Figure 4.3 OLS Regression

Figure 4.4 Bounding Function for Logistic Regression

Figure 4.5 Linear Decision Boundary of Logistic Regression

Figure 4.6 Other Transformations

Figure 4.7 Fraud Detection Scorecard

Figure 4.8 Calculating the p-Value with a Student's t-Distribution

Figure 4.9 Variable Subsets for Four Variables V1, V2, V3, and V4

Figure 4.10 Example Decision Tree

Figure 4.11 Example Data Sets for Calculating Impurity

Figure 4.12 Entropy Versus Gini

Figure 4.13 Calculating the Entropy for Age Split

Figure 4.14 Using a Validation Set to Stop Growing a Decision Tree

Figure 4.15 Decision Boundary of a Decision Tree

Figure 4.16 Example Regression Tree for Predicting the Fraud Percentage

Figure 4.17 Neural Network Representation of Logistic Regression

Figure 4.18 A Multilayer Perceptron (MLP) Neural Network

Figure 4.19 Local Versus Global Minima

Figure 4.20 Using a Validation Set for Stopping Neural Network Training

Figure 4.21 Example Hinton Diagram

Figure 4.22 Backward Variable Selection

Figure 4.23 Decompositional Approach for Neural Network Rule Extraction

Figure 4.24 Pedagogical Approach for Rule Extraction

Figure 4.25 Two-Stage Models

Figure 4.26 Multiple Separating Hyperplanes

Figure 4.27 SVM Classifier for the Perfectly Linearly Separable Case

Figure 4.28 SVM Classifier in Case of Overlapping Distributions

Figure 4.29 The Feature Space Mapping

Figure 4.30 SVMs for Regression

Figure 4.31 Representing an SVM Classifier as a Neural Network

Figure 4.32 One-Versus-One Coding for Multiclass Problems

Figure 4.33 One-Versus-All Coding for Multiclass Problems

Figure 4.34 Training Versus Test Sample Set Up for Performance Estimation

Figure 4.35 Cross-Validation for Performance Measurement

Figure 4.36 Bootstrapping

Figure 4.37 Calculating Predictions Using a Cut-Off

Figure 4.38 The Receiver Operating Characteristic Curve

Figure 4.39 Lift Curve

Figure 4.40 Cumulative Accuracy Profile

Figure 4.41 Calculating the Accuracy Ratio

Figure 4.42 The Kolmogorov-Smirnov Statistic

Figure 4.43 A Cumulative Notch Difference Graph

Figure 4.44 Scatter Plot: Predicted Fraud Versus Actual Fraud

Figure 4.45 CAP Curve for Continuous Targets

Figure 4.46 Regression Error Characteristic (REC) Curve

Figure 4.47 Varying the Time Window to Deal with Skewed Data Sets

Figure 4.48 Oversampling the Fraudsters

Figure 4.49 Undersampling the Nonfraudsters

Figure 4.50 Synthetic Minority Oversampling Technique (SMOTE)

Chapter 5: Social Network Analysis for Fraud Detection

Figure 5.1a Königsberg Bridges

Figure 5.1b Schematic Representation of the Königsberg Bridges

Figure 5.2 Identity Theft. The Frequent Contact List of a Person is Suddenly Extended with Other Contacts (Light Gray Nodes). This Might Indicate that a Fraudster (Dark Gray Node) Took Over that Customer's Account and “shares” his/her Contacts

Figure 5.3 Network Representation

Figure 5.4 Example of a (Un)Directed Graph

Figure 5.5 Follower–Followee Relationships in a Twitter Network

Figure 5.6 Edge Representation

Figure 5.7 Example of a Fraudulent Network

Figure 5.8 An Egonet. The Ego is Surrounded by Six Alters, of Whom Two are Legitimate (White Nodes) and Four are Fraudulent (Gray Nodes)

Figure 5.9 Toy Example of Credit Card Fraud

Figure 5.10 Mathematical Representation of (a) a Sample Network; (b) the Adjacency or Connectivity Matrix; (c) the Weight Matrix; (d) the Adjacency List; and (e) the Weight List

Figure 5.11 A Real-Life Example of a Homophilic Network

Figure 5.12 A Homophilic Network

Figure 5.13 Sample Network

Figure 5.14a Degree Distribution

Figure 5.14b Illustration of the Degree Distribution for a Real-Life Network of Social Security Fraud. The Degree Distribution Follows a Power Law (log-log axes)

Figure 5.15 A 4-regular Graph

Figure 5.16 Example Social Network for a Relational Neighbor Classifier

Figure 5.17 Example Social Network for a Probabilistic Relational Neighbor Classifier

Figure 5.18 Example of Social Network Features for a Relational Logistic Regression Classifier

Figure 5.19 Example of Featurization with Features Describing Intrinsic Behavior and Behavior of the Neighborhood

Figure 5.20 Illustration of Dijkstra's Algorithm

Figure 5.21 Illustration of the Number of Connecting Paths Between Two Nodes

Figure 5.22 Illustration of Betweenness Between Communities of Nodes

Figure 5.23 PageRank Algorithm

Figure 5.24 Illustration of Iterative Process of the PageRank Algorithm

Figure 5.25 Sample Network

Figure 5.26 Community Detection for Credit Card Fraud

Figure 5.27 Iterative Bisection

Figure 5.28 Dendrogram of the Clustering of Figure 5.27 by the Girvan-Newman Algorithm. The Modularity Q is Maximized When Splitting the Network into Two Communities ABC–DEFG

Figure 5.29 Complete (a) and Partial (b) Communities

Figure 5.30 Overlapping Communities

Figure 5.31 Unipartite Graph

Figure 5.32 Bipartite Graph

Figure 5.33 Connectivity Matrix of a Bipartite Graph

Figure 5.34 A Multipartite Graph

Figure 5.35 Sample Network of Gotcha!

Figure 5.36 Exposure Score of the Resources Derived by a Propagation Algorithm. The Results are Based on a Real-life Data Set in Social Security Fraud

Figure 5.37 Egonet in Social Security Fraud. A Company Is Associated with its Resources

Figure 5.38 ROC Curve of the Gotcha! Model, which Combines both Intrinsic and Relational Features

Chapter 6: Fraud Analytics: Post-Processing

Figure 6.1 The Analytical Model Life Cycle

Figure 6.2 Traffic Light Indicator Approach

Figure 6.3 SAS Social Network Analysis Dashboard

Figure 6.4 SAS Social Network Analysis Claim Detail Investigation

Figure 6.5 SAS Social Network Analysis Link Detection

Figure 6.6 Distribution of Claim Amounts and Average Claim Value

Figure 6.7 Geographical Distribution of Claims

Figure 6.8 Zooming into the Geographical Distribution of Claims

Figure 6.9 Measuring the Efficiency of the Fraud-Detection Process

Figure 6.10 Evaluating the Efficiency of Fraud Investigators

Chapter 7: Fraud Analytics: A Broader Perspective

Figure 7.1 RACI Matrix

Figure 7.2 Anonymizing a Database

Figure 7.3 Different SQL Views Defined for a Database

Figure 7.4 Aggregate Loss Distribution with Indication of Expected Loss, Value at Risk (VaR) at 99.9 Percent Confidence Level and Unexpected Loss

Figure 7.5 Snapshot of a Credit Card Fraud Time Series Data Set and Associated Histogram of the Fraud Amounts

Figure 7.6 Aggregate Loss Distribution Resulting from a Monte Carlo Simulation with Poisson Distributed Monthly Fraud Frequency and Associated Pareto Distributed Fraud Loss

List of Tables

Chapter 1: Fraud: Detection, Prevention, and Analytics!

Table 1.1 Nonexhaustive List of Fraud Categories and Types

Table 1.2 Call Detail Records of a Customer with Outliers Indicating Suspicious Activity (deviating behavior starting at a certain moment in time) at the Customer Subscription (Fawcett and Provost 1997)

Table 1.3 Example Credit Card Transaction Data Fields

Table 1.4 Key Characteristics of Successful Fraud Analytics Models

Chapter 2: Data Collection, Sampling, and Preprocessing

Table 2.1 Dealing with Missing Values

Table 2.2 z-Scores for Outlier Detection

Table 2.3 Coarse Classifying the Product Type Variable

Table 2.4 Pivot Table for Coarse Classifying the Product Type Variable

Table 2.5 Coarse Classifying the Product Type Variable

Table 2.6 Empirical Frequencies Option 1 for Coarse Classifying Product Type

Table 2.7 Independence Frequencies Option 1 for Coarse Classifying Product Type

Table 2.8 Calculating Weights of Evidence (WOE)

Table 2.9 Filters for Variable Selection

Table 2.10 Calculating the Information Value Filter Measure

Table 2.11 Contingency Table for Marital Status versus Good/Bad Customer

Chapter 3: Descriptive Analytics for Fraud Detection

Table 3.1 Transaction Data Set for Peer-Group Analysis

Table 3.2 Transactions Database for Insurance Fraud Detection

Table 3.3 Data Set for Hierarchical Clustering

Table 3.4 Output from a k-Means Clustering Exercise (k=4)

Chapter 4: Predictive Analytics for Fraud Detection

Table 4.1 Data Set for Linear Regression

Table 4.2 Example Classification Data Set

Table 4.3 Reference Values for Variable Significance

Table 4.4 Example Data Set for Performance Calculation

Table 4.5 Confusion Matrix

Table 4.6 Table for ROC Analysis

Table 4.7 Multiclass Confusion Matrix

Table 4.8 Data for REC Curve

Table 4.9 Values for PF_A for a Data Set with No Fraudsters

Table 4.10 Values for PF_B for a Data Set with No Fraudsters

Table 4.11 Values for PF_C for a Data Set with No Fraudsters

Table 4.12 Values for PF_A, PF_B, and PF_C for a Data Set with Fraudsters

Table 4.13 Adjusting the Posterior Probability

Table 4.14 Misclassification Costs

Table 4.15 Performance Benchmarks for Fraud Detection

Chapter 5: Social Network Analysis for Fraud Detection

Table 5.1 Example of Credit Card Transaction Data

Table 5.2 Overview of Neighborhood Metrics

Table 5.3 Summary of the Total, Fraudulent, and Legitimate Degree

Table 5.4 Summary of the Number of Legitimate, Fraudulent, and Semi-Fraudulent Triangles

Table 5.5 Summary of the Density

Table 5.6 Summary of Relational Neighbor Probabilities

Table 5.7 Summary of Relational Features by Lu and Getoor (2003)

Table 5.8 Centrality Metrics

Table 5.9 Summary of Geodesic Paths to Fraudulent Nodes

Table 5.10 Summary of Closeness and Closeness Centrality for Each Node of the Network in Figure 5.13

Table 5.11 Summary of the Betweenness Centrality for Each Node of the Network in Figure 5.13

Table 5.12 PageRank Algorithm

Table 5.13 Featurization Process. The Unstructured Network Is Mapped into Structured Data Variables

Table 5.14 Overview of the 100 Companies with the Highest Score as Output by the Detection Model

Chapter 6: Fraud Analytics: Post-Processing

Table 6.1 Fully Expanded Decision Table

Table 6.2 Contracted Decision Table

Table 6.3 Minimized Decision Table

Table 6.4 Decision Table for Rule Verification

Table 6.5 Using the Expected Fraud Amount to Decide on Further Investigation

Table 6.6 Calculating the System Stability Index (SSI)

Table 6.7 Monitoring the SSI through Time

Table 6.8 Calculating the SSI for Individual Variables

Table 6.9 Monitoring the Performance Metric of a Fraud Model

Table 6.10 Monitoring the Calibration of a Classification Model

Table 6.11 Monitoring the Calibration of a Regression Model

Chapter 7: Fraud Analytics: A Broader Perspective

Table 7.1 Example Costs for Calculating Total Cost of Ownership (TCO)

Wiley & SAS Business Series

The Wiley & SAS Business Series presents books that help senior-level managers with their critical management decisions.

Titles in the Wiley & SAS Business Series include:

Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications

by Bart Baesens

Bank Fraud: Using Technology to Combat Losses

by Revathi Subramanian

Big Data Analytics: Turning Big Data into Big Money

by Frank Ohlhorst

Big Data, Big Innovation: Enabling Competitive Differentiation through Business Analytics

by Evan Stubbs

Business Analytics for Customer Intelligence

by Gert Laursen

Business Intelligence Applied: Implementing an Effective Information and Communications Technology Infrastructure

by Michael Gendron

Business Intelligence and the Cloud: Strategic Implementation Guide

by Michael S. Gendron

Business Transformation: A Roadmap for Maximizing Organizational Insights

by Aiman Zeid

Connecting Organizational Silos: Taking Knowledge Flow Management to the Next Level with Social Media

by Frank Leistner

Data-Driven Healthcare: How Analytics and BI Are Transforming the Industry

by Laura Madsen

Delivering Business Analytics: Practical Guidelines for Best Practice

by Evan Stubbs

Demand-Driven Forecasting: A Structured Approach to Forecasting, Second Edition

by Charles Chase

Demand-Driven Inventory Optimization and Replenishment: Creating a More Efficient Supply Chain

by Robert A. Davis

Developing Human Capital: Using Analytics to Plan and Optimize Your Learning and Development Investments

by Gene Pease, Barbara Beresford, and Lew Walker

The Executive's Guide to Enterprise Social Media Strategy: How Social Networks Are Radically Transforming Your Business

by David Thomas and Mike Barlow

Economic and Business Forecasting: Analyzing and Interpreting Econometric Results

by John Silvia, Azhar Iqbal, Kaylyn Swankoski, Sarah Watt, and Sam Bullard

Financial Institution Advantage and The Optimization of Information Processing

by Sean C. Keenan

Foreign Currency Financial Reporting from Euros to Yen to Yuan: A Guide to Fundamental Concepts and Practical Applications

by Robert Rowan

Harness Oil and Gas Big Data with Analytics: Optimize Exploration and Production with Data Driven Models

by Keith Holdaway

Health Analytics: Gaining the Insights to Transform Health Care

by Jason Burke

Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World

by Carlos Andre Reis Pinheiro and Fiona McNeill

Human Capital Analytics: How to Harness the Potential of Your Organization's Greatest Asset

by Gene Pease, Boyce Byerly, and Jac Fitz-enz

Implement, Improve and Expand Your Statewide Longitudinal Data System: Creating a Culture of Data in Education

by Jamie McQuiggan and Armistead Sapp

Killer Analytics: Top 20 Metrics Missing from Your Balance Sheet

by Mark Brown

Predictive Analytics for Human Resources

by Jac Fitz-enz and John Mattox II

Predictive Business Analytics: Forward-Looking Capabilities to Improve Business Performance

by Lawrence Maisel and Gary Cokins

Retail Analytics: The Secret Weapon

by Emmett Cox

Social Network Analysis in Telecommunications

by Carlos Andre Reis Pinheiro

Statistical Thinking: Improving Business Performance, Second Edition

by Roger W. Hoerl and Ronald D. Snee

Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics

by Bill Franks

Too Big to Ignore: The Business Case for Big Data

by Phil Simon

The Value of Business Analytics: Identifying the Path to Profitability

by Evan Stubbs

The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions

by Phil Simon

Understanding the Predictive Analytics Lifecycle

by Al Cordoba

Unleashing Your Inner Leader: An Executive Coach Tells All

by Vickie Bevenour

Using Big Data Analytics: Turning Big Data into Big Money

by Jared Dean

Win with Advanced Business Analytics: Creating Business Value from Your Data

by Jean Paul Isson and Jesse Harriott

For more information on any of the above titles, please visit www.wiley.com.

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques

A Guide to Data Science for Fraud Detection

Bart Baesens

Véronique Van Vlasselaer

Wouter Verbeke

 

Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Baesens, Bart.

Fraud analytics using descriptive, predictive, and social network techniques : a guide to data science for fraud detection / Bart Baesens, Veronique Van Vlasselaer, Wouter Verbeke.

pages cm. — (Wiley & SAS business series)

Includes bibliographical references and index.

ISBN 978-1-119-13312-4 (cloth) — ISBN 978-1-119-14682-7 (epdf) — ISBN 978-1-119-14683-4 (epub)

1. Fraud— Statistical methods. 2. Fraud— Prevention. 3. Commercial crimes— Prevention. I. Title.

HV6691.B34 2015

364.16′3015195—dc23

2015017861

Cover Design: Wiley

Cover Image: ©iStock.com/aleksandarvelasevic

To my wonderful wife, Katrien, and kids, Ann-Sophie, Victor, and Hannelore.

To my parents and parents-in-law.

To my husband and soul mate, Niels, for his never-ending support.

To my parents, parents-in-law, and siblings-in-law.

To Luit and Titus.

List of Figures

Figure 1.1 Fraud Triangle

Figure 1.2 Fire Incident Claim-Handling Process

Figure 1.3 The Fraud Cycle

Figure 1.4 Outlier Detection at the Data Item Level

Figure 1.5 Outlier Detection at the Data Set Level

Figure 1.6 The Fraud Analytics Process Model

Figure 1.7 Profile of a Fraud Data Scientist

Figure 1.8 Screenshot of Web of Science Statistics for Scientific Publications on Fraud between 1996 and 2014

Figure 2.1 Aggregating Normalized Data Tables into a Non-Normalized Data Table

Figure 2.2 Pie Charts for Exploratory Data Analysis

Figure 2.3 Benford's Law Describing the Frequency Distribution of the First Digit

Figure 2.4 Multivariate Outliers

Figure 2.5 Histogram for Outlier Detection

Figure 2.6 Box Plots for Outlier Detection

Figure 2.7 Using the z-Scores for Truncation

Figure 2.8 Default Risk Versus Age

Figure 2.9 Illustration of Principal Component Analysis in a Two-Dimensional Data Set

Figure 3.1 3D Scatter Plot for Detecting Outliers

Figure 3.2 OLAP Cube for Fraud Detection

Figure 3.3 Example Pivot Table for Credit Card Fraud Detection

Figure 3.4 Break-Point Analysis

Figure 3.5 Peer-Group Analysis

Figure 3.6 Cluster Analysis for Fraud Detection

Figure 3.7 Hierarchical Versus Nonhierarchical Clustering Techniques

Figure 3.8 Euclidean Versus Manhattan Distance

Figure 3.9 Divisive Versus Agglomerative Hierarchical Clustering

Figure 3.10 Calculating Distances between Clusters

Figure 3.11 Example for Clustering Birds. The Numbers Indicate the Clustering Steps

Figure 3.12 Dendrogram for Birds Example. The Thick Black Line Indicates the Optimal Clustering

Figure 3.13 Scree Plot for Clustering
