A Practical Guide to Data Mining for Business and Industry - Andrea Ahlemeyer-Stubbe - E-Book

Description

Data mining is well on its way to becoming a recognized discipline in the overlapping areas of IT, statistics, machine learning, and AI. Practical Data Mining for Business presents a user-friendly approach to data mining methods, covering the typical uses to which it is applied. The methodology is complemented by case studies to create a versatile reference book, allowing readers to look for specific methods as well as for specific applications. The book is formatted to allow statisticians, computer scientists, and economists to cross-reference from a particular application or method to sectors of interest.

Page count: 460

Year of publication: 2014

CONTENTS

Cover

Title page

Copyright page

Glossary of terms

Part I: Data mining concept

1 Introduction

1.1 Aims of the Book

1.2 Data Mining Context

1.3 Global Appeal

1.4 Example Datasets Used in This Book

1.5 Recipe Structure

1.6 Further Reading and Resources

2 Data mining definition

2.1 Types of Data Mining Questions

2.2 Data Mining Process

2.3 Business Task: Clarification of the Business Question behind the Problem

2.4 Data: Provision and Processing of the Required Data

2.5 Modelling: Analysis of the Data

2.6 Evaluation and Validation during the Analysis Stage

2.7 Application of Data Mining Results and Learning from the Experience

Part II: Data mining practicalities

3 All about data

3.1 Some Basics

3.2 Data Partition: Random Samples for Training, Testing and Validation

3.3 Types of Business Information Systems

3.4 Data Warehouses

3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS

3.6 Data Marts

3.7 A Typical Example from the Online Marketing Area

3.8 Unique Data Marts

3.9 Data Mart: Do’s and Don’ts

4 Data Preparation

4.1 Necessity of Data Preparation

4.2 From Small and Long to Short and Wide

4.3 Transformation of Variables

4.4 Missing Data and Imputation Strategies

4.5 Outliers

4.6 Dealing with the Vagaries of Data

4.7 Adjusting the Data Distributions

4.8 Binning

4.9 Timing Considerations

4.10 Operational Issues

5 Analytics

5.1 Introduction

5.2 Basis of Statistical Tests

5.3 Sampling

5.4 Basic Statistics for Pre-analytics

5.5 Feature Selection/Reduction of Variables

5.6 Time Series Analysis

6 Methods

6.1 Methods Overview

6.2 Supervised Learning

6.3 Multiple Linear Regression for Use When Target is Continuous

6.4 Regression When the Target is Not Continuous

6.5 Decision Trees

6.6 Neural Networks

6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks

6.8 Unsupervised Learning

6.9 Cluster Analysis

6.10 Kohonen Networks and Self-Organising Maps

6.11 Group Purchase Methods: Association and Sequence Analysis

7 Validation and Application

7.1 Introduction to Methods for Validation

7.2 Lift and Gain Charts

7.3 Model Stability

7.4 Sensitivity Analysis

7.5 Threshold Analytics and Confusion Matrix

7.6 ROC Curves

7.7 Cross-Validation and Robustness

7.8 Model Complexity

Part III: Data mining in action

8 Marketing

8.1 Recipe 1: Response Optimisation: To Find and Address the Right Number of Customers

8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer

8.3 Recipe 3: To Find the Right Number of Customers to Ignore

8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer

8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy

8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy

8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase

8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas

8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Insurance Areas

9 Intra-Customer Analysis

9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer

9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer

9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products

9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage

9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups

9.6 Recipe 15: Product Set Combination

9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer

10 Learning from a Small Testing Sample and Prediction

10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income)

10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases

10.3 Recipe 19: To Understand Operational Features and General Business Forecasting

11 Miscellaneous

11.1 Recipe 20: To Find Customers Who Will Potentially Churn

11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract

11.3 Recipe 22: Social Media Target Group Descriptions

11.4 Recipe 23: Web Monitoring

11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner

12 Software and Tools

12.1 List of Requirements When Choosing a Data Mining Tool

12.2 Introduction to the Idea of Fully Automated Modelling (FAM)

12.3 FAM Function

12.4 FAM Architecture

12.5 FAM Data Flows and Databases

12.6 FAM Modelling Aspects

12.7 FAM Challenges and Critical Success Factors

12.8 FAM Summary

13 Overviews

13.1 To Make Use of Official Statistics

13.2 How to Use Simple Maths to Make an Impression

13.3 Differences between Statistical Analysis and Data Mining

13.4 How to Use Data Mining in Different Industries

13.5 Future Views

Bibliography

Index

End User License Agreement

List of Illustrations

Chapter 01

Figure 1.1 Data mining short process.

Figure 1.2 Increasing profit with data mining.

Figure 1.3 Example data – 50 000 sample customers and table of order details.

Figure 1.4 Example data – ENBIS Challenge.

Chapter 02

Figure 2.1 Supervised learning.

Figure 2.2 Unsupervised learning.

Figure 2.3 The general data mining process.

Figure 2.4 Time scales for data mining process.

Figure 2.5 Lift chart to compare models.

Figure 2.6 Lift chart to compare models.

Figure 2.7 Fine scale lift chart.

Figure 2.8 Confusion matrix for comparing models.

Figure 2.9 An example of model control in Excel.

Chapter 03

Figure 3.1 Important terms in data evolution.

Figure 3.2 The evolution of wisdom.

Figure 3.3 Typical internal and external data in information systems.

Figure 3.4 Table of sample data.

Figure 3.5 Data distribution.

Figure 3.6 Stratified sampling.

Figure 3.7 Example data structure.

Figure 3.8 Example data structure.

Figure 3.9 Translation list of variable names.

Figure 3.10 Example of click information.

Chapter 04

Figure 4.1 Typical connections between fact tables.

Figure 4.2 An example of an order fact table.

Figure 4.3 Person as subject – easy.

Figure 4.4 Person as subject – more information.

Figure 4.5 Interaction between measurement level and data mining method.

Figure 4.6 Example of a look-up table.

Figure 4.7 Part of a data mart with missing values.

Figure 4.8 Part of a data mart with replacement for missing values.

Figure 4.9 Dealing with binning.

Figure 4.10 Dealing with sextiles.

Figure 4.11 Classification rules.

Chapter 05

Figure 5.1 Outcome of a hypothesis test.

Figure 5.2 Example of a significance test.

Figure 5.3 Target data.

Figure 5.4 Stratified sampling.

Figure 5.5 Example of linked histograms.

Figure 5.6 Example of cumulative view.

Figure 5.7 Example of bar charts.

Figure 5.8 Overview of correlation procedures.

Figure 5.9 Example of 2 × 2 contingency table.

Figure 5.10 Example of a 6 × 2 contingency table.

Figure 5.11 Example of a 2 × 2 low frequency table.

Figure 5.12 Example of Bootstrap Forest I.

Figure 5.13 Example of Bootstrap Forest II.

Chapter 06

Figure 6.1 Linear regression model and choice of target variable.

Figure 6.2 Choice of the analysis method and explanatory variables.

Figure 6.3 Target variables being checked.

Figure 6.4 Reduction of the number of input variables using the stepwise approach.

Figure 6.5 Residual plots for the model.

Figure 6.6 Detailed results for a stepwise linear regression Part 1.

Figure 6.7 Detailed results for a stepwise linear regression – Final model.

Figure 6.8 Non-linear regression.

Figure 6.9 Non-linear regression steps Part 1.

Figure 6.10 Non-linear regression steps Part 2.

Figure 6.11 Non-linear regression steps Part 3.

Figure 6.12 Non-linear regression steps Part 4.

Figure 6.13 Non-linear regression steps Part 5.

Figure 6.14 Non-linear regression steps Part 6 results.

Figure 6.15 Non-linear regression steps Part 7.

Figure 6.16 Non-linear regression options to measure the model quality.

Figure 6.17 Non-linear regression steps Part 8.

Figure 6.18 Non-linear regression steps Part 9.

Figure 6.19 Decision tree.

Figure 6.20 Decision tree with leaf and knots.

Figure 6.21 Decision tree – Pruning.

Figure 6.22 Artificial neural network.

Figure 6.23 Multivariate model expressed as a neural net.

Figure 6.24 Neural network model.

Figure 6.25 Learning in an artificial neural network through backward propagation.

Figure 6.26 Basic realisation of clustering Part 1.

Figure 6.27 Basic realisation of clustering Part 2.

Figure 6.28 Realisation of clustering K-Means.

Figure 6.29 Kohonen network with two-dimensional arrangement of the output neurons.

Figure 6.30 SOM/Kohonen network realisation Part 1.

Figure 6.31 SOM/Kohonen network realisation Part 2.

Figure 6.32 SOM/Kohonen network realisation Part 3.

Figure 6.33 Association rules.

Figure 6.34 Example data.

Figure 6.35 Results representation (market basket analysis).

Chapter 07

Figure 7.1 Gain chart to compare models.

Figure 7.2 Typical lift chart (on full population).

Figure 7.3 Example for a model with one unstable area.

A Practical Guide to Data Mining for Business and Industry

Andrea Ahlemeyer-Stubbe

Director Strategic Analytics, DRAFTFCB München GmbH, Germany

Shirley Coleman

Principal Statistician, Industrial Statistics Research Unit, School of Maths and Statistics, Newcastle University, UK

This edition first published 2014
© 2014 John Wiley & Sons, Ltd

Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Ahlemeyer-Stubbe, Andrea.
A practical guide to data mining for business and industry / Andrea Ahlemeyer-Stubbe, Shirley Coleman.
pages cm
Includes bibliographical references and index.
ISBN 978-1-119-97713-1 (cloth)

1. Data mining. 2. Marketing–Data processing. 3. Management–Mathematical models.
I. Title.
HF5415.125.A42 2014
006.3′12–dc23

2013047218

A catalogue record for this book is available from the British Library.

ISBN: 978-1-119-97713-1

Glossary of terms

Accuracy

| A measurement of the match (degree of closeness) between predictions and real values.

Address

| A unique identifier for a computer or site online, usually a URL for a website or marked with an @ for an email address. Literally, it is how your computer finds a location on the information highway.

Advertising

| Paid form of non-personal communication by industry, business firms, non-profit organisations or individuals delivered through the various media. Advertising is persuasive and informational and is designed to influence the purchasing behaviour and thought patterns of the audience. Advertising may be used in combination with sales promotions, personal selling tactics or publicity. This also includes promotion of a product, service or message by an identified sponsor using paid-for media.

Aggregation

| Form of segmentation that assumes most consumers are alike.

Algorithm

| The process a search engine applies to web pages so it can accurately produce a list of results based on a search term. Search engines regularly change their algorithms to improve the quality of the search results. Hence, search engine optimisation tends to require constant research and monitoring.

Analytics

| A feature that allows you to understand (learn more about) a wide range of activity related to your website, your online marketing activities and direct marketing activities. Using analytics provides you with information to help optimise your campaigns, ad groups and keywords, as well as your other online marketing activities, to best meet your business goals.

API

| Application Programming Interface, often used to exchange data, for example, with social networks.

Attention

| A momentary attraction to a stimulus, something someone senses via sight, sound, touch, smell or taste. Attention is the starting point of the perceptual process in that attention to a stimulus will cause someone either to decide to make sense of it or to reject it.

B2B

| Business To Business – Business conducted between companies rather than between a company and individual consumers. For example, a firm that makes parts that are sold directly to an automobile manufacturer.

B2C

| Business To Consumer – Business conducted between companies and individual consumers rather than between two companies. A retailer such as Tesco or the greengrocer next door is an example of a B2C company.

Banner

| Banners are the 468-by-60-pixel ad spaces on commercial websites that are usually ‘hotlinked’ to the advertiser’s site.

Banner ad

| Form of Internet promotion featuring information or special offers for products and services. These small space ‘banners’ are interactive: when clicked, they open another website where a sale can be finalised. The hosting website of the banner ad often earns money each time someone clicks on the banner ad.

Base period

| Period of time applicable to the learning data.

Behavioural targeting