Visual Data Mining E-Book

Russell K. Anderson

Publisher: John Wiley & Sons
Category: Science and New Technologies
Language: English

Description

A visual approach to data mining.

Data mining has been defined as the search for useful and previously unknown patterns in large datasets, yet when faced with the task of mining a large dataset, it is not always obvious where to start and how to proceed.

This book introduces a visual methodology for data mining and demonstrates its application through a sequence of exercises using VisMiner. Developed by the author, VisMiner is a powerful visual data mining tool that lets readers see the data they are working on and visually evaluate the models created from that data.

Key features:

Presents visual support for all phases of data mining including dataset preparation.
Provides a comprehensive set of non-trivial datasets and problems with accompanying software.
Features 3-D visualizations of multi-dimensional datasets.
Gives support for spatial data analysis with GIS-like features.
Describes data mining algorithms with guidance on when and how to use them.
Accompanied by VisMiner, a visual software tool for data mining, developed specifically to bridge the gap between theory and practice.

Visual Data Mining: The VisMiner Approach is designed as a hands-on workbook to introduce the methodologies to students in data mining, advanced statistics, and business intelligence courses. This book provides a set of tutorials, exercises, and case studies that support students in learning data mining processes.

In praise of the VisMiner approach:

"What we discovered among students was that the visualization concepts and tools brought the analysis alive in a way that was broadly understood and could be used to make sound decisions with greater certainty about the outcomes"
—Dr. James V. Hansen, J. Owen Cherrington Professor, Marriott School, Brigham Young University, USA

"Students learn best when they are able to visualize relationships between data and results during the data mining process. VisMiner is easy to learn and yet offers great visualization capabilities throughout the data mining process. My students liked it very much and so did I."
—Dr. Douglas Dean, Assoc. Professor of Information Systems, Marriott School, Brigham Young University, USA

Details

Page count: 264

Publication year: 2012

Sample excerpt

Contents

Cover

Title Page

Preface

Acknowledgments

Chapter 1: Introduction

Data Mining Objectives

Introduction to VisMiner

The Data Mining Process

Summary

Chapter 2: Initial Data Exploration and Dataset Preparation Using VisMiner

The Rationale for Visualizations

Tutorial – Using VisMiner

Summary

Chapter 3: Advanced Topics in Initial Exploration and Dataset Preparation Using VisMiner

Missing Values

Summary

Chapter 4: Prediction Algorithms for Data Mining

Decision Trees

Artificial Neural Networks

Support Vector Machines

Summary

Chapter 5: Classification Models in VisMiner

Dataset Preparation

Tutorial – Building and Evaluating Classification Models

Model Evaluation

Prediction Likelihoods

Classification Model Performance

Interpreting the ROC Curve

Classification Ensembles

Model Application

Summary

Chapter 6: Regression Analysis

The Regression Model

Correlation and Causation

Algorithms for Regression Analysis

Assessing Regression Model Performance

Model Validity

Looking Beyond R²

Polynomial Regression

Artificial Neural Networks for Regression Analysis

Dataset Preparation

Tutorial

A Regression Model for Home Appraisal

Modeling with the Right Set of Observations

ANN Modeling

The Advantage of ANN Regression

Top-Down Attribute Selection

Issues in Model Interpretation

Model Validation

Model Application

Summary

Chapter 7: Cluster Analysis

Introduction

Algorithms for Cluster Analysis

Issues with K-Means Clustering Process

Hierarchical Clustering

Measures of Cluster and Clustering Quality

Silhouette Coefficient

Correlation Coefficient

Self-Organizing Maps (SOM)

Self-Organizing Maps in VisMiner

Choosing the Grid Dimensions

Advantages of a 3-D Grid

Extracting Subsets from a Clustering

Summary

Appendix A: VisMiner Reference by Task

Dataset Preparation

Data Exploration

Model Building – Algorithm Application

Model Evaluation

Appendix B: VisMiner Task/Tool Matrix

Appendix C: IP Address Look-up

IP Address for VisSlave When Using One Computer

IP Address for VisSlave When Using Multiple Computers

Index

This edition first published 2013

Registered office

John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Anderson, Russell K.

Visual data mining : the VisMiner approach / Russell K. Anderson.

p. cm.

Includes index.

ISBN 978-1-119-96754-5 (cloth)

1. Data mining. 2. Information visualization. 3. VisMiner (Electronic resource) I. Title.

QA76.9.D343A347 2012

006.3′12–dc23

2012018033

A catalogue record for this book is available from the British Library.

ISBN: 9781119967545

Preface

VisMiner was designed to be used as a data mining teaching tool with application in the classroom. It visually supports the complete data mining process – from dataset preparation, preliminary exploration, and algorithm application to model evaluation and application. Students learn best when they are able to visualize the relationships between data attributes and the results of a data mining algorithm application.

This book was originally created to be used as a supplement to the regular textbook of a data mining course in the Marriott School of Management at Brigham Young University. Its primary objectives were to help students learn VisMiner, so that they could visually explore and model the primary text's datasets, and to provide additional practice datasets and case studies. In doing so, it supported a complete step-by-step process for data mining.

In later revisions, additions were made to the book introducing data mining algorithm overviews. These overviews included the basic approach of the algorithm, strengths and weaknesses, and guidelines for application. Consequently, this book can be used both as a standalone text in courses providing an application-level introduction to data mining, and as a supplement in courses where there is a greater focus on algorithm details. In either case, the text coupled with VisMiner will provide visualization, algorithm application, and model evaluation capabilities for increased data mining process comprehension.

As stated above, VisMiner was designed to be used as a teaching tool for the classroom. It will effectively use all display real estate available. Although the complete VisMiner system will operate within a single display, in the classroom setting we recommend a dual display/projector setup. From experience, we have found that students using VisMiner also prefer the dual display setup. In chatting with students about their experience with VisMiner, we found that they would bring their laptops to class and work off a single display, then plug in a second display while solving problems at home.

An accompanying website where VisMiner, datasets, and additional problems may be downloaded is available at www.wiley.com/go/visminer.

Acknowledgments

The author would like to thank the faculty and students of the Marriott School of Management at Brigham Young University. It was their testing of the VisMiner software and feedback on drafts of this book that brought it to fruition. In particular, Dr. Jim Hansen and Dr. Douglas Dean have made extraordinary efforts to incorporate both the software and the drafts in their data mining courses over the past three years.

In developing and refining VisMiner, Daniel Link, now a PhD student at the University of Southern California, made significant contributions to the visualization components. Dr. Musa Jafar, West Texas A&M University, provided valuable feedback and suggestions.

Finally, thanks go to Charmaine Anderson and Ryan Anderson who provided editorial support during the initial draft preparation.

Introduction

Data mining has been defined as the search for useful and previously unknown patterns in large datasets. Yet when faced with the task of mining a large dataset, it is not always obvious where to start and how to proceed. The purpose of this book is to introduce a methodology for data mining and to guide you in the application of that methodology using software specifically designed to support the methodology. In this chapter, we provide an overview of the methodology. The chapters that follow add detail to that methodology and contain a sequence of exercises that guide you in its application. The exercises use VisMiner, a powerful visual data mining tool which was designed around the methodology.

Data Mining Objectives

Normally in data mining a mathematical model is constructed for the purpose of prediction or description. A model can be thought of as a virtual box that accepts a set of inputs, then uses those inputs to generate output.

Prediction modeling algorithms use selected input attributes and a single selected output attribute from your dataset to build a model. The model, once built, is used to predict an output value based on input attribute values. The dataset used to build the model is assumed to contain historical data from past events in which the values of both the input and output attributes are known. The data mining methodology uses those values to construct a model that best fits the data. The process of model construction is sometimes referred to as training. The primary objective of model construction is to use the model for predictions in the future using known input attribute values when the value of the output attribute is not yet known. Prediction models that have a categorical output are known as classification models. For example, an insurance company may want to build a classification model to predict if an insurance claim is likely to be fraudulent or legitimate.
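The train-then-predict cycle described above can be illustrated with a small Python sketch. It is not VisMiner and not one of the book's algorithms: it assumes a toy, made-up set of insurance claims with two hypothetical input attributes (claim amount and number of prior claims) and fits a one-split decision rule, sometimes called a decision stump, chosen only because it is short enough to read in full. The names HISTORY, train_stump, and predict are illustrative.

```python
from typing import List, Sequence, Tuple

# One historical observation: input attribute values plus the known output.
Row = Tuple[Sequence[float], str]
# A fitted model: (attribute index, threshold, label at/below, label above).
Stump = Tuple[int, float, str, str]

# Hypothetical historical claims: (claim amount, prior claims) -> known outcome.
HISTORY: List[Row] = [
    ((1200.0, 0.0), "legitimate"),
    ((800.0, 1.0), "legitimate"),
    ((2500.0, 0.0), "legitimate"),
    ((3000.0, 1.0), "legitimate"),
    ((9000.0, 3.0), "fraudulent"),
    ((12000.0, 4.0), "fraudulent"),
    ((15000.0, 2.0), "fraudulent"),
]


def train_stump(rows: List[Row]) -> Stump:
    """Training: pick the single split that misclassifies the fewest rows."""
    labels = sorted({label for _, label in rows})
    best = None  # (errors, attribute index, threshold, below, above)
    for attr in range(len(rows[0][0])):
        values = sorted({inputs[attr] for inputs, _ in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for below in labels:
                for above in labels:
                    errors = sum(
                        1
                        for inputs, label in rows
                        if (below if inputs[attr] <= threshold else above) != label
                    )
                    if best is None or errors < best[0]:
                        best = (errors, attr, threshold, below, above)
    if best is None:
        raise ValueError("need at least two distinct values in some attribute")
    return best[1:]


def predict(model: Stump, inputs: Sequence[float]) -> str:
    """Prediction: apply the fitted split to inputs whose output is unknown."""
    attr, threshold, below, above = model
    return below if inputs[attr] <= threshold else above


if __name__ == "__main__":
    model = train_stump(HISTORY)          # build the model from historical data
    new_claim = (11000.0, 1.0)            # output attribute not yet known
    print(model, predict(model, new_claim))
```

The historical rows play the role of the training dataset, in which both inputs and output are known; the new claim has known inputs only, and the model supplies the categorical output.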
