Machine Learning for Business Analytics - Galit Shmueli - E-Book

Description

Machine Learning for Business Analytics

Machine learning—also known as data mining or data analytics—is a fundamental part of data science. It is used by organizations in a wide variety of arenas to turn raw data into actionable information.

Machine Learning for Business Analytics: Concepts, Techniques and Applications in RapidMiner provides a comprehensive introduction and an overview of this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation and network analytics. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for responsible use of machine learning techniques.

This is the seventh edition of Machine Learning for Business Analytics, and the first using RapidMiner software. This edition also includes:

  • A new co-author, Amit Deokar, who brings experience teaching business analytics courses using RapidMiner
  • Integrated use of RapidMiner, an open-source machine learning platform that has become commercially popular in recent years
  • An expanded chapter focused on discussion of deep learning techniques
  • A new chapter on experimental feedback techniques including A/B testing, uplift modeling, and reinforcement learning
  • A new chapter on responsible data science
  • Updates and new material based on feedback from instructors teaching MBA, Masters in Business Analytics and related programs, undergraduate, diploma and executive courses, and from their students
  • A full chapter devoted to relevant case studies with more than a dozen cases demonstrating applications for the machine learning techniques
  • End-of-chapter exercises that help readers gauge and expand their comprehension of, and competency with, the material presented
  • A companion website with more than two dozen data sets, and instructor materials including exercise solutions, slides, and case solutions

This textbook is an ideal resource for upper-level undergraduate and graduate level courses in data science, predictive analytics, and business analytics. It is also an excellent reference for analysts, researchers, and data science practitioners working with quantitative data in management, finance, marketing, operations management, information systems, computer science, and information technology.

Page count: 1055

Year of publication: 2023




MACHINE LEARNING FOR BUSINESS ANALYTICS

Concepts, Techniques and Applications in RapidMiner

 

 

GALIT SHMUELI, National Tsing Hua University

PETER C. BRUCE, statistics.com

AMIT V. DEOKAR, University of Massachusetts Lowell

NITIN R. PATEL, Cytel, Inc.

 

This edition first published 2023

© 2023 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Galit Shmueli, Peter C. Bruce, Amit V. Deokar, and Nitin R. Patel to be identified as the authors of this work has been asserted in accordance with law.

Registered Office: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data Applied for:

Hardback: 9781119828792

Cover Design: Wiley

Cover Image: © Eakarat Buanoi/Getty Images

 

 

 

To our families

Boaz and Noa

Liz, Lisa, and Allison

Aparna, Aditi, Anuja, Ajit, Aai, and Baba

Tehmi, Arjun, and in memory of Aneesh

Foreword by Ravi Bapna

Converting data into an asset is the new business imperative facing modern managers. Each day the gap between what analytics capabilities make possible and companies’ absorptive capacity of creating value from such capabilities increases. In many ways, data is the new gold—and mining this gold to create business value in today's context of a highly networked and digital society requires a skillset that we haven't traditionally delivered in business or statistics or engineering programs on their own. For those businesses and organizations that feel overwhelmed by today's big data, the phrase you ain't seen nothing yet comes to mind. Yesterday's three major sources of big data—the 20+ years of investment in enterprise systems (ERP, CRM, SCM, etc.), the 3 billion plus people on the online social grid, and the close to 5 billion people carrying increasingly sophisticated mobile devices—are going to be dwarfed by tomorrow's smarter physical ecosystems fueled by the Internet of Things (IoT) movement.

The idea that we can use sensors to connect physical objects such as homes, automobiles, roads, and even garbage bins and streetlights to digitally optimized systems of governance goes hand in glove with bigger data and the need for deeper analytical capabilities. We are not far away from a smart refrigerator sensing that you are short on, say, eggs, populating your grocery store's mobile app's shopping list, and arranging a Task Rabbit to do a grocery run for you. Or the refrigerator negotiating a deal with an Uber driver to deliver an evening meal to you. Nor are we far away from sensors embedded in roads and vehicles that can compute traffic congestion, track roadway wear and tear, record vehicle use, and factor these into dynamic usage‐based pricing, insurance rates, and even taxation. This brave new world is going to be fueled by analytics and the ability to harness data for competitive advantage.

Business Analytics is an emerging discipline that is going to help us ride this new wave. This new Business Analytics discipline requires individuals who are grounded in the fundamentals of business such that they know the right questions to ask; who have the ability to harness, store, and optimally process vast datasets from a variety of structured and unstructured sources; and who can then use an array of techniques from machine learning and statistics to uncover new insights for decision‐making. Such individuals are a rare commodity today, but their creation has been the focus of this book for a decade now. This book's forte is that it relies on explaining the core set of concepts required for today's business analytics professionals using real‐world data‐rich cases in a hands‐on manner, without sacrificing academic rigor. It provides a modern‐day foundation for Business Analytics, the notion of linking the x's to the y's of interest in a predictive sense. I say this with the confidence of someone who was probably the first adopter of the zeroth edition of this book (Spring 2006 at the Indian School of Business).

After the publication of the R and Python editions, the new RapidMiner edition is an important addition. RapidMiner is gaining in popularity among analytics professionals as it is a non‐programming environment that lowers the barriers for managers to adopt analytics. The new edition also covers causal analytics as experimentation (often called A/B testing in the industry), which is now becoming mainstream in tech companies. Further, the authors have added a new chapter on Responsible Data Science, a new part on AutoML, more on deep learning, and beefed‐up deep learning examples in the text mining and forecasting chapters. These updates make this new edition “state of the art” with respect to modern business analytics and AI.

I look forward to using the book in multiple fora, in executive education, in MBA classrooms, in MS‐Business Analytics programs, and in Data Science bootcamps. I trust you will too!

 

RAVI BAPNA

Carlson School of Management, University of Minnesota, 2022

Preface to the RapidMiner Edition

This textbook first appeared in early 2007 and has been used by numerous students and practitioners and in many courses, including our own: we have taught this material both online and in person for more than 15 years. The first edition, based on the Excel add‐in Analytic Solver Data Mining (previously XLMiner), was followed by two more Analytic Solver editions, a JMP edition, an R edition, a Python edition, and now this RapidMiner edition, with its companion website, www.dataminingbook.com.

This new RapidMiner edition relies on the open source machine learning platform RapidMiner Studio (generally called RapidMiner), which offers both free and commercially‐licensed versions. We present output from RapidMiner, as well as the processes used to produce that output. We show the specification of the appropriate operators from RapidMiner and some of its key extensions. Unlike computer science‐ or statistics‐oriented textbooks, the focus in this book is on machine learning concepts and how to implement the associated algorithms in RapidMiner. We assume a familiarity with the fundamentals of data analysis and statistics. Basic knowledge of Python can be helpful for a few chapters.

For this RapidMiner edition, a new co‐author, Amit Deokar, comes on board, bringing both expertise in teaching business analytics courses using RapidMiner and extensive data science experience in working with businesses on research and consulting projects. In addition to providing RapidMiner guidance and RapidMiner process and output screenshots, this edition also incorporates updates and new material based on feedback from instructors teaching MBA, MS, undergraduate, diploma, and executive courses, and from their students.

Importantly, this edition includes several new topics:

  • A dedicated section on deep learning in Chapter 11, with additional deep learning examples in text mining (Chapter 21) and time series forecasting (Chapter 19).
  • A new chapter on Responsible Data Science (Chapter 22) covering topics of fairness, transparency, model cards and datasheets, legal considerations, and more, with an illustrative example.
  • The Performance Evaluation exposition in Chapter 5 was expanded to include further metrics (precision and recall, F1, AUC) and the SMOTE oversampling method.
  • A new chapter on Generating, Comparing, and Combining Multiple Models (Chapter 13) that covers AutoML, explaining model predictions, and ensembles.
  • A new chapter dedicated to Interventions and User Feedback (Chapter 14) that covers A/B tests, uplift modeling, and reinforcement learning.
  • A new case (Loan Approval) that touches on regulatory and ethical issues.

A note about the book's title: The first two editions of the book used the title Data Mining for Business Intelligence. Business intelligence today refers mainly to reporting and data visualization (“what is happening now”), while business analytics has taken over the “advanced analytics,” which include predictive analytics and data mining. Later editions were therefore renamed Data Mining for Business Analytics. However, the recent AI transformation has made the term machine learning more popularly associated with the methods in this textbook. In this new edition, we therefore use the updated terms Machine Learning and Business Analytics.

Since the appearance of the (Analytic Solver‐based) second edition, the landscape of the courses using the textbook has greatly expanded: whereas initially the book was used mainly in semester‐long elective MBA‐level courses, it is now used in a variety of courses in business analytics degrees and certificate programs, ranging from undergraduate programs to postgraduate and executive education programs. Courses in such programs also vary in their duration and coverage. In many cases, this textbook is used across multiple courses. The book is designed to continue supporting the general “predictive analytics”, “data mining”, or “machine learning” course as well as supporting a set of courses in dedicated business analytics programs.

A general “business analytics,” “predictive analytics,” or “machine learning” course, common in MBA and undergraduate programs as a one‐semester elective, would cover Parts I–III, and choose a subset of methods from Parts IV and V. Instructors can choose to use cases as team assignments, class discussions, or projects. For a two‐semester course, Part VII might be considered, and we recommend introducing Part VIII (Data Analytics).

For a set of courses in a dedicated business analytics program, here are a few courses that have been using our book:

  • Predictive Analytics: Supervised Learning. In a dedicated business analytics program, the topic of predictive analytics is typically instructed across a set of courses. The first course would cover Parts I–III, and instructors typically choose a subset of methods from Part IV according to the course length. We recommend including Part VIII: Data Analytics.
  • Predictive Analytics: Unsupervised Learning. This course introduces data exploration and visualization, dimension reduction, mining relationships, and clustering (Parts II and VI). If this course follows the Predictive Analytics: Supervised Learning course, then it is useful to examine examples and approaches that integrate unsupervised and supervised learning, such as Part VIII on Data Analytics.
  • Forecasting analytics. A dedicated course on time series forecasting would rely on Part VI.
  • Advanced analytics. A course that integrates the learnings from predictive analytics (supervised and unsupervised learning) can focus on Part VIII: Data Analytics, where social network analytics and text mining are introduced, and responsible data science is discussed. Such a course might also include Chapter 13, Generating, Comparing, and Combining Multiple Models and AutoML from Part IV, as well as Part V, which covers experiments, uplift modeling, and reinforcement learning. Some instructors choose to use the cases (Chapter 23) in such a course.

In all courses, we strongly recommend including a project component, where data are either collected by students according to their interest or provided by the instructor (e.g., from the many machine learning competition datasets available). From our experience and other instructors’ experience, such projects enhance the learning and provide students with an excellent opportunity to understand the strengths of machine learning and the challenges that arise in the process.

GALIT SHMUELI, PETER C. BRUCE, AMIT V. DEOKAR, AND NITIN R. PATEL

2022

Acknowledgments

We thank the many people who assisted us in improving the book from its inception as Data Mining for Business Intelligence in 2006 (using XLMiner, now Analytic Solver), its reincarnation as Data Mining for Business Analytics, and now Machine Learning for Business Analytics, including translations in Chinese and Korean and versions supporting Analytic Solver Data Mining, R, Python, SAS JMP, and now RapidMiner.

Anthony Babinec, who has been using earlier editions of this book for years in his data mining courses at Statistics.com, provided us with detailed and expert corrections. Dan Toy and John Elder IV greeted our project with early enthusiasm and provided detailed and useful comments on initial drafts. Ravi Bapna, who used an early draft in a data mining course at the Indian School of Business and later at University of Minnesota, has provided invaluable comments and helpful suggestions since the book's start.

Many of the instructors, teaching assistants, and students using earlier editions of the book have contributed invaluable feedback both directly and indirectly, through fruitful discussions, learning journeys, and interesting machine learning projects that have helped shape and improve the book. These include MBA students from the University of Maryland, MIT, the Indian School of Business, National Tsing Hua University, University of Massachusetts Lowell, and Statistics.com. Instructors from many universities and teaching programs, too numerous to list, have supported and helped improve the book since its inception. Scott Nestler has been a helpful friend of this book project from the beginning.

Kuber Deokar, instructional operations supervisor at Statistics.com, has been unstinting in his assistance, support, and detailed attention. We also thank Anuja Kulkarni, assistant teacher at Statistics.com. Valerie Troiano has shepherded many instructors and students through the Statistics.com courses that have helped nurture the development of these books.

Colleagues and family members have been providing ongoing feedback and assistance with this book project. Vijay Kamble at UIC and Travis Greene at NTHU have provided valuable help with the section on reinforcement learning. Boaz Shmueli and Raquelle Azran gave detailed editorial comments and suggestions on the first two editions; Bruce McCullough and Adam Hughes did the same for the first edition. Noa Shmueli provided careful proofs of the third edition. Ran Shenberger offered design tips. Che Lin and Boaz Shmueli provided feedback on deep learning. Ken Strasma, founder of the microtargeting firm HaystaqDNA and director of targeting for the 2004 Kerry campaign and the 2008 Obama campaign, provided the scenario and data for the section on uplift modeling. We also thank Jen Golbeck, director of the Social Intelligence Lab at the University of Maryland and author of Analyzing the Social Web, whose book inspired our presentation in the chapter on social network analytics. Inbal Yahav and Peter Gedeck, co‐authors of the R and Python editions, helped improve the social network analytics and text mining chapters. Randall Pruim contributed extensively to the chapter on visualization.

Marietta Tretter at Texas A&M shared comments and thoughts on the time series chapters, and Stephen Few and Ben Shneiderman provided feedback and suggestions on the data visualization chapter and overall design tips.

Susan Palocsay and Mia Stephens have provided suggestions and feedback on numerous occasions, as have Margret Bjarnadottir and Mohammad Salehan. We also thank Catherine Plaisant at the University of Maryland's Human–Computer Interaction Lab, who helped out in a major way by contributing exercises and illustrations to the data visualization chapter. Gregory Piatetsky‐Shapiro, founder of KDNuggets.com, has been generous with his time and counsel in the early years of this project.

We thank colleagues at the MIT Sloan School of Management for their support during the formative stage of this book—Dimitris Bertsimas, James Orlin, Robert Freund, Roy Welsch, Gordon Kaufmann, and Gabriel Bitran. As teaching assistants for the data mining course at Sloan, Adam Mersereau gave detailed comments on the notes and cases that were the genesis of this book, Romy Shioda helped with the preparation of several cases and exercises used here, and Mahesh Kumar helped with the material on clustering.