The essential guide for data scientists and for leaders who must get more from their data science teams
The Economist boldly claims that data are now "the world's most valuable resource." But, as Kenett and Redman so richly describe, unlocking that value requires far more than technical excellence. The Real Work of Data Science explores understanding the problems, dealing with quality issues, building trust with decision makers, putting data science teams in the right organizational spots, and helping companies become data-driven. This is the work that spells the difference between a good data scientist and a great one, between a team that makes marginal contributions and one that drives the business, between a company that gains some value from its data and one in which data truly is "the most valuable resource."
"These two authors are world-class experts on analytics, data management, and data quality; they've forgotten more about these topics than most of us will ever know. Their book is pragmatic, understandable, and focused on what really counts. If you want to do data science in any capacity, you need to read it." --Thomas H. Davenport, Distinguished Professor, Babson College and Fellow, MIT Initiative on the Digital Economy
"I like your book. The chapters address problems that have faced statisticians for generations, updated to reflect today's issues, such as computational Big Data." --Sir David Cox, Warden of Nuffield College and Professor of Statistics, Oxford University
"Data science is critical for competitiveness, for good government, for correct decisions. But what is data science? Kenett and Redman give, by far, the best introduction to the subject I have seen anywhere. They address the critical questions of formulating the right problem, collecting the right data, doing the right analyses, making the right decisions, and measuring the actual impact of the decisions. This book should become required reading in statistics and computer science departments, business schools, analytics institutes and, most importantly, by all business managers." --A. Blanton Godfrey, Joseph D. Moore Distinguished University Professor, Wilson College of Textiles, North Carolina State University
Page count: 221
Year of publication: 2019
Cover
About the Authors
Preface
Fad, Trend, or Fundamental Transformation?
Data Scientists and Chief Analytics Officers
Introduction to the Book
About the Companion Website
1 A Higher Calling
The Life‐Cycle View
The Organizational Ecosystem
Once Again, Our Goal
2 The Difference Between a Good Data Scientist and a Great One
Implications
3 Learn the Business
The Annual Report
SWOTs and Strategic Analysis
The Balanced Scorecard and Key Performance Indicators
The Data Lens
Build Your Network
Implications
4 Understand the Real Problem
A Telling Example
Understanding the Real Problem
Implications
5 Get Out There
Understand Context and Soft Data
Identify Sources of Variability
Selective Attention
Memory Bias
Implications
6 Sorry, but You Can't Trust the Data
Most Data Is Untrustworthy
Dealing with Immediate Issues
Getting in Front of Tomorrow's Data Quality Issues
Implications
7 Make It Easy for People to Understand Your Insights
First, Get the Basics Right
Presentations Get Passed Around
The Best of the Best
Implications
8 When the Data Leaves Off and Your Intuition Takes Over
Modes of Generalization
Implications
9 Take Accountability for Results
Practical Statistical Efficiency
Using Data Science to Perform Impact Analysis
Implications
10 What It Means to Be “Data‐driven”
Data‐driven Companies and People
Traits of the Data‐driven
Traits of the Antis
Implications
11 Root Out Bias in Decision‐making
Understand Why It Occurs
Take Control on a Personal Level
Solid Scientific Footings
Implications
12 Teach, Teach, Teach
The Rope Exercise
The “Roll Your Own” Exercise
The Starter Kit of Questions to Ask Data Scientists
Implications
13 Evaluating Data Science Outputs More Formally
Assessing Information Quality
A Hands‐On Information Quality Workshop
Implications
14 Educating Senior Leaders
Covering the Waterfront
Companies Need a Data and Data Science Strategy
Organizations Are “Unfit for Data”
Get Started with Data Quality
Implications
15 Putting Data Science, and Data Scientists, in the Right Spots
The Need for Senior Leadership
Building a Network of Data Scientists
Implications
16 Moving Up the Analytics Maturity Ladder
Implications
17 The Industrial Revolutions and Data Science
The First Industrial Revolution: From Craft to Repetitive Activity
The Second Industrial Revolution: The Advent of the Factory
The Third Industrial Revolution: Enter the Computer
The Fourth Industrial Revolution: The Industry 4.0 Transformation
Implications
18 Epilogue
Strong Foundations
A Bridge to the Future
Appendix A: Skills of a Data Scientist
Appendix B: Data Defined
Appendix C: Questions to Help Evaluate the Outputs of Data Science
Appendix D: Ethical Considerations and Today's Data Scientist
Appendix E: Recent Technical Advances in Data Science
References
A List of Useful Links
Videos, Blogs, and Presentations
Articles
Index
End User License Agreement
Chapter 1
Figure 1.1 The life‐cycle view of data analytics, in the context of the organiz...
Figure 1.2 The number of ice creams sold in a Danish locality, by day in July.
Chapter 6
Figure 6.1 Process for evaluating data's trustworthiness. DQ: data quality.
Figure 6.2 JCI data validation guidelines.
Chapter 7
Figure 7.1 The plot of data quality results, as first presented (second‐year av...
Figure 7.2 The earlier data quality plot, fully explained. Note the addition of...
Chapter 8
Figure 8.1 Modes of generalization.
Chapter 10
Figure 10.1 All decisions are made in the face of uncertainty. The spirit of “d...
Chapter 11
Figure 11.1 Muller–Lyer optical illusion with and without frame.
Chapter 12
Figure 12.1 Series of steps in taking participants through the rope exercise.
Figure 12.2 Tom's original plot of his meeting start time data.
Chapter 14
Figure 14.1 Data in the context of a car: what it takes to manufacture and sell...
Chapter 15
Figure 15.1 The best “home” for data scientists, organizationally, depends on t...
Chapter 16
Figure 16.1 An example of a control chart tracking individual measurements (upp...
Figure 16.2 The analytics maturity ladder.
These two authors are world‐class experts on analytics, data management, and data quality; they’ve forgotten more about these topics than most of us will ever know. Their book is pragmatic, understandable, and focused on what really counts. If you want to do data science in any capacity, you need to read it.
Thomas H. Davenport, Distinguished Professor, Babson College and Fellow, MIT Initiative on the Digital Economy
I like your book. The chapters address problems that have faced statisticians for generations, updated to reflect today’s issues, such as computational big data.
Sir David Cox, Warden of Nuffield College and Professor of Statistics, Oxford University
I am already in love with your book based on the overview and preface!! What a creative approach! Speaks a lot to your ability to tell a good story – one of the key ways of reasoning for a good data scientist!
Hollylynne S. Lee, Professor, Mathematics and Statistics Education and Faculty Fellow, Friday Institute for Educational Innovation, North Carolina State University
The root causes of business failures typically are management, not technology. In today’s complex and changing digital world, the advice in The Real Work of Data Science is essential. Read it and do it.
John A. Zachman, Chairman – Zachman International and Executive Director – FEAC Institute
If you are wondering what the real challenges and solutions to solving your ‘Big Data’ problem are, this is a must‐read book. Ron and Tom move past the technology hype and highlight the real issues and opportunities in leveraging data science to the benefit of your organization.
Jeff MacMillan, Chief Analytics and Data Officer, Morgan Stanley Wealth Management
Much needed!
Neil Lawrence, Professor of Machine Learning at the University of Sheffield and Machine Learning team manager at Amazon
More than 80% of data science projects fail, either partially or wholly, at the implementation stage. There is a wealth of books on the technical and mechanical aspects of data science, but little to guide data scientists and managers on the holistic integration of data science into organizations in a way that produces success. This well‐written book fills that gap.
Peter Bruce, Founder and Chief Academic Officer, The Institute for Statistics Education
This book is very interesting and full of very good, intelligent, and useful things. It will undoubtedly be very valuable.
Jean Michel Poggi, Professor of Statistics at Paris‐Descartes University and Mathematics Laboratory, Orsay University, Paris, France; Past President of the Société Française de Statistique and Vice‐President of the Federation of European National Statistical Societies
I like the very direct and succinct style. You are certainly right on target when you say you can’t stress enough the importance of understanding the real problem. Other of your points in Chapter 1 really hit home, such as data scientists spending more time on data quality than on analysis. (I’m glad they do.) Further, you are absolutely correct that data scientists must translate their results into the language of the decision‐maker. I also recognize the liberal use of anecdotes in the book. For instance, the remarks about Bill Hunter, the ice cream sales, the Pokémon experiment, etc. I personally like this, and I do this in all of my speeches since I think it really hooks the audience.
Barry Nussbaum, Past Chief Statistician, the United States Environmental Protection Agency and Past President of the American Statistical Association
I think this book is excellent for an introductory course in data science. It could be used with students at university level or with professionals in specialist courses.
Luciana Dalla Valle, Lecturer in Statistics and Programme Manager of the MSc Data Science and Business Analytics, School of Computing, Electronics and Mathematics, Plymouth University, UK
The Real Work of Data Science addresses the softer issues of data science that actually decide on the success or failure of any data science initiative. It makes the data science and Chief Analytics Officer roles more understandable and accessible to a wider audience. Choosing the right modeling method is often the key point of discussion in books, although it is just a tiny fraction of the job to be done. This book prepares you for the harsh reality of data science in the real‐world.
Alexander Borek, Global Head of Data & Analytics at Volkswagen Financial Services
Data science is critical for competitiveness, for good government, for correct decisions. But what is data science? Kenett and Redman give, by far, the best introduction to the subject I have seen anywhere. They address the critical questions of formulating the right problem, collecting the right data, doing the right analyses, making the right decisions, and measuring the actual impact of the decisions. This book should become required reading in statistics and computer science departments, business schools, analytics institutes and, most importantly, by all business managers.
A. Blanton Godfrey, Joseph D. Moore Distinguished University Professor, Wilson College of Textiles, North Carolina State University
TURNING DATA INTO INFORMATION, BETTER DECISIONS, AND STRONGER ORGANIZATIONS
Ron S. Kenett
Ra'anana, Israel
Thomas C. Redman
Rumson, NJ, USA
This edition first published 2019
© 2019 Ron S. Kenett and Thomas C. Redman
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Ron S. Kenett and Thomas C. Redman to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication data has been applied for
ISBN: 9781119570707
Cover Design: Wiley
Cover Image: © enisaksoy/Getty Images
To Sima, our children and their families, and their wonderful children: Yonatan, Alma, Tomer, Yadin, Aviv, Gili, Matan, Eden and Ethan
Ron
To my wife Nancy, our six children, and our grandchildren
Tom
Prof. Ron S. Kenett is Chairman of the KPA Group and Senior Research Fellow at the Samuel Neaman Institute, Technion, Haifa, Israel. He is an applied statistician combining expertise in academic, consulting, and business domains. Ron is past president of the Israel Statistical Association and the European Network for Business and Industrial Statistics. He has written more than 250 papers and 14 books on statistical methods and applications. He was awarded the 2013 Greenfield Medal by the English Royal Statistical Society and the 2018 Box Medal by the European Network for Business and Industrial Statistics in recognition of excellence in contributions to the development and application of statistics.
Dr. Thomas C. Redman, “the Data Doc,” President of Data Quality Solutions, helps start‐ups, multinationals, senior executives, chief data officers, and leaders buried deep in their organizations chart their courses to data‐driven futures, with special emphasis on quality and data science. The author of five other books and hundreds of papers, Tom's most important article is “Data's Credibility Problem” (Harvard Business Review, December 2013). He has a PhD in statistics and two patents. Tom lives in Rumson, New Jersey, with his wife, Nancy.
This book has its roots in a chance meeting brought on when Ron responded to an article on data science that Tom published. One short discussion led to another, quickly narrowing to a common theme: we shared the experience that, in order to help companies and organizations become better at exploiting data and statistical analysis, one needs something more than technical brilliance. For both of us, our most successful and impactful projects resulted from other factors, such as understanding the problem, narrowing the focus, delivering simple messages in powerful ways, being in the right spot at the right time, and building the trust of decision‐makers. Conversely, our failures stemmed not from poor technical work but from a failure to connect, on the right issues, with the right people, or in the right way.
We had both written, separately, on some aspects of these topics. Ron has studied how one generates information quality with a framework labeled “InfoQ”; Tom has addressed data quality and became known as “the Data Doc.” We wondered if we could help data scientists who work in companies and other organizations enjoy more and larger successes and endure fewer failures by putting our heads together.
It is no secret that “data,” broadly defined, is all the rage. And “data science,” including traditional statistics, Bayesian statistics, business intelligence, predictive analytics, big data, machine learning, and artificial intelligence (AI), is enjoying the spotlight. There are plenty of great successes, building on a rich tradition of statistics in government and industry, driven by increasing business needs, more data powered by social media and the Internet of Things, and the computer power to analyze it. Iconic new companies include Amazon, Facebook, Google, and Uber. At the same time, there are enormous issues: the Facebook/Cambridge Analytica scandals of early 2018 underscore threats to our privacy (Kenett et al. 2018), many fear that millions of jobs will be lost to artificial intelligence, analytics projects still fail at a high rate, and tremendous damage has resulted from some notable “successful” efforts, as described in O'Neil (2016).
Will data and data science power the next great economic miracle? Will they make solid contributions, more positive than negative? Or will they be just another fad confined to the scrap heap of failed ideas? Even worse, will they put our entire social fabric at risk? It is impossible to know.
We do know that data and data science can be truly transformative, improving customer satisfaction, increasing profits, and empowering people – we have seen it with our own eyes. We believe that data scientists have huge roles to play in tipping the scales toward the good in the questions above. This will require incredible commitment, determination, and follow‐through. We encourage data scientists, statisticians, and those who manage them to take up the cause, as we have. We want to do all we can to fully equip them.
In writing the book, we adopted four “personas” as readers. First is Sally, a 31‐year‐old data scientist who works in a midsize department or company. Sally's job involves producing management reports, although she does have some time for teasing insights from ever‐increasing volumes of untrustworthy data. Her title could be any of “data scientist,” “statistician,” “analyst,” “machine learning specialist,” and others. We are well aware that some people see differences between these titles. But (with one exception, below) those distinctions are meaningless for us. Whether you are trained as a statistician, computer scientist, physicist, or engineer, your job is to turn “data into information and better decisions,” as part of our title demands.
Our second reader persona is Divesh, the 50‐year‐old who has the top analytics job within his department, business unit, or company. His title may be “chief analytics officer,” “head of data science,” or something similar. Divesh may have no formal training in data science, but he is a seasoned manager. While Divesh's day job is to manage data science across his department, within his sphere, he also bears special responsibility for the “building stronger organizations” portion of our title.
Brian, a solid industrial statistician, aged 46 and employed as an internal consultant, is our third persona. Brian is simultaneously bemused and threatened by data science, and he sits on the sidelines way too much. We think Brian has much to offer and encourage him to join the effort.
A fourth persona has an outsized impact on data science and this book. It is Elizabeth, who heads up some department, division, even an entire company. Liz hated statistics in college – it was a required course, poorly taught, and not connected to the rest of her studies. She has seen more and more power in data and data science over the last several years and is just beginning to explore what it means for her department. Liz is both excited about the possibilities and fearful that her efforts will fail miserably.
More than anything, Liz's success, or failure, will dictate the future of data science. She can ignore it (and there are plenty of good reasons to do so) or become an increasingly demanding customer. If she fully embraces data and data science, she can transform her department.
Sally, Divesh, and Brian have different needs but share a common theme. Their business is to turn numbers into information and insights. To be useful, their analyses need to guide decisions that carry a positive impact in the workplace. In other words, they need to help Liz succeed.
We packaged our experience in 18 short chapters directly relevant to our four main personas. We do not deal with technical issues but instead focus on the make‐or‐break ingredients in data‐driven transformation.
The chapters cover the different steps data scientists take in organizations. We discuss their role as individuals and through their organizational positions. We present lots of models that have helped us, we discuss the integration of hard and soft data in analytic work, and we stress the importance of impact (as opposed to technical excellence). The book also provides a context and opens curtains to landscapes that are not usually explored by most experts in data analysis.
We build on the contributions of statisticians like Box, Breiman, Cox, Deming, Hahn, and Tukey; cognitive psychologists like Kahneman and Tversky; and leaders in other disciplines to address current and future challenges. We also connect theory and applications, past contributions and modern developments, organizational needs and the means to fulfill them.
We've been as direct and to the point as we are able. This book should help you think more broadly about your job. Those seeking cookbook‐style “how‐tos” will be sadly disappointed. The book does provide an overview, benchmarks, and objectives, but you will have to develop your own concrete action plans.
We will be successful if readers take ideas introduced here and apply them in ways that best suit their own skill sets, the needs of decision‐makers they serve, and the cultures of their organizations. Data and analytics can transform organizations for the good – we encourage data scientists and applied statisticians to do their part, to help decision‐makers become more effective, and to keep this transformation on the right track.
This book is accompanied by a companion website:
www.wiley.com/go/kenett-redman/datascience
The website material includes:
A List of Useful Links
It is a great time for data science! The Economist proudly proclaims that data is “the world’s most valuable resource,”1 and Hal Varian and Tom Davenport2 have variously called statistics and data science “the sexiest job of the twenty‐first century.” In searching the web for the term data scientist, we find the following definition, “‘Data Scientist’ means a professional who uses scientific methods to liberate and create meaning from raw data.”3 Similar definitions have been offered for statisticians and data analysts.4 Yet we believe the work is more involved and requires skills far beyond those needed to create meaning from raw data.
This book expands and clarifies what it takes to succeed in this job, within the organizational ecosystem in which it takes place. It builds on years of experience in a wide range of organizations, all over the world. Our goal is to share this experience and some retrospective insights learned in doing real work. Specifically, we propose that the real work of data scientists and statisticians involves helping people make better decisions on the important issues in the near term and building stronger organizations and capabilities in the long term. By “people” we mean, among others, managers in organizations and professionals in service and production industries. This perspective is also relevant to educators in schools and colleges and researchers in laboratories and academic institutions. It is a far higher, and more demanding, calling. For example, you don't get to contribute on the really important decisions unless you're trusted.
Thus, the real work requires total involvement: helping to formulate the problems and opportunities in crisp business or scientific terms; understanding which data to consider and the strengths and limitations in the data; determining when new data is needed; dealing with quality issues; using the data to reduce uncertainty; making clear where the data ends and intuition must take over; presenting results in simple, powerful ways; recognizing that all important decisions involve political realities; working with others; and supporting decisions in practice. This real work is not taught enough in statistics or data science courses.
The unpleasant reality is that many, if not most, companies derive only a fraction of the value that their data, data science, and statistics offer (see, for example, Henke et al. 2016). Data scientists and their managers, including chief analytics officers (CAOs), chief data scientists, heads of data science, and other professionals who employ data scientists,5 must learn how to address the barriers that get in the way. Thus, the real work also involves raising everyone's ability to conduct simple analyses and understand more complex ones, understand the power of data, understand variation, and integrate data and their intuitions; putting the right data scientists and statisticians in the right spots; educating senior leadership on the power of data; helping them become good consumers of data science; teaching them their roles in advancing the effort; and creating the organizational structures needed to do all of the above effectively and (reasonably) efficiently. This is what this book is about.
Providing the added value we are talking about requires a wide perspective. Figure 1.1 presents the life cycle of data analytics in the context of an organization aiming to profit from data science (adapted from Kenett 2015). As the figure illustrates, the work is highly iterative (for more on this process, see Box 1997).
Figure 1.1 The life‐cycle view of data analytics, in the context of the organizational ecosystem in which the work takes place.
The life‐cycle view is designed to help data scientists help decision‐makers. Let's consider each step of the cycle in turn.
Observe what happens when you go to a dentist: you give a dentist a hint about your symptoms, you are placed in the chair, the dentist looks into your mouth, diagnoses and (hopefully) solves the problem, and tells you when to come back, all in less than an hour.
The seasoned data scientist knows better. We describe these data scientists in Chapter 2.
