- A comprehensive overview of the various fields of application of data science and artificial intelligence.
- Case studies from practice to make the described concepts tangible.
- Practical examples to help you carry out simple data analysis projects.
- BONUS in print edition: E-Book inside

Data Science, Big Data, Artificial Intelligence and Generative AI are currently some of the most talked-about concepts in industry, government, and society, and yet also the most misunderstood. This book will clarify these concepts and provide you with practical knowledge to apply them. Using exercises and real-world examples, it will show you how to apply data science methods, build data platforms, and deploy data- and ML-driven projects to production. It will help you understand - and explain to various stakeholders - how to generate value from such endeavors. Along the way, it will bring essential data science concepts to life, including statistics, mathematics, and machine learning fundamentals, and explore crucial topics like critical thinking, legal and ethical considerations, and building high-performing data teams. Readers of all levels of data familiarity - from aspiring data scientists to expert engineers to data leaders - will ultimately learn: how can an organization become more data-driven, what challenges might it face, and how can they as individuals help make that journey a success.

The team of authors consists of data professionals from business and academia, including data scientists, engineers, business leaders and legal experts. All are members of the Vienna Data Science Group (VDSG), an NGO that aims to establish a platform for exchanging knowledge on the application of data science, AI and machine learning, and raising awareness of the opportunities and potential risks of these technologies.

WHAT’S INSIDE //
- Critical Thinking and Data Culture: How evidence driven decision making is the base for effective AI.
- Machine Learning Fundamentals: Foundations of mathematics, statistics, and ML algorithms and architectures
- Natural Language Processing and Computer Vision: How to extract valuable insights from text, images and video data, for real world applications.
- Foundation Models and Generative AI: Understand the strengths and challenges of generative models for text, images, video, and more.
- ML and AI in Production: Turning experimentation into a working data science product.
- Presenting your Results: Essential presentation techniques for data scientists.
Number of pages: 1632
Year of publication: 2024
Katherine Munro, Stefan Papp, Zoltan Toth, Wolfgang Weidinger, Danko Nikolić, Barbora Antosova Vesela, Karin Bruckmüller, Annalisa Cadonna, Jana Eder, Jeannette Gorzala, Gerald Hahn, Georg Langs, Roxane Licandro, Christian Mata, Sean McIntyre, Mario Meir-Huber, György Móra, Manuel Pasieka, Victoria Rugli, Rania Wazir, Günther Zauner
The Handbook of Data Science and AI
Generate Value from Data with Machine Learning and Data Analytics
2nd Edition
Distributed by:
Carl Hanser Verlag
Postfach 86 04 20, 81631 Munich, Germany
Fax: +49 (89) 98 48 09
www.hanserpublications.com
www.hanser-fachbuch.de
The use of general descriptive names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
The final determination of the suitability of any information for the use contemplated for a given application remains the sole responsibility of the user.
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying or by any information storage and retrieval system, without permission in writing from the publisher.
No part of the work may be used for the purposes of text and data mining without the written consent of the publisher, in accordance with § 44b UrhG (German Copyright Law).
© Carl Hanser Verlag, Munich 2024
Coverconcept: Marc Müller-Bremer, www.rebranding.de, Munich
Coverdesign: Tom West
Cover image: © istockphoto.com/ValeryBrozhinsky
Editor: Sylvia Hasselbach
Production Management: le-tex publishing services GmbH, Leipzig
Typesetting: Eberl & Koesel Studio, Kempten, Germany
Print ISBN: 978-1-56990-934-8
E-Book ISBN: 978-1-56990-235-6
ePub ISBN: 978-1-56990-411-4
Title
Copyright
Table of Contents
Preface
Acknowledgments
1 Introduction
Stefan Papp
1.1 About this Book
1.2 The Halford Group
1.2.1 Alice Halford – Chairwoman
1.2.2 Analysts
1.2.3 “CDO”
1.2.4 Sales
1.2.5 IT
1.2.6 Security
1.2.7 Production Leader
1.2.8 Customer Service
1.2.9 HR
1.2.10 CEO
1.3 In a Nutshell
2 The Alpha and Omega of AI
Stefan Papp
2.1 The Data Use Cases
2.1.1 Bias
2.1.2 Data Literacy
2.2 Culture Shock
2.3 Ideation
2.4 Design Process Models
2.4.1 Design Thinking
2.4.2 Double Diamond
2.4.3 Conducting Workshops
2.5 In a Nutshell
3 Cloud Services
Stefan Papp
3.1 Introduction
3.2 Cloud Essentials
3.2.1 XaaS
3.2.2 Cloud Providers
3.2.3 Native Cloud Services
3.2.4 Cloud-native Paradigms
3.3 Infrastructure as a Service
3.3.1 Hardware
3.3.2 Distributed Systems
3.3.3 Linux Essentials for Data Professionals
3.3.4 Infrastructure as Code
3.4 Platform as a Service
3.4.1 Cloud Native PaaS Solutions
3.4.2 External Solutions
3.5 Software as a Service
3.6 In a Nutshell
4 Data Architecture
Zoltan C. Toth and Sean McIntyre
4.1 Overview
4.1.1 Maslow’s Hierarchy of Needs for Data
4.1.2 Data Architecture Requirements
4.1.3 The Structure of a Typical Data Architecture
4.1.4 ETL (Extract, Transform, Load)
4.1.5 ELT (Extract, Load, Transform)
4.1.6 ETLT
4.2 Data Ingestion and Integration
4.2.1 Data Sources
4.2.2 Traditional File Formats
4.2.3 Modern File Formats
4.2.4 Which Storage Option to Choose?
4.3 Data Warehouses, Data Lakes, and Lakehouses
4.3.1 Data Warehouses
4.3.2 Data Lakes and Cloud Data Platforms
4.4 Data Transformation
4.4.1 SQL
4.4.2 Big Data & Apache Spark
4.4.3 Cloud Data Platforms for Apache Spark
4.5 Workflow Orchestration
4.5.1 Dagster and the Modern Data Stack
4.6 A Data Architecture Use Case
4.7 In a Nutshell
5 Data Engineering
Stefan Papp
5.1 Differentiating from Software Engineering
5.2 Programming Languages
5.2.1 Code or No Code?
5.2.2 Language Ecosystem
5.2.3 Python
5.2.4 Scala
5.3 Software Engineering Processes for Data
5.3.1 Configuration Management
5.3.2 CI/CD
5.4 Data Pipelines
5.4.1 Common Characteristics of a Data Pipeline
5.4.2 Data Pipelines in the Unified Data Architecture
5.5 Storage Options
5.5.1 File Era
5.5.2 Database Era
5.5.3 Data Lake Era
5.5.4 Serverless Era
5.5.5 Polyglot Storage
5.5.6 Data Mesh Era
5.6 Tooling
5.6.1 Batch: Airflow
5.6.2 Streaming: Kafka
5.6.3 Transformation: Databricks Notebooks
5.7 Common challenges
5.7.1 Data Quality and Different Standards
5.7.2 Skewed Data
5.7.3 Stressed Operational Systems
5.7.4 Legacy Operational Systems
5.7.5 Platform and Information Security
5.8 In a Nutshell
6 Data Governance
Victoria Rugli, Mario Meir-Huber
6.1 Why Do We Need Data Governance?
6.1.1 Sample 1: Achieving Clarity with Data Governance
6.1.2 Sample 2: The (Negative) Impact of Poor Data Governance
6.2 The Building Blocks of Data Governance
6.2.1 Data Governance Explained
6.3 People
6.3.1 Data Ownership
6.3.2 Data Stewardship
6.3.3 Data Governance Board
6.3.4 Change Management
6.4 Process
6.4.1 Metadata Management
6.4.2 Data Quality Management
6.4.3 Data Security and Privacy
6.4.4 Master Data Management
6.4.5 Data Access and Search
6.5 Technology (Data Governance Tools)
6.5.1 Open-Source Tools
6.5.2 Cloud-based Data Governance Tools
6.6 In a Nutshell
7 Machine Learning Operations (ML Ops)
Zoltan C. Toth, György Móra
7.1 Overview
7.1.1 Scope of MLOps
7.1.2 Data Collection and Exploration
7.1.3 Feature Engineering
7.1.4 Model Training
7.1.5 Models Deployed to Production
7.1.6 Model Evaluation
7.1.7 Model Understanding
7.1.8 Model Versioning
7.1.9 Model Monitoring
7.2 MLOps in an Organization
7.2.1 Main Benefits of MLOps
7.2.2 Capabilities Needed for MLOps
7.3 Several Common Scenarios in the MLOps Space
7.3.1 Integrating Notebooks
7.3.2 Features in Production
7.3.3 Model Deployment
7.3.4 Model Formats
7.4 MLOps Tooling and MLflow
7.4.1 MLflow
7.5 In a Nutshell
8 Machine Learning Security
Manuel Pasieka
8.1 Introduction to Cybersecurity
8.2 Attack Surface
8.3 Attack Methods
8.3.1 Model Stealing
8.3.2 Data Extraction
8.3.3 Data Poisoning
8.3.4 Adversarial Attack
8.3.5 Backdoor Attack
8.4 Machine Learning Security of Large Language Models
8.4.1 Data Extraction
8.4.2 Jailbreaking
8.4.3 Prompt Injection
8.5 AI Threat Modelling
8.6 Regulations
8.7 Where to go from here
8.8 Conclusion
8.9 In a Nutshell
9 Mathematics
Annalisa Cadonna
9.1 Linear Algebra
9.1.1 Vectors and Matrices
9.1.2 Operations between Vectors and Matrices
9.1.3 Linear Transformations
9.1.4 Eigenvalues, Eigenvectors, and Eigendecomposition
9.1.5 Other Matrix Decompositions
9.2 Calculus and Optimization
9.2.1 Derivatives
9.2.2 Gradient and Hessian
9.2.3 Gradient Descent
9.2.4 Constrained Optimization
9.3 Probability Theory
9.3.1 Discrete and Continuous Random Variables
9.3.2 Expected Value, Variance, and Covariance
9.3.3 Independence, Conditional Distributions, and Bayes’ Theorem
9.4 In a Nutshell
10 Statistics – Basics
Rania Wazir, Georg Langs, Annalisa Cadonna
10.1 Data
10.2 Simple Linear Regression
10.3 Multiple Linear Regression
10.4 Logistic Regression
10.5 How Good is Our Model?
10.6 In a Nutshell
11 Business Intelligence (BI)
Christian Mata
11.1 Introduction to Business Intelligence
11.1.1 Definition of Business Intelligence
11.1.2 Role in Organizations
11.1.3 Development of Business Intelligence
11.1.4 Data Science and AI in the Context of BI
11.1.5 Data for Decision-Making
11.1.6 Understanding Business Context
11.1.7 Business Intelligence Activities
11.2 Data Management Fundamentals
11.2.1 What is Data Management, Data Integration and Data Warehousing?
11.2.2 Data Load Processes – The Case of ETL or ELT
11.2.3 Data Modeling
11.3 Reporting and Data Analysis
11.3.1 Reporting
11.3.2 Types of Reports
11.3.3 Data Analysis
11.3.4 Visual Analysis
11.3.5 Significant Trends
11.3.6 Relevant BI Technologies
11.3.7 BI Tool Examples
11.4 BI and Data Science: Complementary Disciplines
11.4.1 Differences
11.4.2 Similarities
11.4.3 Interdependencies
11.5 Outlook for Business Intelligence
11.5.1 Expectations for the Evolution of BI
11.6 In a Nutshell
12 Machine Learning
Georg Langs, Katherine Munro, Rania Wazir
12.1 Introduction
12.2 Basics: Feature Spaces
12.3 Classification Models
12.3.1 K-Nearest-Neighbor-Classifier
12.3.2 Support Vector Machine
12.3.3 Decision Trees
12.4 Ensemble Methods
12.4.1 Bias and Variance
12.4.2 Bagging: Random Forests
12.4.3 Boosting: AdaBoost
12.4.4 The Limitations of Feature Construction and Selection
12.5 Unsupervised learning: Learning without labels
12.5.1 Clustering
12.5.2 Manifold Learning
12.5.3 Generative Models
12.6 Artificial Neural Networks and Deep Learning
12.6.1 The Perceptron
12.6.2 Artificial Neural Networks
12.6.3 Deep Learning
12.6.4 Convolutional Neural Networks
12.6.5 Training Convolutional Neural Networks
12.6.6 Recurrent Neural Networks
12.6.7 Long Short-Term Memory
12.6.8 Autoencoders and U-Nets
12.6.9 Adversarial Training Approaches
12.6.10 Generative Adversarial Networks
12.6.11 Cycle GANs and Style GANs
12.7 Transformers and Attention Mechanisms
12.7.1 The Transformer Architecture
12.7.2 What the Attention Mechanism Accomplishes
12.7.3 Applications of Transformer Models
12.8 Reinforcement Learning
12.9 Other Architectures and Learning Strategies
12.10 Validation Strategies for Machine Learning Techniques
12.11 Conclusion
12.12 In a Nutshell
13 Building Great Artificial Intelligence
Danko Nikolić
13.1 How AI Relates to Data Science and Machine Learning
13.2 A Brief History of AI
13.3 Five Recommendations for Designing an AI Solution
13.3.1 Recommendation No. 1: Be Pragmatic
13.3.2 Recommendation No. 2: Make it Easier for Machines to Learn – Create Inductive Biases
13.3.3 Recommendation No. 3: Perform Analytics
13.3.4 Recommendation No. 4: Beware of the Scaling Trap
13.3.5 Recommendation No. 5: Beware of the Generality Trap (there is no such a thing as free lunch)
13.4 Human-level Intelligence
13.5 In a Nutshell
14 Signal Processing
Jana Eder
14.1 Introduction
14.2 Sampling and Quantization
14.3 Frequency Domain Analysis
14.3.1 Fourier Transform
14.4 Noise Reduction and Filtering Techniques
14.4.1 Denoising Using a Gaussian Low-pass Filter
14.5 Time Domain Analysis
14.5.1 Signal Normalization and Standardization
14.5.2 Signal Transformation and Feature Extraction
14.5.3 Time Series Decomposition Techniques
14.5.4 Autocorrelation: Understanding Signal Similarity over Time
14.6 Time-Frequency Domain Analysis
14.6.1 Short Term Fourier Transform and Spectrogram
14.6.2 Discrete Wavelet Transform
14.6.3 Gramian Angular Field
14.7 The Relationship of Signal Processing and Machine Learning
14.7.1 Techniques for Feature Engineering
14.7.2 Preparing for Machine Learning
14.8 Practical Applications
14.9 In a Nutshell
15 Foundation Models
Danko Nikolić
15.1 The Idea of a Foundation Model
15.2 How to Train a Foundation Model?
15.3 How Do we Use Foundation Models?
15.4 A Breakthrough: There is no End to Learning
15.5 In a Nutshell
16 Generative AI and Large Language Models
Katherine Munro, Gerald Hahn, Danko Nikolić
16.1 Introduction to “Gen AI”
16.2 Generative AI Modalities
16.2.1 Methods for Training Generative Models
16.3 Large Language Models
16.3.1 What are “LLMs”?
16.3.2 How is Something like ChatGPT Trained?
16.3.3 Methods for Using LLMs Directly
16.3.4 Methods for Customizing an LLM
16.4 Vulnerabilities and Limitations of Gen AI Models
16.4.1 Introduction
16.4.2 Prompt Injection and Jailbreaking Attacks
16.4.3 Hallucinations, Confabulations, and Reasoning Errors
16.4.4 Copyright Concerns
16.4.5 Bias
16.5 Building Robust, Effective Gen AI Applications
16.5.1 Control Strategies Throughout Development and Use
16.5.2 Guardrails
16.5.3 Using Generative AI Safely and Successfully
16.6 In a Nutshell
17 Natural Language Processing (NLP)
Katherine Munro
17.1 What is NLP and Why is it so Valuable?
17.2 Why Learn “Traditional” NLP in the “Age of Large Language Models”?
17.3 NLP Data Preparation Techniques
17.3.1 The NLP Pipeline
17.3.2 Converting the Input Format for Machine Learning
17.4 NLP Tasks and Methods
17.4.1 Rule-Based (Symbolic) NLP
17.4.2 Statistical Machine Learning Approaches
17.4.3 Neural NLP
17.4.4 Transfer Learning
17.5 In a Nutshell
18 Computer Vision
Roxane Licandro
18.1 What is Computer Vision?
18.2 A Picture Paints a Thousand Words
18.2.1 The Human Eye
18.2.2 Image Acquisition Principle
18.2.3 Digital File Formats
18.2.4 Image Compression
18.3 I Spy With My Little Eye Something That Is
18.3.1 Computational Photography and Image Manipulation
18.4 Computer Vision Applications & Future Directions
18.4.1 Image Retrieval Systems
18.4.2 Object Detection, Classification and Tracking
18.4.3 Medical Computer Vision
18.5 Making Humans See
18.6 In a Nutshell
19 Modelling and Simulation – Create your own Models
Günther Zauner, Wolfgang Weidinger, Dominik Brunmeir, Benedikt Spiegel
19.1 Introduction
19.2 General Considerations during Modeling
19.3 Modelling to Answer Questions
19.4 Reproducibility and Model Lifecycle
19.4.1 The Lifecycle of a Modelling and Simulation Question
19.4.2 Parameter and Output Definition
19.4.3 Documentation
19.4.4 Verification and Validation
19.5 Methods
19.5.1 Ordinary Differential Equations (ODEs)
19.5.2 System Dynamics (SD)
19.5.3 Discrete Event Simulation
19.5.4 Agent-based Modelling
19.6 Modelling and Simulation Examples
19.6.1 Dynamic Modelling of Railway Networks for Optimal Pathfinding Using Agent-based Methods and Reinforcement Learning
19.6.2 Agent-Based Covid Modelling Strategies
19.6.3 Deep Reinforcement Learning Approach for Optimal Replenishment Policy in a VMI Setting
19.6.4 Finding Feasible Solutions for a Resource-constrained Project Scheduling Problem with Reinforcement Learning and Implementing a Dynamic Planning Scheme with Discrete Event Simulation
19.7 Summary and Lessons Learned
19.8 In a Nutshell
20 Data Visualization
Barbora Antosova Vesela
20.1 History
20.2 Which Tools to Use
20.3 Types of Data Visualizations
20.3.1 Scatter Plot
20.3.2 Line Chart
20.3.3 Column and Bar Charts
20.3.4 Histogram
20.3.5 Pie Chart
20.3.6 Box Plot
20.3.7 Heat Map
20.3.8 Tree Diagram
20.3.9 Other Types of Visualizations
20.4 Select the right Data Visualization
20.5 Tips and Tricks
20.6 Presentation of Data Visualization
20.7 In a Nutshell
21 Data Driven Enterprises
Mario Meir-Huber, Stefan Papp
21.1 The three Levels of a Data Driven Enterprise
21.2 Culture
21.2.1 Corporate Strategy for Data
21.2.2 The Current State Analysis
21.2.3 Culture and Organization of a Successful Data Organisation
21.2.4 Core Problem: The Skills Gap
21.3 Technology
21.3.1 The Impact of Open Source
21.3.2 Cloud
21.3.3 Vendor Selection
21.3.4 Data Lake from a Business Perspective
21.3.5 The Role of IT
21.3.6 Data Science Labs
21.3.7 Revolution in Architecture: The Data Mesh
21.4 Business
21.4.1 Buy and Share Data
21.4.2 Analytical Use Case Implementation
21.4.3 Self-service Analytics
21.5 In a Nutshell
22 Creating High-Performing Teams
Stefan Papp
22.1 Forming
22.2 Storming
22.2.1 Scenario: 50 Shades of Red
22.2.2 Scenario: Retrospective
22.3 Norming
22.3.1 Change Management and Transition
22.3.2 RACI Matrix
22.3.3 SMART
22.3.4 Agile Processes
22.3.5 Communication Culture
22.3.6 DataOps
22.4 Performing
22.4.1 Scenario: A new Dawn
22.4.2 Growth Mindsets
22.5 In a Nutshell
23 Artificial Intelligence Act
Jeannette Gorzala, Karin Bruckmüller
23.1 Introduction
23.2 Definition of AI Systems
23.3 Scope and Purpose of the AI Act
23.3.1 The Risk-Based Approach
23.3.2 Unacceptable Risk and Prohibited AI Practices
23.3.3 High-Risk AI Systems and Compliance
23.3.4 Medium Risk and Transparency Obligations
23.3.5 Minimal Risk and Voluntary Commitments
23.4 General Purpose AI Models
23.5 Timeline and Applicability
23.6 Penalties
23.7 AI and Civil Liability
23.8 AI and Criminal Liability
23.9 In a Nutshell
24 AI in Different Industries
Stefan Papp, Mario Meir-Huber, Wolfgang Weidinger, Thomas Treml
24.1 Automotive
24.1.1 Vision
24.1.2 Data
24.1.3 Use Cases
24.1.4 Challenges
24.2 Aviation
24.2.1 Vision
24.2.2 Data
24.2.3 Use Cases
24.2.4 Challenges
24.3 Energy
24.3.1 Vision
24.3.2 Data
24.3.3 Use Cases
24.3.4 Challenges
24.4 Finance
24.4.1 Vision
24.4.2 Data
24.4.3 Use Cases
24.4.4 Challenges
24.5 Health
24.5.1 Vision
24.5.2 Data
24.5.3 Use Cases
24.5.4 Challenges
24.6 Government
24.6.1 Vision
24.6.2 Data
24.6.3 Use Cases
24.6.4 Challenges
24.7 Art
24.7.1 Vision
24.7.2 Data
24.7.3 Use cases
24.7.4 Challenges
24.8 Manufacturing
24.8.1 Vision
24.8.2 Data
24.8.3 Use Cases
24.8.4 Challenges
24.9 Oil and Gas
24.9.1 Vision
24.9.2 Data
24.9.3 Use Cases
24.9.4 Challenges
24.10 Retail
24.10.1 Vision
24.10.2 Data
24.10.3 Use Cases
24.10.4 Challenges
24.11 Telecommunications Provider
24.11.1 Vision
24.11.2 Data
24.11.3 Use Cases
24.11.4 Challenges
24.12 Transport
24.12.1 Vision
24.12.2 Data
24.12.3 Use Cases
24.12.4 Challenges
24.13 Teaching and Training
24.13.1 Vision
24.13.2 Data
24.13.3 Use Cases
24.13.4 Challenges
24.14 The Digital Society
24.15 In a Nutshell
25 Climate Change and AI
Stefan Papp
25.1 Introduction
25.2 AI – a Climate Saver?
25.3 Measuring and Reducing Emissions
25.3.1 Baseline
25.3.2 Data Use Cases
25.4 Sequestration
25.4.1 Biological Sequestration
25.4.2 Geological Sequestration
25.5 Prepare for Impact
25.6 Geoengineering
25.7 Greenwashing
25.8 Outlook
25.9 In a Nutshell
26 Mindset and Community
Stefan Papp
26.1 Data-Driven Mindset
26.2 Data Science Culture
26.2.1 Start-up or Consulting Firm?
26.2.2 Labs Instead of Corporate Policy
26.2.3 Keiretsu Instead of Lone Wolf
26.2.4 Agile Software Development
26.2.5 Company and Work Culture
26.3 Antipatterns
26.3.1 Devaluation of Domain Expertise
26.3.2 IT Will Take Care of It
26.3.3 Resistance to Change
26.3.4 Know-it-all Mentality
26.3.5 Doom and Gloom
26.3.6 Penny-pinching
26.3.7 Fear Culture
26.3.8 Control over Resources
26.3.9 Blind Faith in Resources
26.3.10 The Swiss Army Knife
26.3.11 Over-Engineering
26.4 In a Nutshell
27 Trustworthy AI
Rania Wazir
27.1 Legal and Soft-Law Framework
27.1.1 Standards
27.1.2 Regulations
27.2 AI Stakeholders
27.3 Fairness in AI
27.3.1 Bias
27.3.2 Fairness Metrics
27.3.3 Mitigating Unwanted Bias in AI Systems
27.4 Transparency of AI Systems
27.4.1 Documenting the Data
27.4.2 Documenting the Model
27.4.3 Explainability
27.5 Conclusion
27.6 In a Nutshell
28 Epilogue
Stefan Papp
28.1 Halford 2.0
28.1.1 Environmental, Social and Governance
28.1.2 HR
28.1.3 Customer Satisfaction
28.1.4 Production
28.1.5 IT
28.1.6 Strategy
28.2 Final Words
28.3 In a Nutshell
29 The Authors
This preface was NOT written by ChatGPT (or similar).
As I make this statement, I’m wondering how often it will remain true for text or even other forms of media in the future. Over the last two years, this AI-powered tool has risen to enormous popularity, and has given Data Science and AI an incredible awareness boost. As a result, the expectations for Artificial Intelligence have grown seemingly exponentially, and reached such heights that one might ask whether they can ever be met.
AI is following the well-known hype cycle. Some of these high expectations are well-deserved: this powerful technology will change the way we live and work in many ways. To name one example: some universities are considering no longer asking their students for seminar papers, as it’s not possible to check whether a paper was written by an AI tool.
But we also must brace ourselves for some disappointment in the future, as AI inevitably fails to live up to certain people’s inflated expectations.
Even when the vision is reasonable, the timelines these people and organizations have in mind for implementing AI projects often are not. This leads to further disappointment when the hoped-for impact and value fail to materialize within the desired timeframe.
We’re already seeing the beginning of this, with ChatGPT and similar tools generating plenty of eloquent and coherent – yet completely inaccurate – information. This isn’t helped by the new wave of ‘AI experts’, who are making ever more outlandish promises about tools invented by themselves or their companies; promises which will be very hard to keep. They are, essentially, selling digital ‘snake oil’.
All of this puts even more pressure on data scientists to deal with these expectations, while continuing to deliver on the same goal they’ve had for decades:
generating understandable answers to questions, using data.
This is what makes neutral organizations such as the Vienna Data Science Group (VDSG [www.vdsg.at]) – which fosters interdisciplinary and international knowledge exchange between data experts – so necessary and important. We are still highly dedicated to the development of the entire Data Science and AI ecosystem (education, certification, standardization, societal impact study, and so on), across Europe and beyond. This book represents just one of our efforts towards this goal. Because despite all the hype and hyperbole in the AI and data landscape, Data Science remains the same: an interdisciplinary science gathering a very heterogeneous crowd of specialists. It is made up of three major streams, and we are proud to have expert members in each of them:
Computer Science and IT
Mathematics and Statistics
Domain expertise in the industry or field in which Data Science and AI is applied.
As a matter of fact, the VDSG [www.vdsg.at] has always taken a holistic approach to data science, and this book is no different: Starting in Chapter 1, we introduce a fictional company that wants to become more data-driven, and we check in with them throughout the book, right up to the end of their data transformation in Chapter 28. Along the way we cover many challenges in their journey, thus providing you with practical insights which were only possible thanks to the vibrant exchange among our vast Data Science and AI community.
The result is a greatly expanded edition of our Data Science & AI Handbook, with 10 new chapters covering topics like Building AI solutions (Chapter 13), Foundation Models (Chapter 15), Large Language Models and Generative AI (Chapter 16) and Climate Change and AI (Chapter 25). This is complemented by also tackling the fundamental topics of Data Architecture, Engineering and Governance (Chapters 4, 5 and 6) and topping it off with Machine Learning Operations (MLOps, Chapter 7), which has become a very important discipline in itself.
To provide a firm foundation to help you understand all this, we’ve again included an introduction to the underlying Mathematics (Chapter 9) and Statistics (Chapter 10) used in Data Science, as well as chapters on the theory behind Machine Learning, Signal Processing and Computer Vision (Chapters 12, 14 and 18). We’ve also covered topics related to generating value from data, such as Business Intelligence (Chapter 11) and Data Driven Enterprises (Chapter 21), as well as vital information to help you use data safely, including chapters on the new EU AI Act (Chapter 23) and Trustworthy AI (Chapter 27).
This vast expansion of VDSG’s Magnum Opus serves one core purpose:
to give a realistic and holistic picture of Data Science and AI.
Data Science and AI is developing at an incredibly quick pace at the moment and so is its impact on society. This means that responsibilities put on the shoulders of data scientists have grown as well, and so has the need for organizations like VDSG [www.vdsg.at] to get involved and tackle these challenges too.
Let’s go for it!
Summer 2024
Wolfgang Weidinger
We, the authors, would like to take this opportunity to express our sincere gratitude to our families and friends, who helped us to express our thoughts and insights in this book. Without their support and patience, this work would not have been possible.
A special thanks from all the authors goes to Katherine Munro, who contributed a lot to this book and spent a tremendous amount of time and effort editing our manuscripts.
For my parents, who always said I could do anything. We never expected it would be a thing like this.
Katherine Munro
I’d like to thank my wife and the Vienna Data Science Group for their continuous support through my professional journey.
Zoltan C. Toth
Thinking about the people who supported me most, I want to thank my parents, who have always believed in me, no matter what, and my partner Verena, who was very patient again during the last months while I worked on this book.
In addition I’m very grateful for the support and motivation I got from the people I met through the Vienna Data Science Group.
Wolfgang Weidinger
Stefan Papp
“I want to be CDO instead of the CDO.”
Iznogoud (adjusted)
Questions Answered in this Chapter:
How could we describe a fictional company before its journey to becoming data-driven?
What challenges might such a company need to resolve to become data-driven?
How will the chapters in this book help you, the reader, to recognize and address such challenges in your own organization?
This book takes a practical, experience-led look into various aspects of data science and artificial intelligence. In this, our third edition, the authors also dive deeply into some of the most exciting and rapidly developing topics of our time, including large language models and generative AI.
The authors’ primary goal is to give the reader a holistic approach to the field. For this reason, this book is not purely technical: Data science and AI maturity depends as much on work culture, particularly critical thinking and evidence-based decision-making, as it does on knowledge in mathematics, neural networks, AI frameworks, and data platforms.
In recent years, most experts have come to agree that artificial intelligence will change how we work and live. For a holistic view, we must also look at the status quo, if we want to understand what needs to be done to meet our diverse ambitions with the help of AI. One useful frame for doing this is to explore how people deal with data transformation challenges from an organizational perspective. For this reason, we will shortly introduce the reader to a fictional company at the beginning of its journey to integrate evidence-based decision-making into its corporate identity. We’ll use this fictional company, in which most things could be more data-oriented but aren’t yet, as a model for outlining possible challenges organizations may encounter when aiming to become more data-driven. By the end of this book, our hypothetical company will also serve as a model of how a data-driven company could look. In the chapters in between, we’ll address many of these challenges and provide practical advice on how to tackle them.
If you, as a reader, would rather not read prose about an invented company in order to learn about such typical organizational challenges, we encourage you to skip this chapter and start with one that fits your interests. Because this book takes a holistic view of the field, the authors discuss artificial intelligence, machine learning, generative AI, modeling, natural language processing, computer vision, and other relevant areas. We cover engineering-related topics such as data architecture and data pipelines, which are essential for getting data-driven projects into production. Lastly, we also address critical social and legal issues surrounding the use of data. Each author goes into a lot of detail for their specific field, so there’s plenty for you to learn from.
We kindly ask readers to contact us directly to provide feedback on how we can do better to achieve our ambitious goal of becoming the standard literature providing a holistic approach to this field. If you feel some new content should be covered in one of the subsequent editions, you can find the authors on professional networks such as LinkedIn.
And with that said, let’s get started.
1.2 The Halford Group
Bob entered the office building of the Halford Group, a manufacturer of consumer products, including their best-selling rubber duck. After crossing the office doors, he felt he was thrown back into the eighties. Visitors having to register at the entrance, filling out forms to declare themselves liable in case of an accident, and promising not to take photos, was only the first step. As Bob entered the elevator, with its brass buttons and glossy, mahogany decor, he could have sworn he’d entered the setting of the movie “The Wolf of Wall Street.”
The executive office was similar. The brownish carpets showed their age, and the wallpapers looked like they’d inhaled the smoke of many an eighties Marlboro Man. The worn leather couches and the looming wooden desk (mahogany, again), seemed a memory of a great but distant past. Bob could imagine his dad—a man who had always been proud of being in sales and following the teachings of Zig Ziglar—doing business with this company in his younger years.
This image was disrupted when a young woman entered the room, and Bob was thrown back into the present. With an air of determination, she strode forward to reach for Bob’s hand. Somewhat taken aback, he took in the shock of platinum blonde hair and the tattoos not entirely hidden by her tailored suit, and raised his hand in response. The woman smiled.
1.2.1 Alice Halford – Chairwoman
“I’m Alice Halford,” she said, “I am the granddaughter of Big Harry Halford, the founder of this group. He built his empire from the ground up.”
Bob had read all the legends about the old Halford boss. Every article about him made it clear he did not listen to many people. Instead, “Big Harry” was a proud, determined captain; one who set the course and demanded absolute obedience from his team. Business magazines were yet to write much about Alice, as far as Bob knew. However, he had read one article in preparation for this meeting. Alice was different from the grand old family patriarch, it had said. She had won the succession in a fierce battle against three ambitious brothers, and been selected by the board as chairwoman, thanks to her big plans to transition the company into a modern enterprise that could meet the Zeitgeist of the 21st century.
“Although successful, today’s generation would call my granddad a dinosaur who just wanted to leave enough footprints to let the next generation know he had been there,” Alice said. “Especially in his last years, he was skeptical about changes. Many principal consultants from respectable companies came with heads high to our offices, explaining that our long-term existence would depend on becoming a data-driven company. However, my granddad always had a saying: The moment a computer decides, instead of a founder who knows their stuff and follows their gut, it’s over. All the once proud consultants and their supporters from within the company thought they could convince every executive to buy into their ideas of a modern company, but ultimately, they walked out with their tails between their legs.”
Alice smiled at Bob and continued, “My granddad’s retirement was long overdue, but, finally, his exotic Cuban cigars and his habit of drinking expensive whiskey forced him to end his work life. I took over as chairwoman of the board. I want to eliminate all the smells of the last century. When I joined, I found parts of the company were highly toxic. My strategic consultants advised me that every large organization has some organizational arrogance and inefficiency. They also cautioned me to keep my expectations low. While many enthusiasts claim that AI will change the world forever, every large organization is like a living organism with many different subdivisions and characteristics. Changing a company’s culture is a long process, and many companies face similar challenges. Ultimately, every company is run by people, and nobody can change people overnight. Some might be okay with changes, a few may even want them to happen too fast, but most people will resist changes in one way or another.
At the same time, I understand that we are running out of time. We learned that our main competitors are ahead of us, and if we do not catch up, we will eventually go out of business. Our current CEO has a background in Finance and, therefore, needs support from a data strategist. Bob, you have been recommended as the most outstanding expert to transform a company into a data-driven enterprise that disrupts traditional business models. You can talk with everyone; you have all the freedom you need. After that, I am curious about your ideas to change the company from the ground up.”
Bob nodded enthusiastically. “I love challenges. Your secretary already told me I shouldn’t have any other appointments in the afternoon. Can you introduce me to your team? I would love to learn more about how they work, and their requirements.”
“I thought you’d want to do that. First, you will meet David and Anna, the analysts. Then you’ll meet Tom, the sales director. It would be best if you also talked with the IT manager, Peter—” Alice stopped herself, sighed, and continued. “Lastly, I arranged a meeting for you with our production leader, the complaints department, our Head of Security, and finally with our HR. I will introduce our new CEO, who is flying in today to discuss details at dinner. I booked a table in a good restaurant close by. But it makes sense if you first talk to all the other stakeholders. I had my colleagues each arrange a one-on-one with you. You’re in for a busy afternoon, Bob.”
1.2.2 Analysts
As Alice swept out of the room, a bespectacled man apparently in his mid-forties, and a woman of about the same age, appeared in the doorway. They must have been the analysts, David and Anna. When neither appeared willing to enter the room first, Bob beckoned them inside. He was reminded of an empowerment seminar he’d attended some years ago: The trainer had been hell bent on turning everyone in the workshop into strong leaders, but warned that only the energetic would dominate the world. These analysts seemed to be the exact opposite. David laughed nervously as he entered, and Anna kept her eyes lowered as she headed to the nearest seat. Neither seemed too thrilled to be there; Bob didn’t even want to imagine how they would have performed in that seminar’s “primal scream” test.
David and Anna sat down, and Bob tried to break the ice with questions about their work. It took him a while, but finally, they started to talk.
“Well, we create reports for management,” David said. “We aim to keep things accurate, and we try to hand in our reports on time. It’s become something of a reputation,” he added with a weak chuckle.
Bob realized that if he was going to make them talk, he’d need to give his famous speech, summarized as, “Your job in this meeting is to talk about your problem. Mine is to listen.” After all, he needed to transform Halford into a data-driven company, and they were the ones working closest with the company’s data.
Bob finished his speech with gusto, but Anna merely shrugged. “The management wants to know a lot, but our possibilities are limited.”
Bob tried his best to look both in the eyes, though Anna turned quickly away. “But what is it that prevents you from doing your work without any limits?”
“Our biggest challenge is the batch process from hell,” David spoke up suddenly. “This notorious daily job runs overnight and extracts all data from the operational databases. It is hugely complex. I lost count of how often this job failed over time.”
Got them, Bob thought, nodding in encouragement.
“And nobody knows why this job fails,” Anna jumped in. “But when it does, we don’t know if the data is accurate. So far, there has never been a problem if we handed in a report with questionable figures. But that’s probably because most managers ignore the facts and figures we provide anyway.”
“Exactly!” David threw up his hands. Bob started to worry he had stirred up a hornet’s nest.
“When a job fails, it’s me who has to go to IT,” David said. “I just can’t hear anymore that these nerds ran out of disk space and that some DevSecOps closed a firewall port again. All I want is the data to create my reports. I also fight often with our security department. Sometimes, their processes are so strict that they come close to sabotaging innovation. Occasionally, I get the impression they cut access to data sources on purpose to annoy us.”
“Often, we are asked if we want something more sophisticated,” Anna said, shaking her head in frustration. “It is always the same pattern. A manager visits a seminar and comes to us to ask us if we can ‘do AI’. If you ask me honestly, I would love to do something more sophisticated, but we are afraid that the whole system will break apart if we change something. So, I am just happy if we can provide the management with the data from the day before.”
“Don’t get us wrong, ML and AI would be amazing. But our company must still master the basics. I believe most of our managers have no clue what AI does and what we could do with it. But will they admit it? Not a chance.”
Anna sat back in a huff. Bob did not need to ask them to know that both were applying at other companies for jobs.
1.2.3 “CDO”
At lunch break, a skinny man in a black turtleneck sweater hurtled into the office. He seemed nervous, as if someone was chasing him. His eyes darted around the room, avoiding eye contact. His whole body was fidgeting, and he could not keep his hands still.
“I am the CDO. My name is Cesario Antonio Ramirez Sanchez; call me Cesar,” he introduced himself with a Spanish accent.
Bob was surprised that this meeting had not been announced. Meanwhile, his unexpected visitor kept approaching a chair and moving away from it again as if he could not decide whether to sit down or not.
“CDO? I have not seen this position in the org chart,” Bob answered calmly, “I have seen a Cesario Antonio Rami …”
“No no no … It’s not my official title. It is what I am doing,” Cesar said dramatically. “I am changing the company bottom up, you know? Like guerilla warfare. Without people like me, this company would still be in the Stone Age, you see?”
“I am interested in everyone’s view,” Bob replied, “but I report to Alice, and I cannot participate in any black ops work.”
“No, no, no …, everything is simple. Lots of imbeciles are running around in this company—” Cesar raised his finger and took a sharp breath, nodded twice, and continued. “I know … HR always tells me to be friendly with people and not to say bad words. But we have only data warehouses in this company. Not even a data lake. Catastrófica! It’s the 21st century, and these dinosaurs work like in Latin America hace veinte años. Increíble!”
He took another breath, and then continued. “Let’s modernize! Everything! Start from zero. So much to do. First, we must toss these old devices into the garbage, you know? And replace them with streaming-enabled PLCs. Then, modern edge computing services streams everything with Kafka to different data stores. All problems solved. And then we’ll have a real-time analytics layer on top of a data mesh.”
Bob stared at his counterpart, who seemed unable to keep his eyes or his body still for more than a moment. “I am sorry, I do not understand.”
“You are an expert, you have a Ph.D., no? You should understand: modern factory, IoT, Industry 4.0, Factory of the Future.”
Bob decided not to answer. Instead, he kept his eyebrows raised as he waited for what Cesar would say next.
“So much potential,” Cesar went on. “And all is wasted. Why is HR always talking about people’s feelings? Everything is so easy. This old company needs to get modern. We don’t need artists, we need people with brains. If I want art, I listen to Mariachi in Cancun. If current people are imbeciles, hire new people. Smart people, with Ph.D. and experience. My old bosses in Latin America, you cannot imagine, they would have fired everyone, including HR. Let’s talk later; I’m in the IT department en la cava.”
Bob had no time to answer. Cesar left the room as fast as he had entered it.
1.2.4 Sales
A tall, slim, grey-haired man entered the room, took a place at the end of the table, leaned back and presented to Bob a salesman grin for which Colgate would have paid millions.
“I am Tom Jenkins. My friends call me ‘the Avalanche’. That’s because if I take the phone, nobody can stop me anymore. Back in the nineties, I made four sales in a single day. Can you imagine this?”
I get it; you are a hero, Bob thought. Let’s turn it down a bit.
“My name is Bob. I am a consultant who has been hired to help this company become more data-oriented.”
Tom’s winning smile vanished when Bob mentioned ‘data.’
“I have heard too much of the data talk,” Tom said. “No analysis can beat gut feeling and experience. Don’t get me wrong. I love accurate data about my sales records, but you should trust an experienced man to make his own decisions. No computer will ever tell me which potential client I should call. When I sit at my desk, I know which baby will fly.”
“With all due respect, I can show you a lot of examples of how an evidence-based approach has helped clients to make more revenue.”
“Did you hear yourself just now?” Tom answered, “Evidence-based. You do not win sales with brainy talks. You need to work on people’s emotions and relationships. No computer will ever do better sales than a salesman with a winning smile. I’ll give you an example: One day, our sales data showed that we sold fewer products in our prime region. Some data analysts told me something about demographic changes. What a nonsense!
So, I went out and talked to the people. I know my folks up there. They are all great people. All amazing guys! Very smart and very hands-on. I love this. We had some steaks and beers, then I pitched our new product line. Guess who was salesman of the month after that?
No computer needs to tell me how to approach my clients. So, as long as we get the sales reports right and we can calculate the commission, all is good. It is the salesman, not the computer, who closes a deal.”
With that, The Avalanche was on his feet. He invited Bob to a fantastic restaurant—“I know the owner and trust me, he makes the best steaks you’ll ever taste!”—and was gone.
1.2.5 IT
Ten minutes past the planned meeting start time, Bob was still waiting for the team member he had heard most about upfront: the IT leader, Peter. His name had been mentioned by various people multiple times, but whenever Bob had asked to know more about him, people were reluctant to answer, or simply sighed and told him, “you’ll see.”
Finally, Peter stormed into the room, breathless and sweating. “This trip from my office in the cellar to this floor is a nightmare,” he said between gasps. “You meet so many people in the elevator who want something. I am constantly under so much stress, you cannot imagine! Here, I brought us some sandwiches. I have a little side business in gastronomy. You need a hobby like this to survive here. Without a hobby in this business, you go mad.”
Peter was a squat, red-faced man, who’d been with Halford since he was a lot younger, and had a lot more hair. He sank a little too comfortably in his chair, with the confidence of a man who’d been around so long, he was practically part of the furniture.
He doesn’t lack confidence, that’s for sure, Bob thought. I wonder how many dirty secrets this man has learned over the years that only he knows.
“Okay, let’s talk about IT then,” Peter sighed after Bob turned down the sandwiches. “My colleagues from the board and the executives still don’t get what it is they’re asking of me daily. When they invite me to meetings, I often do not show up anymore. We are a huge company, but nobody wants to invest in IT. I am understaffed; we hardly manage to keep the company running. Want to go for a cigarette?”
“No, thank you,” Bob said, but Peter was already pulling a crumpled pack from his trouser pocket. He rambled all the way to the smoker’s chamber, bouncing around from one topic to another. Bob learned everything about Peter, from his favorite food to his private home to his hernia, which was apparently only getting worse. Once Peter got his first cigarette into his mouth, he went back to the topic Bob was really interested in.
“The suits want things without knowing the implications. On the one hand, they want everything to be secure, but then again, they want modern data solutions. Often, they ask me for one thing one day, and then the very next, they prioritize something else. To be blunt, I had my share of talks with these external consultants. If I allowed them to do what they asked me to do, I could immediately put all our data on a file server and invite hackers to download it with the same result. To keep things working, you need to firewall the whole company.” Peter stubbed out his cigarette and reached for another.
Bob leaped at the chance to interject. “Can you tell me more about your IT department? I was looking for some documentation of the IT landscape. I have not found much information on your internal file shares. Which cloud provider are you currently using?”
Peter laughed and then started coughing. Tears in his eyes, he answered. “I told you, I’m understaffed. Do you really think I have time to document?” He pointed to his head. “Don’t worry, everything is stored in the grey cells up here. And we have a no-cloud strategy. Cloud is just a marketing thing if you ask me. When we build by ourselves, it is safer, and we have everything under control.
If I just had more people … Did you meet one of my guys, Cesar? He is also okay when he does not talk, which unfortunately doesn’t happen often. I don’t like when people think they are smarter than me. He doesn’t know Peter’s two rules yet. Rule Number 1: Do not get on your boss’s nerves. Rule Number 2: Follow Rule Number 1.”
Peter laughed, flicked the second cigarette on the ground, and retrieved a bag from his other pocket. It was full of caramels: Peter popped one into his mouth and continued, chewing loudly. “Alice asked me if I could introduce you to Bill, my lead engineer, but I declined. This guy has the brains of a fox but the communication skills of a donkey. He also gets nervous when you look him straight in the eyes. I am always worried that he might wet his pants— Or am I being too politically incorrect again? Our HR keeps telling me that I should be more friendly. But in this looney bin, you learn to let out your stress by saying what you think. So, please excuse my sarcasm. I am the last person standing between chaos and a running IT landscape, the management keeps getting on my nerves with stupid requests, and last but not least, the HR department is more concerned about how I communicate than about finding the people who could help me keep our company running.”
It took a couple of attempts until Bob could finally break free from Peter’s complaining to head to his next meeting. Even as he was leaving, Peter repeatedly called on Bob to visit his food business sometime, where they could have a drink in private, and Peter could share his Halford ‘war stories’ more openly.
1.2.6 Security
While waiting for the HR representative, Bob received a voice message from Suzie Wong, the head of data security. When Bob played it, he heard traffic sounds in the background.
“Apologies for not showing up. School called me in as one of my kids got sick. I hope a voice message is fine. I am Suzie Wong. I have been with Halford for years. They call me the human firewall against innovation. I take this as a compliment because, in some way, it means I am doing my job well. Could any company be happy with a Head of Security who takes her job easy? My predecessor was more laid back than I am. He was in his fifties and got a little too comfortable, thinking he would retire in a secure job. And then one day … there was this security breach. His kid’s still in private school, he’s suddenly without a job and, well, I’ll spare you the details.
People often think I’m only around to sign off on their intentions to use data, but my real job is protecting our clients’ privacy. Data scientists must prove to me that our clients’ data is safe when they want to work with it. Unfortunately, too many take that too lightly.
If the requestor follows the process, a privacy impact assessment could be done within a week. I will send you a link to our security portal later so you can review it. You’ll see for yourself that we do not ask for anything impossible.
I am the last line of defense, ensuring that we do not pay hefty fines because someone thought it was just data they were playing around with. Some people also jokingly call me ‘Mrs. No,’ because this is my common answer if you cannot express why I should grant you security exceptions or provide access to data containing clients’ private information. Some people complain that this way, it may take months to get security approval. But so long as engineers and data scientists still don’t get how to address security matters correctly, I don’t care if it takes years before I give my final OK.
Anyway, excuse me now, I’m at the school …”
1.2.7 Production Leader
Bob had some time before his next meeting and looked up his next meeting partner online. He discovered a middle-aged man with a long history on social media, including some questionable photos of his younger self in a Che Guevara t-shirt. Bob chuckled. That young man could be happy that their interview wasn’t taking place during the times of the Cold War.
Finally, Bob’s interviewee entered the room. He was muscular, and his bushy black beard showed the first signs of greying.
“My name is Hank. Pleased to meet you,” he said with a deep voice.
“I heard you are new in your position,” Bob said.
“Yes. Alice fired my predecessor because he was a tyrant. I am now one of the first of what she calls ‘the new generation.’ I accepted because I can change things here now. Let me get to the point: What are you planning to do?”
Bob smiled and said, “The idea in factories is often to use machine learning for automation. Think of processes where people check the quality of an item manually. Imagine that you can automate all this. A camera screens every piece, and defective items, which we call ‘rejects,’ are filtered out automatically.”
Hank stiffened. “My job is to protect jobs, not support removing them. Many of our factories are in villages, where they are the only source of work.”
“Almost every country goes through demographic changes. Can you guarantee that you will be able to maintain a strong enough workforce to keep the factories running? How about doing the same with fewer people?”
“But if you remove a few people, they can end up out of work,” Hank said. “What if you don’t need workers at all in a few years? I don’t want to open the door to a system that makes the bourgeoisie richer and puts the ordinary proletarian out of work.”
“That is very unlikely,” Bob said.
“I see you stand in solidarity with your employees, Hank. Did you consider exploring use cases to protect them? We can use computer vision to see if factory workers wear helmets, for example.”
Hank looked deeply into Bob’s eyes. Bob couldn’t quite tell if it was a good or bad sign, but he did realize something: this was not a man he’d like to meet on a dark, empty street.
“I understand that there might be benefits for my colleagues,” Hank said. “I just don’t want to open up a Trojan horse: I get one IT system in to prevent accidents, and the next one makes the workers obsolete. But I promised Alice I’d support her. She is a good person. I will talk with my colleagues. I need to get them on board, but one thing is not negotiable: We will never tolerate any system that completely replaces people who need the job they have.”
1.2.8 Customer Service
The next interviewee, an elderly woman with perfectly glossy, silver hair, entered the room. She sat down and carefully ran her fingers over her classic French bun, ensuring not a hair was out of place.
“I am Annie from the complaints department,” she said with something of an aristocratic tone. She seemed more interested in her neatly manicured nails than Bob as she went on. “I honestly do not know why you want to talk to me.”
“Well, part of a data-driven enterprise is often also a customer-first strategy. We can measure customer churn and other metrics through data. Most of my clients want to use data to maximize success. They even renamed their departments to ‘Customer Satisfaction Department’ to underline this.”
“Aha,” Annie said. There was an uncomfortable silence as she polished the face of her antique watch with her sleeve.
Bob cleared his throat, anxious to get her attention. “Would you be interested to learn more about your customers through data?”
“Why should I?”
“To serve them better?”
“We have sturdy products. Most complaints have no base. We believe the less money we spend on confused customers, the more we have left to improve our products. This is what I call the real customer value we provide.”
Ah-hah. Bob recognized the famous argument against investing in any domain that doesn’t directly create revenue. She probably gets a bonus for keeping yearly costs low, he thought, seeing an opportunity.
“And how do you keep costs low at the moment?”
“We have an offshore call center. They handle about 80 % of calls, although a lot of those customers just give up, for some reason. The remaining 20 % are forwarded to a small team of more advanced customer support employees. I know it sounds harsh, but you cannot imagine how many confused people try to call us without having a problem at all. Some – it seems – call us just to talk.”
“Right. And have you thought of the possibility to reduce costs by building chatbots backed by generative AI? There are also many ways to use data science to filter customer complaints. If properly trained, your clients get better support, and you reduce costs.”
“Would it be good enough to shut down the offshore center?”
Gotcha. “If done right, yes.”
For what felt like the first time, Annie looked at Bob directly. “How much would it cost?”
“At the moment, it is still difficult to estimate.”
Annie thought a while, then stood up to leave. At the door, she paused. “Once you know, call me immediately.”
1.2.9 HR
“I’m, I’m Pratima,” came a woman’s voice at the door. She approached Bob, looked up at him with a welcoming smile and asked, “How can I help you, Bob?”
“Hi, Pratima. Let’s take a seat. As you know, I’m here to transform this company into a more data-oriented one. I saw on LinkedIn that you have previously worked for very modern companies with a strong data culture. How is it now to work for a company at the beginning of its journey?”
“Alice asked me to be open to you. I took this job as a career step to advance to leadership. However, the Wheel of Fortune led me to more challenges than expected.
In my previous job, we had the vibes to attract new talent. It was an environment primed for excellence: fancy office spaces, a modern work culture with flat hierarchies, cool products to work on, and many talented, diverse colleagues. Recruiting was easy because new candidates felt the spirit of our community.”
Pratima sighed.
“In this company, though, we cannot hide that we are at the beginning of our transition. Applicants usually have many offers to choose from. Sometimes, we have to watch perfect candidates walk away because we do not yet provide a warm and welcoming environment for data professionals.
When managers discuss AI and data transition, some might overlook the human aspect. What if you create the perfect data strategy but cannot attract enough talent? Many companies face this problem, and it is always the elephant in the room. To become a data-driven company, you have to create an environment that attracts people who think differently, and this means changing your culture.”
“Do you believe management is scared to promote too much change because it is afraid to lose everything?”
“I understand that some seasoned employees might get disappointed and even resign if their comfortable environment starts to modernize. But at the same time, if you do not change at all, you are stuck in the mud, and your competition will make you obsolete. The Dalai Lama says we should be the change we wish to be.”
“Right. And I believe it was Seneca who once said, ‘It’s not because things are difficult that we dare not venture. It’s because we dare not venture that they are difficult.’”
“True! But I have to go now. I am looking forward to continuing our talks.”
1.2.10 CEO
Alice and Bob met at a fusion restaurant downtown in the evening. Alice introduced Bob to Santiago, the long-time CFO turned new CEO. After an excellent meal, they ordered some famous Armenian cognac, and got down to the real discussion.
“I’ll be honest with you, Bob,” Santiago began. “All your ideas to transform Halford sound fantastic, but as an economist and a numbers person, my first question is, how much will this all cost?”
Oh boy. Bob was prepared for the question, but he knew Santiago wouldn’t like the answer. “It depends,” he said, and Santiago looked about as dissatisfied as Bob would have expected.
“I understand that everyone looks at the costs,” Bob continued, “but history is full of companies that failed to innovate and went bankrupt as their competition moved forward. If you consider the full spectrum of artificial intelligence, hardly any company will continue to operate as before.”
“Some companies recommend that we start with data literacy workshops to enable leaders to interpret data and numbers efficiently. Literacy sounds as if they want to teach us to read and write again—and for a huge amount of money, of course. Don’t get me wrong, please. I understand that we need to innovate, but if I approve everything consultants suggest to me, we will soon be broke.”
“But if your leadership team cannot ‘think in data’,” Bob said, making air quotes as he spoke, “how do they expect to attend our planned strategy workshop on exploring specific data science options for our business goals?”
“What is the difference?”
“In the data literacy workshops, we aim to create an understanding of how to interpret data. In the strategy workshop, we’ll create a list of use cases to improve processes in your company, and prioritize them, to integrate new data solutions gradually.”
“I understand that we have some tough nuts to crack. Some of our employees do not believe in becoming data-driven, and we may need to invest hugely in Enablement. We once asked external companies to help us modernize our IT. No consulting company gave me a quote with a fixed price for a transition project. They always said we were facing a hole without a bottom.”
“Leadership is the only way to move forward. If the executive team is convinced and aligned, this culture can spread.
Your operational IT will need to mature and modernize gradually. However, be aware that an analytical layer can be built outside of corporate IT. One risk is to turn the data transition into an IT problem; IT is part of it, but becoming a data-driven company is far more than giving some engineers a job to build platforms.”
“For me, it’s clear,” Alice said. “Either we modernize, or we gradually fade out of existence. Bob, what do you need to help us?”
Bob looked from one to the other, carefully considering his next words. “Becoming data-driven does not mean hiring a bunch of data scientists who do a bit of magic, and suddenly the company makes tons of money using AI. As I said, the first step is to align the stakeholders. For me, this is the alpha and omega of AI: creating a data culture based on critical thinking and evidence-based decisions.”
“Great,” answered Alice. “Let’s get started with that.”
1.3 In a Nutshell
Expectation Management
Most companies see the need to become data-driven, as they understand that those organizations that ignore technical evolution mostly fail.
Some employees might have unrealistic expectations about how fast a transition can go. We highlight that changing to a data-driven company is not just a change of practices and processes; it is often a cultural overhaul of how the company does its business.
Many employees fear having to give up some of their autonomy, or even losing their jobs to computers entirely, if AI is introduced at their company. An organization that transitions to become data-driven must address this.
Technology Focus and Missing Strategy
Some companies try to find a silver bullet that solves all problems. “We’ll just use this technology, just apply AI in this or that way, and all our problems are resolved,” they think. Being too technology-focused, however, is an anti-pattern that can hinder a company’s evolution to becoming data-driven.
Data Science and AI are about more than just Understanding Frameworks and Methods
While it is essential to have a team of skilled data scientists and AI engineers to pick the right AI frameworks and build complex AI systems, for large organizations, there are many other considerations to watch for. Not being able to understand the needs of an organization and where AI can make a difference is a risk. With the wrong target, every strategy will fail.
Collaboration between Analysts and IT
In some companies, IT provides the platforms that analysts have to use. If these platforms are error-prone or old, it can get frustrating for analysts. In modern environments, not all analytical platforms must be managed by one central IT department. This can give data teams more freedom to operate on their own.
IT
Many IT teams lack the resources to build the data pipelines needed for data science platforms. Often there is a gap between business users and engineers, making it hard for them to communicate with each other.