Data Science for Decision Makers - Jon Howells - E-Book

Description

As data science and artificial intelligence (AI) become prevalent across industries, executives without formal education in statistics and machine learning, as well as data scientists moving into leadership roles, must learn how to make informed decisions about complex models and manage data teams. This book will elevate your leadership skills by guiding you through the core concepts of data science and AI.
This comprehensive guide is designed to bridge the gap between business needs and technical solutions, empowering you to make informed decisions and drive measurable value within your organization. Through practical examples and clear explanations, you'll learn how to collect and analyze structured and unstructured data, build a strong foundation in statistics and machine learning, and evaluate models confidently. By recognizing common pitfalls and valuable use cases, you'll plan data science projects effectively, from inception to completion. Beyond technical aspects, this book provides tools to recruit top talent, manage high-performing teams, and stay up to date with industry advancements.
By the end of this book, you’ll be able to characterize the data within your organization and frame business problems as data science problems.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 392

Year of publication: 2024




Data Science for Decision Makers

Enhance your leadership skills with data science and AI expertise

Jon Howells


Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Ali Abidi

Publishing Product Manager: Tejashwini R

Book Project Manager: Hemangi Lotlikar

Content Development Editor: Joseph Sunil

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Joseph Sunil

Indexer: Rekha Nair

Production Designer: Ponraj Dhandapani

DevRel Marketing Coordinator: Vinishka Kalra

First published: June 2024

Production reference: 1190624

Published by Packt Publishing Ltd.

Grosvenor House, 11 St Paul’s Square, Birmingham B3 1RB, UK

ISBN 978-1-83763-729-4

www.packtpub.com

To my mother and father, Caroline and Robert, for instilling in me the values of education and constant curiosity. To my partner, Yeshica, for your unwavering support, and to my sister, Felicity, for your keen eye in reviewing and shaping this book.

– Jon Howells

Contributors

About the author

Jon Howells, director of AI consultancy QualifAI, is a data science and machine learning professional with over a decade of experience in the consumer goods, market research, and public sectors. He has worked at consultancies including KPMG and Capgemini, with multinational clients such as Unilever and Permira, and with public sector bodies such as the UK Home Office and the US Food and Drug Administration (FDA).

With an MSc in computational statistics and machine learning from UCL, Jon specializes in applying large language models (LLMs) to consumer-focused businesses, leveraging them for consumer research, personalized content generation, and enhanced customer support. His expertise helps businesses better understand and engage with their customers, driving innovation and unlocking the potential of data-driven decision-making.

About the reviewer

As a principal architect at T-Mobile, Tanmaya Gaur has more than 10 years of web development experience and a passion for delivering technical and architectural leadership for key technology initiatives and business capabilities. In the latest chapter of his professional career, he has been instrumental in shaping the architecture of T-Mobile’s primary CRM solution, which is built using modular micro-frontend architecture and enhances the digital experience for their care representatives and customers.

His expertise in web, infrastructure, and microservices enables him to design and deliver scalable solutions that are performant, secure, and resilient. He works closely with other business and IT partner teams in a highly collaborative environment and is committed to driving the best customer experience across mobile, desktop, point-of-sale, and other emerging devices.

Table of Contents

Preface

Part 1: Understanding Data Science and Its Foundations

1

Introducing Data Science

Data science, AI, and ML – what’s the difference?

The mathematical and statistical underpinnings of data science

Statistics and data science

What is statistics?

Descriptive and inferential statistics

Sampling strategies

Probability

Probability distribution

Conditional probability

Describing our samples

Measures of central tendency

Measures of dispersion

Degrees of freedom

Correlation, causation, and covariance

The shape of data

Probability distributions

Discrete probability distributions

Continuous probability distributions

Summary

2

Characterizing and Collecting Data

What are the key criteria to consider when evaluating datasets?

Data quantity

Data velocity

Data variety

Data quality

First-, second-, and third-party data

First-party data – the treasure trove within

Second-party data – building bridges through collaboration

Third-party data – broadening horizons with external expertise

Structured, unstructured, and semi-structured data

Structured data

Unstructured data

Semi-structured data

Methods for collecting data

Storing and processing data

Cloud, on-premises, and hybrid solutions – navigating the data storage and analysis landscape

Cloud computing – scalable services in the cloud

On-premises – maintaining control within your walls

Hybrid – the best of both worlds?

Data processing

Summary

3

Exploratory Data Analysis

Getting started with Google Colab

What is Google Colab?

A step-by-step guide to setting up Google Colab

Understanding the data you have

EDA techniques and tools

Descriptive statistics

Data visualization

Histograms

Density curves

Boxplots

Heatmaps

Dimensionality reduction

Correlation analysis

Outlier detection

Summary

4

The Significance of Significance

The idea of testing hypotheses

What is a hypothesis?

How does hypothesis testing work?

Formulating null and alternative hypotheses

Determining the significance level

Understanding errors

Getting to grips with p-values

Significance tests for a population proportion – making informed decisions about proportions

The z-test – comparing a sample proportion to a population proportion

Z-test example made easy

Significance tests for a population average (mean)

Writing hypotheses for a significance test about a mean

Conditions for a t-test about a mean

When to use z or t statistics in significance tests

Example – calculating the t-statistic for a test about a mean

Using a table to estimate the p-value from the t-statistic

Comparing the p-value from the t-statistic to the significance level

One-tailed and two-tailed tests

Walking through a case study

Summary

5

Understanding Regression

How can I benefit from understanding regression?

Introduction to trend lines

Fitting a trend line to data

Estimating the line of best fit

Calculating the equations of the lines of best fit

Interpreting the slope of a regression line

Interpreting the intercept of a regression line

Understanding residuals

Evaluating the goodness of fit in least-squares regression

Summary

Part 2: Machine Learning – Concepts, Applications, and Pitfalls

6

Introducing Machine Learning

From statistics to machine learning

What is machine learning?

How does machine learning relate to statistics?

Why is machine learning important?

Customer personalization and segmentation

Fraud detection and security

Supply chain and inventory optimization

Predictive maintenance

Healthcare diagnostics and treatment

The different types of machine learning

Supervised learning

Unsupervised learning

Semi-supervised learning

Reinforcement learning

Transfer learning

Popular machine learning algorithms

Linear regression

Logistic regression

Decision trees

Random forests

Support vector machines

k-nearest neighbors

Neural networks

The machine learning process

Training a supervised machine learning model

Validation of a supervised machine learning model

Testing a supervised machine learning model

Evaluating machine learning models

Risks and limitations of machine learning

Overfitting and underfitting

Bias and variance

Balanced dataset

Models are approximations of reality

Machine learning on unstructured data

Natural language processing (NLP)

Computer vision

Deep learning and artificial intelligence

Artificial intelligence

Deep learning

Summary

7

Supervised Machine Learning

Defining supervised learning

Applications of supervised learning

The two types of supervised learning

Key factors in supervised learning

Steps within supervised learning

Data preparation – laying the foundation

Algorithm selection – choosing the right tool

Model training – learning from data

Model evaluation – assessing performance

Prediction and deployment – putting the model to work

Characteristics of regression and classification algorithms

Regression algorithms

Classification algorithms

Key considerations in supervised learning

Evaluation metrics

Applications of supervised learning

Consumer goods

Retail

Manufacturing

Summary

8

Unsupervised Machine Learning

Defining UL

Practical examples of UL

Steps in UL

Step 1 – Data collection

Step 2 – Data preprocessing

Step 3 – Choosing the right model

Step 4 – Training the model

Step 5 – Interpretation and evaluation

In summary

Clustering – unveiling hidden patterns in your data

What is clustering?

How does clustering work?

k-means clustering

Practical applications of clustering

Evaluation metrics for clustering

In summary

Association rule learning

What is association rule learning?

The Apriori algorithm – a practical example

Evaluation metrics

In summary

Applications of UL

Market segmentation

Anomaly detection

Feature extraction

Summary

9

Interpreting and Evaluating Machine Learning Models

How do I know whether this model will be accurate?

Evaluating on test (holdout) data

Understanding evaluation metrics

Evaluating regression models

R-squared

Root mean squared error

Mean absolute error

When and how to use each metric

Practical evaluation strategies

Summarizing the evaluation of regression models

Evaluating classification models

Classification model evaluation metrics

Precision, recall, and F1-Score

Recall

F1-score

Methods for explaining machine learning models

Making sense of regression models – the power of coefficients

Decoding classification models – unveiling feature importance

Beyond specific models – universal insights using SHAP values

Summary

10

Common Pitfalls in Machine Learning

Understanding the complexity

Dirty data, damaged models – how data quantity and quality impact ML

The importance of adequate training data

Dealing with poor data quality

Conclusion

Overcoming overfitting and underfitting

Navigating training-serving skew and model drift

Ensuring fairness

Mastering overfitting and underfitting for optimal model performance

Overfitting – when your model is too specific

Underfitting – when your model is too simplistic

Spotting the problem

Conclusion

Training-serving skew and model drift

Training-serving skew

Model drift

Key takeaways

Bias and fairness

Understanding bias

Understanding fairness

Mitigating bias and ensuring fairness

Key takeaways

Summary

Part 3: Leading Successful Data Science Projects and Teams

11

The Structure of a Data Science Project

The various types of data science projects

Data products

Reports and analytics

Research and methodology

The stages of a data product

Identifying use cases

Evaluating use cases

Planning the data product

Developing a data product

Data preparation and exploratory analysis

Model design and development

Evaluation and testing

Deploying and monitoring a data product

General best practices for data product development

Evaluating impact

Predictive maintenance in manufacturing

Fraud detection in banking

Customer churn prediction in telecom

Demand forecasting in retail

Personalized recommendations in e-commerce

Predictive maintenance in energy

Workforce optimization in quick service restaurants

Chatbot-assisted customer support

Summary

12

The Data Science Team

Assembling your data science team – key roles and considerations

Data scientists

Machine learning engineers

Data engineers

MLOps engineers

Analytics engineers

Software engineers (full stack, frontend, backend)

Product managers

Business analysts

Data storytellers/visualization experts

Considerations when assembling your team

Data science teams within larger organizations

The hub and spoke model

What is the hub and spoke model?

Practical applications of the hub and spoke model

Building a hub and spoke model

The art of recruitment

Where to find technical talent

How high-performing data science teams operate

Cross-functional collaboration is essential

Diversity of perspectives drives innovation

Start with the right problem to solve

Invest in tooling, infrastructure, and workflow

Continuous adaption and learning are a must

Focus ruthlessly on outcomes over activity

Summary

13

Managing the Data Science Team

Day-to-day management of a data science team

Enabling rapid experimentation and innovation

Managing inherent uncertainty

Balancing research and application

Communicating effectively in data science and artificial intelligence

Fostering a culture of curiosity and continuous learning

Embracing peer review and collaboration

Common challenges in managing a data science team

Challenge 1 – recruiting and retaining top talent

Challenge 2 – aligning projects with business goals

Challenge 3 – managing inherent uncertainty

Challenge 4 – scaling and operationalizing models

Challenge 5 – deploying robust, reliable, fair models ethically

Empowering and motivating your data science team

Working with other teams and external stakeholders and empowering them to use data

Summary

14

Continuing Your Journey as a Data Science Leader

Navigating the landscape of emerging technologies

Specializing in an industry

Specializing in a field

Embracing continuous learning

Online courses

Cloud certifications

Technical tutorials and documentation

Learning plan framework

Staying up to date with current DS/ML/AI news and trends

Promoting data-driven thinking within your organization

Host internal learning sessions

Collaborate on cross-functional projects

Share success stories and lessons learned

Mentor and upskill colleagues

Establish a data science community of practice

Networking beyond your organization

Attend industry conferences and events

Join online communities and forums

Engage with local meetups and user groups

Collaborate on side projects or research

Offer mentorship or seek mentors

Summary

Index

Other Books You May Enjoy

Part 1: Understanding Data Science and Its Foundations

This part covers the foundations of data science, including key statistical concepts, data types, collection methods, exploratory data analysis, statistical significance, and regression. It contains the following chapters:

Chapter 1, Introducing Data Science

Chapter 2, Characterizing and Collecting Data

Chapter 3, Exploratory Data Analysis

Chapter 4, The Significance of Significance

Chapter 5, Understanding Regression