Deep learning enables previously unattainable feats in automation, but extracting real-world business value from it is a daunting task. This book will teach you how to build complex deep learning models and gain intuition for structuring your data to accomplish your deep learning objectives.
This deep learning book explores every aspect of the deep learning life cycle, from planning and data preparation to model deployment and governance, using real-world scenarios that will take you through creating, deploying, and managing advanced solutions. You’ll also learn how to work with image, audio, text, and video data using deep learning architectures, as well as optimize and evaluate your deep learning models objectively to address issues such as bias, fairness, adversarial attacks, and model transparency.
As you progress, you’ll harness the power of AI platforms to streamline the deep learning life cycle and leverage Python libraries and frameworks such as PyTorch, ONNX, Catalyst, MLflow, Captum, Nvidia Triton, Prometheus, and Grafana to execute efficient deep learning architectures, optimize model performance, and streamline the deployment processes. You’ll also discover the transformative potential of large language models (LLMs) for a wide array of applications.
By the end of this book, you'll have mastered deep learning techniques to unlock its full potential for your endeavors.
The Deep Learning Architect’s Handbook
Build and deploy production-ready DL solutions leveraging the latest Python techniques
Ee Kin Chin
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Ali Abidi
Book Project Manager: Shambhavi Mishra
Senior Editor: Rohit Singh
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Subalakshmi Govindhan
Production Designer: Ponraj Dhandapani
DevRel Marketing Executive: Vinishka Kalra
First published: December 2023
Production reference: 1301123
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80324-379-5
www.packtpub.com
To my wife, Nina, the constant source of inspiration, support, and encouragement in my life. Without you, this book would have remained a dream.
– Ee Kin Chin
Ee Kin Chin is a senior deep learning engineer at DataRobot. He led teams to develop advanced AI tools used by numerous organizations from diverse industries and provided consultation on many customer AI use cases. Previously, he worked on deep learning (DL) computer vision projects for smart vehicles and human sensing applications at Panasonic and offered AI solutions using edge cameras at a tech solutions provider. He was also a DL mentor for an online course. Holding a Bachelor of Engineering (honors) degree in electronics, with a major in telecommunications, and a proven track record of successful application of AI, Ee Kin's expertise includes embedded applications, practical deep learning, data science, and classical machine learning.
A huge shout-out to my fantastic friends, mentors, colleagues, family, book reviewers, and the open source community who’ve supported and motivated me during my professional career. Your shared knowledge, insights, and wisdom have been invaluable.
Shivani Modi is a data scientist with expertise in machine learning, deep learning, and NLP, holding a master’s degree from Columbia University. Her five years of professional experience spans IBM, SAP, and C3 AI, where she has excelled in deploying scalable AI models across various sectors. At Konko AI, Shivani spearheaded the development of tools to optimize LLM selection and deployment. Shivani’s dedication to mentoring and talent development, coupled with her hands-on experience in leading complex projects, underscores her status as a thought leader in AI innovation. Her upcoming project aims to revolutionize how developers utilize LLMs, ensuring their secure and efficient implementation.
Ved Upadhyay is a seasoned data science and AI professional, bringing over seven years of hands-on experience in addressing enterprise-level challenges in deep learning. His expertise spans diverse industries, including retail, e-commerce, pharmaceuticals, agro-tech, and socio-tech, where he has successfully implemented AI solutions. Ved is currently working as a senior data scientist at Walmart, where he leads multiple initiatives focused on customer propensity and responsible AI. He earned his master’s degree in data science from the University of Illinois Urbana-Champaign and has contributed as a deep learning researcher at IIIT Hyderabad.
As a deep learning practitioner and enthusiast, I have spent years working on various projects and learning from diverse sources such as Kaggle, GitHub, colleagues, and real-life use cases. I've realized that there is a significant gap in the availability of cohesive, end-to-end deep learning resources. Traditional Massive Open Online Courses (MOOCs), while helpful, often lack the practical knowledge and real-world insights that can only be gained through hands-on experience.
To bridge this gap, I've created The Deep Learning Architect's Handbook, a comprehensive and practical guide that combines my unique experiences and insights. This book will help you navigate the complex landscape of deep learning, providing you with the knowledge and insights that would typically take years of hands-on experience to acquire, condensed into a resource that can be consumed in just days or weeks.
This book delves into various stages of the deep learning life cycle, from planning and data preparation to model deployment and governance. Throughout this journey, you'll encounter both foundational and advanced deep learning architectures, such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), autoencoders, transformers, and cutting-edge methods, such as Neural Architecture Search (NAS). Divided into three parts, this book covers foundational methods, model insights, and DLOps, exploring advanced topics such as NAS, adversarial performance, and Large Language Model (LLM) solutions. By the end of this book, you will be well-prepared to design, develop, and deploy effective deep learning solutions, unlocking their full potential and driving innovation across various applications.
I hope that this book will serve as a way for me to give back to the community, by sparking conversations, challenging assumptions, and inspiring new ideas and approaches in the field of deep learning. I invite you to join me on this journey, and I look forward to hearing your thoughts and feedback as we explore the captivating world of deep learning together. Please feel free to reach out to me via LinkedIn through www.linkedin.com/in/chineekin, Kaggle through https://www.kaggle.com/dicksonchin93, or other channels listed on my LinkedIn profile. Your unique experiences and perspectives will undoubtedly contribute to the ongoing evolution of this book and the deep learning community as a whole.
This book is best suited for deep learning practitioners, data scientists, and machine learning developers who want to explore deep learning architectures to solve complex business problems, and for professionals in the deep learning and AI space who will apply this knowledge to their own business use cases. Working knowledge of Python programming and a basic understanding of deep learning techniques are needed to get the most out of this book.
Chapter 1, Deep Learning Life Cycle, introduces the key stages of a deep learning project, focusing on planning and data preparation, and sets the stage for a comprehensive exploration of the deep learning life cycle throughout the book.
Chapter 2, Designing Deep Learning Architectures, dives into the foundational aspects of deep learning architectures, including MLPs, and discusses their role in advanced neural networks, as well as the importance of backpropagation and regularization.
Chapter 3, Understanding Convolutional Neural Networks, provides an in-depth look at CNNs, their applications in image processing, and various model families within the CNN domain.
Chapter 4, Understanding Recurrent Neural Networks, explores the structure and variations of RNNs and their ability to process sequential data effectively.
Chapter 5, Understanding Autoencoders, examines the fundamentals of autoencoders as a method for representation learning and their applications across different data modalities.
Chapter 6, Understanding Neural Network Transformers, delves into the versatile nature of transformers, capable of handling diverse data modalities without explicit data-specific biases, and their potential applications in various tasks and domains.
Chapter 7, Deep Neural Architecture Search, introduces the concept of NAS as a way to automate the design of advanced neural networks and discusses its applications and limitations in different scenarios.
Chapter 8, Exploring Supervised Deep Learning, covers various supervised learning problem types, techniques for implementing and training deep learning models, and practical implementations using popular deep learning frameworks.
Chapter 9, Exploring Unsupervised Deep Learning, discusses the contributions of deep learning to unsupervised learning, particularly highlighting the unsupervised pre-training method. Harnessing the vast amounts of freely available data on the internet, this approach improves model performance for downstream supervised tasks and paves the way toward general Artificial Intelligence (AI).
Chapter 10, Exploring Model Evaluation Methods, provides an overview of model evaluation techniques, metric engineering, and strategies for optimizing against evaluation metrics.
Chapter 11, Explaining Neural Network Predictions, delves into the prediction explanation landscape, focusing on the integrated gradients technique and its practical applications for understanding neural network predictions.
Chapter 12, Interpreting Neural Networks, delves into the nuances of model understanding and showcases techniques for uncovering patterns detected by neurons. By exploring real images and generating images through optimization to activate specific neurons, you will gain valuable insights into the neural network’s decision-making process.
Chapter 13, Exploring Bias and Fairness, addresses the critical issue of bias and fairness in machine learning models, discussing various types, metrics, and programmatic methods for detecting and mitigating bias.
Chapter 14, Analyzing Adversarial Performance, examines the importance of adversarial performance analysis in identifying vulnerabilities and weaknesses in machine learning models, along with practical examples and techniques for analysis.
Chapter 15, Deploying Deep Learning Models in Production, focuses on key components, requirements, and strategies for deploying deep learning models in production environments, including architectural choices, hardware infrastructure, and model packaging.
Chapter 16, Governing Deep Learning Models, explores the fundamental pillars of model governance, including model utilization, model monitoring, and model maintenance, while providing practical steps for monitoring deep learning models.
Chapter 17, Managing Drift Effectively in a Dynamic Environment, discusses the concept of drift and its impact on model performance, along with strategies for detecting, quantifying, and mitigating drift in deep learning models.
Chapter 18, Exploring the DataRobot AI Platform, showcases the benefits of AI platforms, specifically DataRobot, in streamlining and accelerating the deep learning life cycle, and highlights various features and capabilities of the platform.
Chapter 19, Architecting LLM Solutions, delves into LLMs and the potential applications, challenges, and strategies for creating effective, contextually aware solutions using LLMs.
The code provided in the chapters has been tested on a computer with Python 3.10, Ubuntu 20.04 LTS 64-bit OS, 32 GB of RAM, and an RTX 2080 Ti GPU for running deep learning models. Although the code has been tested on this specific setup, it may also work on other configurations; however, compatibility and performance are not guaranteed. Python dependencies are listed in the requirements.txt file in each chapter’s respective GitHub folder for easy installation. Additionally, some non-Python software might be required; installation instructions will be mentioned at the beginning of each relevant tutorial, and you may need to refer to external manuals or guides to complete them. Keep these potential differences in system configuration in mind as you carry out the practical code sections in this book.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “We will be using pandas for data manipulation and structuring, matplotlib and seaborn for plotting graphs, tqdm for visualizing iteration progress, and lingua for text language detection.”
A block of code is set as follows:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from lingua import Language, LanguageDetectorBuilder
tqdm.pandas()

Any command-line input or output is written as follows:
sudo systemctl start node_exporter
sudo systemctl start prometheus

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “We can set up the Prometheus link now by clicking on the three-line button on the top-left tab and clicking on the Data Sources tab under the Administration dropdown.”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read The Deep Learning Architect’s Handbook, we’d love to hear your thoughts! Please visit the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there – you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781803243795
Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly.

In this part of the book, you will gain a comprehensive understanding of the foundational methods and techniques in deep learning architectures. Starting with the deep learning life cycle, you will explore various stages at a high level, from planning and data preparation to model development, insights, deployment, and governance. You will then dive into the intricacies of designing deep learning architectures such as MLPs, CNNs, RNNs, autoencoders, and transformers. Additionally, you will learn about the emerging method of neural architecture search and its impact on the field of deep learning.
Throughout this part, you will also delve into the practical aspects of supervised and unsupervised deep learning, covering topics such as binary classification, multiclassification, regression, and multitask learning, as well as unsupervised pre-training and representation learning. With a focus on real-world applications, this part provides valuable insights into the implementation of deep learning models using popular frameworks and programming languages.
By the end of this part, you will have a solid foundation in deep learning architectures, methods, and life cycles, which will enable you to continue your journey to face other challenges involved in crafting deep learning solutions.
This part contains the following chapters:
Chapter 1, Deep Learning Life Cycle
Chapter 2, Designing Deep Learning Architectures
Chapter 3, Understanding Convolutional Neural Networks
Chapter 4, Understanding Recurrent Neural Networks
Chapter 5, Understanding Autoencoders
Chapter 6, Understanding Neural Network Transformers
Chapter 7, Deep Neural Architecture Search
Chapter 8, Exploring Supervised Deep Learning
Chapter 9, Exploring Unsupervised Deep Learning

In this chapter, we will explore the intricacies of the deep learning life cycle. Sharing similar characteristics with the machine learning life cycle, the deep learning life cycle is a framework as much as it is a methodology, one that can carry a deep learning project idea to great success or see it scrapped entirely when that is appropriate. We will grasp the reasons why the process is cyclical and understand some of the life cycle’s initial processes on a deeper level. Additionally, we will go through some high-level sneak peeks of the later processes of the life cycle that will be explored at a deeper level in future chapters.
Comprehensively, this chapter will help you do the following:
Understand the similarities and differences between the deep learning life cycle and its machine learning life cycle counterpart
Understand where domain knowledge fits in a deep learning project
Understand the few key steps in planning a deep learning project to make sure it can tangibly create real-world value
Grasp some deep learning model development details at a high level
Grasp the importance of model interpretation and the variety of deep learning interpretation techniques at a high level
Explore high-level concepts of model deployments and their governance
Learn to choose the necessary tools to carry out the processes in the deep learning life cycle

We’ll cover this material in the following sections:
Machine learning life cycle
The construction strategy of a deep learning life cycle
The data preparation stage
Deep learning model development
Delivering model insights
Managing risks

This chapter includes some practical implementations in the Python programming language. To complete it, you need to have a computer with the following libraries installed:
pandas
matplotlib
seaborn
tqdm
lingua

The code files are available on GitHub: https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_1.
Deep learning is a subset of the wider machine learning category. The main characteristic that sets it apart from other machine learning algorithms is its foundational building block, the neural network. As deep learning has advanced tremendously since the early 2000s, it has made possible many feats previously unachievable by its machine learning counterparts. Specifically, deep learning has made breakthroughs in recognizing intricate patterns that exist in complex and unstructured data such as text, images, videos, and audio. Some of the successful applications of deep learning today are face recognition with images, speech recognition from audio data, and language translation with textual data.
Machine learning, on the other hand, is a subset of the wider artificial intelligence category. Its algorithms, such as tree-based models and linear models, which are not considered to be deep learning models, still serve a wide range of use cases involving tabular data, which is the bulk of the data that’s stored by small and big organizations alike. This tabular data may exist in multiple structured databases and can span from 1 to 10 years’ worth of historical data that has the potential to be used for building predictive machine learning models. Some of the notable predictive applications for machine learning algorithms are fraud detection in the finance industry, product recommendations in e-commerce, and predictive maintenance in the manufacturing industry. Figure 1.1 shows the relationships between deep learning, machine learning, and artificial intelligence for a clearer visual distinction between them:
Figure 1.1 – Artificial intelligence relationships
Now that we know what deep learning and machine learning are in a nutshell, we are ready for a glimpse of the machine learning life cycle, as shown in Figure 1.2:
Figure 1.2 – Deep learning/machine learning life cycle
As advanced and complex as deep learning algorithms are compared to other machine learning algorithms, the guiding methodologies needed to ensure success in both domains are unequivocally the same. The machine learning life cycle involves six stages that interact with each other in different ways:
Planning
Data Preparation
Model Development
Deliver Model Insights
Model Deployment
Model Governance

Figure 1.2 shows these six stages and the possible stage transitions depicted with arrows. Typically, a machine learning project will iterate between stages, depending on the business requirements. In a deep learning project, most of the innovative predictive use cases require manual data collection and data annotation, a process that lies in the realm of the Data Preparation stage. As this process is generally time-consuming, especially when the data itself is not readily available, a go-to solution would be to start with an acceptable initial amount of data and transition into the Model Development stage and, subsequently, to the Deliver Model Insights stage to make sure results from the ideas are sane.
After the initial validation process, depending again on business requirements, practitioners would then decide to transition back into the Data Preparation stage and continue to iterate through these stages cyclically at different data size milestones until results are satisfactory against both the model development and business metrics. Once it gets approval from the necessary stakeholders, the project then goes into the Model Deployment stage, where the built machine learning model will be served to allow its predictions to be consumed. The final stage is Model Governance, where practitioners carry out tasks that manage the risk, performance, and reliability of the deployed machine learning model. Model deployment and model governance both deserve more in-depth discussion and will be introduced in separate chapters closer to the end of this book. Whenever any of the key metrics fails to hold at a predetermined confidence level, the project falls back into the Data Preparation stage of the cycle and repeats the same flow all over again.
The ideal machine learning project flows through the stages cyclically for as long as the business application needs it. However, machine learning projects are typically susceptible to a high probability of failure. According to a survey conducted by Dimensional Research and Alegion, covering around 300 machine learning practitioners from 20 different business industries, 78% of machine learning projects get held back or delayed at some point before deployment. Additionally, Gartner predicted that 85% of machine learning projects will fail (https://venturebeat.com/2021/06/28/why-most-ai-implementations-fail-and-what-enterprises-can-do-to-beat-the-odds/). By expecting the unexpected and anticipating failures before they happen, practitioners can likely circumvent potential failure factors early on, in the planning stage. This also brings us to the trash icon included in Figure 1.2. Proper projects with a good plan typically get discarded only at the Deliver Model Insights stage, when it’s clear that the proposed model and project can’t deliver satisfactory results.
Now that we’ve covered an overview of the machine learning life cycle, let’s dive into each of the stages individually, broken down into sections, to help you discover the key tips and techniques needed to complete each stage successfully. These stages will be discussed in an abstract format and are not a concrete depiction of what you should ultimately be doing for your project, since all projects are unique and strategies should always be evaluated on a case-by-case basis.
A deep learning model can only realize real-world value by being part of a system that performs some sort of operation. Bringing deep learning models from research papers to actual real-world usage is not an easy task. Thus, performing proper planning before conducting any project is a more reliable and structured way to achieve the desired goals. This section will discuss some considerations and strategies that will be beneficial when you start to plan your deep learning project toward success.
Today, deep learning practitioners tend to focus a lot on the algorithmic model-building part of the process. It takes a considerable amount of mental strength to not get hooked on the hype of state-of-the-art (SOTA) research-focused techniques. With crazy techniques such as pix2pix, which is capable of generating high-resolution, realistic color images from just sketches or image masks, and natural language processing (NLP) techniques such as GPT-3, a 175-billion-parameter text generation model from OpenAI, and GPT-4, its multimodal successor, which is capable of generating practically anything you ask for in text format, from summaries to code, why wouldn’t they?!
Jokes aside, to become a true deep learning architect, we need to come to a consensus that any successful machine learning or deep learning project starts with the business problem, not with the shiny new research paper you just read online, complete with a public GitHub repository. The planning stage often involves many business executives who are not savvy about the details of machine learning algorithms, and often, the same set of people wouldn’t care about them at all. These algorithms are daunting for business-focused stakeholders to understand, and when that is added on top of the already tough mental barriers to adopting artificial intelligence technologies, it doesn’t make the project any more likely to be adopted.
Deep learning shines the most in handling unstructured data. This includes image data, text data, audio data, and video data. This is largely due to the model’s ability to automatically learn and extract complex, high-level features from the raw data. In the case of images and videos, deep learning models can capture spatial and temporal patterns, recognizing objects, scenes, and activities. With audio data, deep learning can understand the nuances of speech, noise, and various sound elements, making it possible to build applications such as speech recognition, voice assistants, and audio classification systems. For text data, deep learning models can capture the context, semantics, and syntax, enabling NLP tasks such as sentiment analysis, machine translation, and text summarization.
This means that if this data exists and is utilized by your company in its business processes, there may be an opportunity to solve a problem with the help of deep learning. However, never overcomplicate problems just so you can solve them with deep learning. Equating this to something more relatable, you wouldn’t use a huge sledgehammer to get a nail into wood. It could work and you might get away with it, but you’d risk bending the nail or injuring yourself while using it.
Once a problem has been identified, evaluate the business value of solving it. Not all problems are born the same and they can be ranked based on their business impact, value, complexity, risks, costs, and suitability for deep learning. Generally, you’d be looking for high impact, high value, low complexity, low risks, low cost, and high suitability to deep learning. Trade-offs between these metrics are expected but simply put, make sure the problem you’ve discovered is worth solving at all with deep learning. A general rule of thumb is to always resort to a simpler solution for a problem, even if it ends up abandoning the usage of deep learning technologies. Simple approaches tend to be more reliable, less costly, less prone to risks, and faster to fruition.
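To make that ranking idea concrete, here is a minimal sketch of scoring candidate problems; the candidate names, criterion scores (1 to 5), and weights are all hypothetical values chosen purely for illustration:

# Hypothetical candidate problems scored on the worthiness criteria
candidates = {
    "automate support ticket triage": {"impact": 5, "value": 4, "complexity": 2, "risk": 2, "cost": 2, "suitability": 5},
    "forecast stationery demand": {"impact": 2, "value": 2, "complexity": 1, "risk": 1, "cost": 1, "suitability": 1},
}
# Higher is better for impact, value, and suitability; complexity,
# risk, and cost count against a candidate, so their weights are negative
weights = {"impact": 1.0, "value": 1.0, "complexity": -0.5, "risk": -0.5, "cost": -0.5, "suitability": 1.0}

def worthiness(scores):
    return sum(weights[criterion] * score for criterion, score in scores.items())

# Rank candidates from most to least worthy
for name, scores in sorted(candidates.items(), key=lambda item: -worthiness(item[1])):
    print(f"{name}: {worthiness(scores):.1f}")

The exact weights matter far less than the discipline of writing the trade-offs down and comparing candidates on the same scale.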
Consider a problem where a solution is needed to remove background scenes in a video feed and leave only humans or necessary objects untouched so that a more suitable background scene can be overlaid as a background instead. This is a common problem in the professional filmmaking industry in all film genres today.
Semantic segmentation, which is the task of assigning a label to every pixel of an image across its width and height dimensions, is the method needed to solve such a problem. In this case, the task needs to assign labels that can help identify which pixels need to be removed. With the advent of many publicly available semantic segmentation datasets, deep learning has advanced considerably in this field, achieving a fine-grained understanding of the world satisfactory enough to be applied most prominently in autonomous driving and robot navigation. However, deep learning is not known to be 100% error-free and almost always has some error, even on the controlled evaluation dataset. In the case of human segmentation, for example, the model would likely make the most errors in the fine hair areas. Most filmmakers aim for perfect depictions of their films and require that every single pixel gets removed appropriately without fail, since a lot of money is spent on the time of the actors hired for the film. Additionally, a lot of time and money would be wasted manually removing objects that could simply be removed if the scene had been shot with a green screen.

This is an example of a case where we should not overcomplicate the problem. A green screen is all you need to solve the problem described: specifically, the rare chromakey green color. When green screens are prepped properly in the areas where the desired imagery will be overlaid digitally, image processing techniques alone can remove the pixels that fall in the small intensity range centered on the chromakey green color and achieve semantic segmentation effectively with a rule-based solution, as the short sketch below illustrates. The green screen is a simpler solution that is cost-effective, foolproof, and fast to set up.
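Here is a minimal sketch of that rule-based chroma key approach using OpenCV (the opencv-python package) and NumPy; the file names and HSV thresholds are illustrative assumptions that would need tuning for real footage:

import cv2
import numpy as np

frame = cv2.imread("studio_frame.png")          # frame shot against a green screen
background = cv2.imread("new_background.png")   # replacement scene, same resolution

# Chroma key green sits around hue 60 on OpenCV's 0-179 hue scale
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
green_mask = cv2.inRange(hsv, np.array([45, 80, 80]), np.array([75, 255, 255]))

# Replace masked (green) pixels with the new background, keep everything else
composite = np.where(green_mask[..., None] > 0, background, frame)
cv2.imwrite("composite.png", composite)

No training data, no GPU, and the failure modes are easy to reason about – which is exactly why the simpler solution wins here.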
That was a mouthful! Now, let’s go through a simpler problem. Consider a problem where we want to automatically and digitally identify when it rains. In this use case, it is important to understand the actual requirements and goals of identifying the rain: is it sufficient to detect rain exactly when it happens? Or do we need to identify whether rain will happen in the near future? What will we use the information about rain events for? These questions will guide whether deep learning is required or not. We, as humans, know that rain can be predicted from visual input, either by looking for the presence of falling raindrops or by looking at cloud conditions. However, if detecting rain as it happens is sufficient, and the goal of detecting rain is to determine when to water the plants, a simpler approach would be to use an electronic sensor to detect the presence of water or humidity. Only when you want to estimate whether it will rain in the future, let’s say in 15 minutes, does deep learning make more sense to apply, as there are a lot of interactions between meteorological factors that can affect rainfall. Only by brainstorming each use case and analyzing all potential solutions, even outside of deep learning, can you make sure deep learning brings tangible business value compared to other solutions. Do not just apply deep learning because you want to.
At times, when value isn’t clear when you’re directly considering a use case, or when value is clear but you have no idea how to execute it, consider finding reference projects from companies in the same industry. Companies in the same industry have a high chance of wanting to optimize the same processes or solve the same pain points. Similar reference projects can serve as a guide to designing a deep learning system and can serve as proof that the use case being considered is worthy of the involvement of deep learning technologies. Of course, not everybody has access to details like this, but you’d be surprised what Google can tell you these days. Even if there isn’t a similar project being carried out for direct reference, you would likely be able to pivot upon the other machine learning project references that already have a track record of bringing value to the same industry.
Admittedly, rejecting deep learning at times would be a hard pill to swallow considering that most practitioners get paid to implement deep learning solutions. However, dismissing it earlier will allow you to focus your time on more valuable problems that would be more useful to solve with deep learning and prevent the risk of undermining the potential of deep learning in cases where simpler solutions can outperform deep learning. Criteria for deep learning worthiness should be evaluated on a case-by-case basis and as a practitioner, the best advice to follow is to simply practice common sense. Spend a good amount of time going through the problem exploration and the worthiness evaluation process. The last thing you want is to spend a painstaking amount of time preparing data, building a deep learning model, and delivering very convincing model insights only to find out that the label you are trying to predict does not provide enough value for the business to invest further.
Ever heard sentences like “My deep learning model just got 99% accuracy on my validation dataset!”? Data scientists often make the mistake of judging the success of a machine learning project solely by the validation metrics they use to evaluate their machine learning models during the model development process. Model-building metrics such as accuracy, precision, or recall are important metrics to consider in a machine learning project, but unless they add business value and connect to the business objectives in some way, they rarely mean anything. A project can achieve a good accuracy score but still fail to achieve the desired business goals. This can happen when no proper success metrics have been defined early, which subsequently causes the wrong label to be used in the data preparation and model development stages. Furthermore, even when the model metric positively impacts business processes directly, there is a chance that the achievement won’t be communicated effectively to business stakeholders and, in the worst case, won’t be considered a success when reported as-is.
Success metrics, when defined early, act as the machine learning project’s guardrails and ensure that the project goals are aligned with the business goals. One of these guardrails is that a success metric can help guide the choice of a proper label that can, at inference time, tangibly improve business processes or otherwise create value for the business. First, let’s align on what a label means: it is the value that you want the machine learning model to predict. The purpose of a machine learning model is to assign these labels automatically given some form of input data, and thus during the data preparation and model development stages, a label needs to be chosen to serve that purpose. Choosing the wrong label can be catastrophic to a deep learning project because sometimes, when data is not readily available, it means the project has to start all over again from the data preparation stage. Labels should always be directly or indirectly attributed to the success metric.
Success metrics, as the name suggests, can be plural, and range from time-based success definitions or milestones to the overall project success, and from intangible to tangible. It’s good practice to generally brainstorm and document all the possible success criteria from a low level to a high level. Another best practice is to make sure to always define tangible success metrics alongside intangible metrics. Intangible metrics generate awareness, but tangible metrics make sure things are measurable and thus make them that much more attainable. A few examples of intangible and hard-to-measure metrics are as follows:
Increasing customer satisfaction
Increasing employee performance
Improving shareholder outlook

Metrics are ways to measure something and are tied to goals to seal the deal. Goals themselves can be intangible, similar to the few examples listed previously, but so long as they are tied to tangible metrics, the project is off to a good start. When you have a clear goal, ask yourself in what way the goal can be proven to be achieved, demonstrated, or measured. A few examples of tangible success metrics for machine learning projects that could align with business goals are as follows:
Increase the time customers spend, which can be a proxy for customer delight
Increase company revenue, which can be a proxy for employee performance
Increase the click-through rate (CTR), which can be a proxy for the effectiveness of targeted marketing campaigns
Increase the customer lifetime value (CLTV), which can be a proxy for long-term customer satisfaction and loyalty
Increase the conversion rate, which can be a proxy for the success of promotional campaigns and website user experience

This concept is neither new nor limited to machine learning projects – every real-world project carried out for a company needs to be aligned with the business goal. Many foundational project management techniques can be applied similarly to machine learning projects, and spending time gaining some project management skills outside of the machine learning field would be beneficial and transferable to machine learning projects. Additionally, as machine learning is considered to be a software-based technology, software project management methodologies also apply.
A final concluding thought to take away is that machine learning systems are not about how advanced your machine learning models are, but instead about how humans and machine intelligence can work together to achieve a greater good and create value.
Deep learning often involves neural network architectures with a large set of parameters, otherwise called weights. These architectures can range in size from a few parameters up to hundreds of billions of parameters. For example, OpenAI’s GPT-3 text generation model holds 175 billion neural network parameters, which amounts to around 350 GB in computer storage size. This means that to run GPT-3, you need a machine with a random access memory (RAM) size of at least 350 GB!
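The arithmetic behind that figure is worth making explicit. The following sketch assumes 2 bytes per parameter (16-bit floating-point storage); 32-bit storage would double the figure:

# Back-of-envelope memory arithmetic for a 175-billion-parameter model
params = 175e9          # GPT-3's parameter count
bytes_per_param = 2     # assuming 16-bit floating-point storage
print(f"{params * bytes_per_param / 1e9:,.0f} GB")   # -> 350 GB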
Deep learning model frameworks such as PyTorch and TensorFlow have been built to work with devices called graphics processing units (GPUs), which offer tremendous neural network model training and inference speedups. Off-the-shelf GPU devices commonly have a GPU RAM of 12 GB, which is nowhere near the requirement needed to load a GPT-3 model in GPU mode. However, there are still methods to partition big models across multiple GPUs and run them that way. Additionally, some methods allow for distributed GPU model training and inference to support larger data batch sizes at any one usage point. GPUs are not cheap devices and can cost anywhere from a few hundred to hundreds of thousands of dollars from the most widely used GPU brand, Nvidia. With the rise of cryptocurrency technologies, the availability of GPUs has also been reduced significantly, as people buy them up as soon as they are in stock. All of this emphasizes the need to plan computing resources for training and inferencing deep learning models beforehand.
It is important to align your model development and deployment needs to your computing resource allocation early in the project. Start by gauging the range of sizes of deep learning architectures that are suitable for the task at hand either by browsing research papers or websites that provide a good summary of techniques, and setting aside computing resources for the model development process.
Tip
paperswithcode.com provides summaries of a wide variety of techniques grouped by a wide variety of tasks!
When computing resources are not readily available, make sure you make purchase plans early, especially if they involve GPUs. But what if a physical machine is not desired? An alternative is to use paid cloud computing providers, which you can access online easily from anywhere in the world. During the model development stage, one of the benefits of having more GPUs with more RAM allocated is that it allows you to train models faster, either by using a larger data batch size during training or by training multiple models at any one time. It is generally fine to use CPU-only deep learning model training as well, but the model training time would inevitably be much longer.
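As a minimal PyTorch sketch, writing code that falls back to CPU when no GPU is available keeps the same training script usable on either setup; the model and batch here are placeholders for illustration:

import torch

# Prefer the GPU when one is available; otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(128, 10).to(device)    # placeholder model
batch = torch.randn(32, 128, device=device)    # placeholder batch of 32 samples
output = model(batch)
print(f"Forward pass ran on: {device}")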
The GPU and CPU-based computing resources required during training are often overkill for inference once the model is deployed. Different applications have different deployment computing requirements, and the decision on what resource specification to allocate can be gauged by asking yourself the following three questions:
How often are inference requests made? Many inference requests in a short period might signal the need to have more than one inference service up on multiple computing devices in parallel; the capacity sketch at the end of this section shows the back-of-envelope arithmetic.
What is the average number of samples requested for a prediction at any one time? Device RAM requirements should match batch size expectations.
How fast do you need a reply? GPUs are needed if the response time requirement is seconds or faster; CPUs can do the job if you don’t care about the response time.

Resource planning is not restricted to just computing resources – it also extends to human resource planning. Assumptions about the number of deep learning engineers and data scientists working together in a team will ultimately affect the choices of software libraries and tools used in the model development process. The process of choosing these tools will be introduced in future sections.
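Here is the capacity sketch referenced in the first question; the request rate, measured latency, and batch size are hypothetical numbers for illustration:

import math

# Hypothetical deployment figures
peak_requests_per_second = 40
seconds_per_batch = 0.25       # measured model latency per batch
samples_per_batch = 8

# Throughput of a single inference replica, in samples per second
samples_per_second = samples_per_batch / seconds_per_batch   # 32
replicas = math.ceil(peak_requests_per_second / samples_per_second)
print(f"Inference replicas needed at peak: {replicas}")      # -> 2

Rough numbers like these are enough to decide whether one machine suffices or whether the deployment needs to scale out.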
The next step is to prepare your data.
Data is to machine learning models what fuel is to your car, electricity is to your electronic devices, and food is to your body. A machine learning model works by trying to capture the relationships between the provided input and output data. Similar to how human brains work, a machine learning model will attempt to iterate through collected data examples and slowly build a memory of the patterns required to map the provided input data to the provided target output data. The data preparation stage consists of the methods and processes required to prepare ready-to-use data for building a machine learning model, which include the following:
Acquisition of raw input and targeted output data
Exploratory data analysis of the acquired data
Data pre-processing

We will discuss each of these topics in the following subsections.
Deep learning can be broadly categorized into two problem types, namely supervised learning and unsupervised learning. Both of these problem types involve building a deep learning model that is capable of making informed predictions as outputs, given well-defined data inputs.
Supervised learning is a problem type where labels are involved that act as the source of truth to learn from. Labels can exist in many forms and can be broken down into two problem types, namely classification and regression. Classification is the process where a specific discrete class is predicted among other classes when given input data. Many more complex problems derive from the base classification problem types, such as instance segmentation, multilabel classification, and object detection. Regression, on the other hand, is the process where a continuous numerical value is predicted when given input data. Likewise, complex problem types can be derived from the base regression problem type, such as multi-regression and image bounding box regression.
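A minimal PyTorch sketch makes the classification/regression distinction concrete: the same backbone can feed a classification head trained with cross-entropy or a regression head trained with mean squared error. The layer sizes and data here are illustrative placeholders:

import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())   # shared feature extractor
classifier_head = nn.Linear(32, 3)   # predicts one of 3 discrete classes
regressor_head = nn.Linear(32, 1)    # predicts 1 continuous value

features = backbone(torch.randn(4, 16))   # batch of 4 dummy inputs

# Classification: logits against integer class labels
class_loss = nn.CrossEntropyLoss()(classifier_head(features), torch.tensor([0, 2, 1, 0]))
# Regression: continuous outputs against continuous targets
reg_loss = nn.MSELoss()(regressor_head(features), torch.randn(4, 1))
print(class_loss.item(), reg_loss.item())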
Unsupervised learning, on the other hand, is a problem type where there aren’t any labels involved and the goals can vary widely. Anomaly detection, clustering, and feature representation learning are the most common problem types that belong to the unsupervised learning category.
We will go through these two problem types separately for deep learning in Chapter 8, Exploring Supervised Deep Learning, and Chapter 9, Exploring Unsupervised Deep Learning.
Next, let’s learn about the things you should consider when acquiring data.
Acquiring data in the context of deep learning usually involves unstructured data, which includes image data, video data, text data, and audio data. Sometimes, data is readily available, stored through some business process in a database, but very often, it has to be collected manually from the environment from scratch. Additionally, very often, labels for this data are not readily available and require manual annotation work. Along with the capability of deep learning algorithms to process and digest highly complex data comes the need to feed them more data than their machine learning counterparts. The requirement to perform data collection and data annotation in high volumes is the main reason why deep learning is considered to have a high barrier to entry today.
Don’t rush into choosing an algorithm in a machine learning project. Spend quality time formally defining the features that can be acquired to predict the target variable. Get help from domain experts during the process and brainstorm potential predictive features that relate to the target variable. In actual projects, it is common to spend a big portion of your time planning and acquiring the data while making sure the acquired data is fit for a machine learning model’s consumption, and to subsequently spend the rest of the time on model building, model deployment, and model governance. A lot of research has been done into handling bad-quality data during the model development stage, but most of these techniques aren’t comprehensive and are limited in how far they can compensate for the inherent quality of the data. Displaying ignorance of quality assurance during the data acquisition stage and showing enthusiasm only for the data science portion of the workflow is a strong indicator that the project is doomed to fail right from the inception stage.
Formulating a data acquisition strategy is a daunting task when you don’t know what it means to have good-quality data. Let’s go through a few pillars of data quality you should consider for your data in the context of actual business use cases and machine learning:
Representativeness: How representative is the data concerning the real-world data population?
Consistency: How consistent are the annotation methods? Does the same pattern match the same label or are there some inconsistencies?
Comprehensiveness: Are all variations of a specific label covered in the collected dataset?
Uniqueness: Does the data contain a lot of duplicated or similar data?
Fairness: Is the collected data biased toward any specific labels or data groups?
Validity: Does the data contain invalid fields? Do the data inputs match up with their labels? Is there missing data?

Let’s look at each of these in detail.
Data should be collected in a way that mimics, as much as possible, the data you will receive during model deployment. Very often in research-based deep learning projects, researchers collect their data in a closed environment with controlled environmental variables. One of the reasons researchers prefer collecting data from a controlled environment is that it lets them build stabler machine learning models and, generally, prove a point. Eventually, when the research paper is published, you see amazing results that were obtained using handpicked data, meant to impress. These models, which are built on controlled data, fail miserably when you apply them to random, uncontrolled real-world examples. Don’t get me wrong – it’s great to have these controlled datasets available to contribute toward a stabler machine learning model at times, but having uncontrolled real-world examples as a main part of the training and evaluation datasets is key to achieving a generalizable model.
Sometimes, the acquired training data has an expiry date and does not stay representative forever. This scenario is called data drift and will be discussed in more detail in the Managing risks section closer to the end of this chapter. The representativeness metric for data quality should also be evaluated based on future expectations of the data the model will receive during deployment.
Data labels that are not consistently annotated make it harder for machine learning models to learn from them. This happens when the domain ideologies and annotation strategies differ among multiple labelers and are simply not defined properly. For example, “Regular” and “Normal” mean the same thing, but to the machine, they’re two completely different classes; so are “Normal” and “normal”, with just a capitalization difference!
Practice formalizing a proper strategy for label annotation during the planning stage, before carrying out the actual annotation process. Cleaning the data for simple consistency errors is possible post-annotation, but some consistency errors can be hard to detect and complex to correct.
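For the simple cases, a minimal pandas sketch of that post-annotation cleaning might look like this; the label values and the synonym mapping are illustrative assumptions agreed on with the labeling team:

import pandas as pd

labels = pd.Series(["Normal", "normal", "Regular", "Anomalous"])
synonyms = {"regular": "normal"}   # assumed canonical vocabulary agreed beforehand

# Strip whitespace, normalize case, then fold synonyms into one class
cleaned = labels.str.strip().str.lower().replace(synonyms)
print(cleaned.value_counts())      # "Normal", "normal", and "Regular" now count as one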
Machine learning thrives in building a decisioning mechanism that is robust to multiple variations and views of any specific label. Being capable and accomplishing it are two different things. One of the prerequisites of decisioning robustness is that the data that’s used for training and evaluation itself has to be comprehensive enough to provide coverage for all possible variations of each provided label. How can comprehensiveness be judged? Well, that depends on the complexity of the labels and how varied they can present themselves naturally when the model is deployed. More complex labels naturally require more samples and less complex labels require fewer samples.
A good point to start with, in the context of deep learning, is to have at least 100 samples for each label and experiment with building a model and deriving model insights to see if there are enough samples for the model to generalize on unseen variations of the label. When the model doesn’t produce convincing results, that’s when you need to cycle back to the data preparation stage again to acquire more data variations of any specific label. The machine learning life cycle is inherently a cyclical process where you will experiment, explore, and verify while transitioning between stages to obtain the answers you need to solve your problems, so don’t be afraid to execute these different stages cyclically.
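A quick pandas check against that 100-samples-per-label starting point might look like this, with an illustrative dataset standing in for your own:

import pandas as pd

# Illustrative dataset: one label comfortably covered, one under-represented
df = pd.DataFrame({"label": ["cat"] * 150 + ["dog"] * 40})

counts = df["label"].value_counts()
print(counts[counts < 100])        # labels that likely need more collected data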
While having complete and comprehensive data is beneficial for building a machine learning model that is robust to data variations, having duplicated versions of the same data variation in the acquired dataset risks creating a biased model. A biased model makes biased decisions that can be unethical and illegal, and this sometimes renders such decisions meaningless. Additionally, the amount of data acquired for any specific label means little when the samples are all duplicated or very similar to each other.
Machine learning models are generally trained on a subset of the acquired data and then evaluated on other subsets of the data to verify the model’s performance on unseen data. When the part of the dataset that is not unique gets placed in the evaluation partition by chance, the model risks reporting scores that are biased by the duplicated data inputs.
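A minimal sketch of guarding against this with pandas is to deduplicate before splitting, so identical rows cannot straddle the train and evaluation partitions; the tiny DataFrame and the 80/20 ratio here are illustrative:

import pandas as pd

df = pd.DataFrame({"text": ["good", "good", "bad", "fine", "poor"],
                   "label": [1, 1, 0, 1, 0]})

# Drop exact duplicates first, then shuffle reproducibly and split 80/20
df = df.drop_duplicates().sample(frac=1.0, random_state=42)
split = int(len(df) * 0.8)
train, evaluation = df.iloc[:split], df.iloc[split:]
print(len(train), len(evaluation))

Near-duplicates (for example, almost identical images) need a similarity measure rather than exact matching, but the principle of deduplicating before splitting is the same.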
Does the acquired dataset represent minority groups properly? Is the dataset biased toward the majority groups in the population? There can be many reasons why a machine learning model turns out to be biased, but one of the main causes is data representation bias. Making sure the data is represented fairly and equitably is an ethical responsibility of all machine learning practitioners. There are many types of bias, so this topic will be covered in depth, along with methods of mitigating it, in Chapter 13, Exploring Bias and Fairness.
Are there outliers in the dataset? Is there missing data in the dataset? Did you accidentally add a blank audio or image file to the properly collected and annotated dataset? Is the annotated label for the data input considered a valid label? These are some of the questions you should ask when considering the validity of your dataset.
Invalid data is useless for machine learning models, and some of it complicates the required pre-processing. The reasons for invalidity can range from simple human errors to complex domain knowledge mistakes. One of the methods to mitigate invalid data is to separate validated and unvalidated data. Include some form of automated or manual data validation process before a data sample gets included in the validated dataset category. Some of this validation logic can be derived from business processes or just common sense. For example, if we are taking age as input data, there are acceptable age ranges and there are age ranges that are just completely impossible, such as 1,000 years old. Having simple guardrails and verifying these values early, when collecting them, makes it possible to correct them then and there and get accurate, valid data. Otherwise, this data will likely be discarded when it comes to the model-building stage. Maintaining a structured framework to validate data ensures that the majority of the data stays relevant, usable by machine learning models, and free from simple mistakes.
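Following the age example, a minimal guardrail sketch in pandas could look like this; the bounds and records are illustrative assumptions:

import pandas as pd

records = pd.DataFrame({"age": [34, -2, 1000, 57]})   # illustrative records
is_valid = records["age"].between(0, 120)             # assumed acceptable age range

validated = records[is_valid]        # enters the validated dataset
quarantined = records[~is_valid]     # flagged for correction at the source
print(f"validated: {len(validated)}, quarantined: {len(quarantined)}")

Catching the invalid rows at collection time, while the source is still reachable, is what makes correction possible instead of discarding the data later.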
As for more complex invalidity, such as errors in the domain ideology, domain experts play a big part in making sure the data stays sane and logical. Always make sure you include domain experts when defining the data inputs and outputs in the discussion about how data should be collected and annotated for model development.