Discover why the MATLAB programming environment is highly favored by researchers and math experts for machine learning with this guide, which is designed to enhance your proficiency in both machine learning and deep learning using MATLAB, paving the way for advanced applications.
By navigating the versatile machine learning tools in the MATLAB environment, you’ll learn how to seamlessly interact with the workspace. You’ll then move on to data cleansing, data mining, and analyzing various types of data in machine learning, and visualize data values on a graph. As you progress, you’ll explore various classification and regression techniques, skillfully applying them with MATLAB functions.
This book teaches you the essentials of neural networks, guiding you through data fitting, pattern recognition, and cluster analysis. You’ll also explore feature selection and extraction techniques for performance improvement through dimensionality reduction. Finally, you’ll leverage MATLAB tools for deep learning and managing convolutional neural networks.
By the end of the book, you’ll be able to put it all together by applying major machine learning algorithms in real-world scenarios.
You can read this e-book in Legimi apps or any app that supports the following format:
Page count: 606
Publication year: 2024
MATLAB for Machine Learning
Unlock the power of deep learning for swift and enhanced results
Giuseppe Ciaburro
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Tejashwini R
Book Project Manager: Kirti Pisat
Content Development Editor: Priyanka Soam
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Prafulla Nikalje
DevRel Marketing Coordinator: Vinishka Kalra
First published: August 2017
Second edition: January 2024
Production reference: 1190124
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN: 978-1-83508-769-5
www.packtpub.com
To the memory of my father, Luigi, and my mother, Caterina, for their sacrifices and for exemplifying the power of determination. To my sons, Luigi and Simone, the bright stars in my universe, my pillars of strength, and my source of inspiration – may this book be an example of passion and dedication to one’s work.
– Giuseppe Ciaburro
Giuseppe Ciaburro holds a PhD in environmental engineering physics and two master’s degrees in chemical engineering and in acoustics and noise control. He works at the Built Environment Control Laboratory at the Università degli studi della Campania Luigi Vanvitelli and has over 20 years of experience in programming, first in the field of combustion and then in acoustics and noise control. His core programming skills are in MATLAB, Python, and R. As an expert in acoustics and noise control, Giuseppe has extensive experience in teaching and research. He is currently researching machine learning applications in acoustics and noise control. He has written for several publications, and for the last two years, he has been ranked by Stanford University as one of the top 2% of scientists in the world.
I want to thank the people who have been close to me and supported me in the writing of the book, especially the Packt team.
Dr. Sankar P is a seasoned academician and a dynamic researcher with a diverse background in electronics and communication engineering and information technology, working at Hindustan Institute of Technology and Science, Chennai, India. His academic journey spans over two decades, during which he has made significant contributions to the field through teaching, research, and administrative roles. His expertise as a reviewer for prestigious publishers and organizations such as Pearson, Oxford, Elsevier, and IEEE was recognized with the Best Reviewer Award in 2017.
Dr. S. Mahesh Anand is the founder of Learn AI with Anand. Dr. Anand’s journey unfolded through education, innovation, and a passion for data science and AI. Starting his career at VIT University, Dr. Anand’s early years shaped a teaching philosophy beyond conventional boundaries. Transitioning to corporate training, he meticulously designed and delivered comprehensive learning programs tailored for aspiring data scientists. He was recognized as the Best Data Science and AI Educator by AI Global Media (UK) and by Corporate Vision Magazine in 2022 for his outstanding contributions to education and training. Early in 2000, he received the AT&T Labs Award from the IEEE Headquarters (USA) and the MV Chauhan Award from the IEEE India Council, adding honors to his professional journey.
MATLAB is a comprehensive programming environment used by many researchers and math experts for machine learning. This book will help you learn the basic concepts in machine learning and deep learning using MATLAB, and then refine your basic skills with advanced applications.
You’ll start by exploring the tools that the MATLAB environment offers for machine learning and see how to easily interact with the MATLAB workspace. We’ll then move on to data cleansing, mining, and analyzing various types of data in machine learning, and you’ll see how to visualize data values on a graph. Then, you’ll learn about the different types of classification and regression techniques and how to apply them to your data, using MATLAB functions. Further, you will understand the basic concepts of neural networks and perform data fitting, pattern recognition, and clustering analysis. You will also explore feature selection and extraction techniques for dimensionality reduction for performance improvement. Finally, you’ll learn how to leverage MATLAB tools for deep learning and managing convolutional neural networks.
By the end of the book, you’ll learn how to put it all together in real-world cases, covering major machine learning algorithms, and you’ll feel confident as you delve into machine learning with MATLAB.
This book is suitable for machine learning engineers, data scientists, deep learning engineers, and CV/NLP engineers who want to use MATLAB for machine learning and deep learning. You should have a fundamental understanding of programming concepts.
Chapter 1, Exploring MATLAB for Machine Learning, covers machine learning, which is a branch of artificial intelligence that is based on the development of algorithms and mathematical models, capable of “learning” from data and autonomously adapting to improve their performance according to the objectives set. Thanks to this learning ability, machine learning is used in a wide range of applications, such as data analysis, computer vision, text translation, speech recognition, medical diagnosis, and financial risk prediction. Machine learning is an ever-evolving area of research and is revolutionizing many fields of science and industry. The aim of this chapter is to provide some introduction, background information, and a basic knowledge of MATLAB tools. In addition, the basic concepts of machine learning will be introduced.
Chapter 2, Working with Data in MATLAB, looks at how to import and organize our data in MATLAB. Today, the amount of data generated is enormous; smartphones, credit cards, televisions, computers, home appliances, sensors, domestic systems, public and private transport, and so on are just a few examples of devices that generate data seamlessly. Such data is stored and then used for various purposes. One of these is data analysis using machine learning algorithms. To import and organize our data in MATLAB, you should familiarize yourself with the MATLAB workspace to make the operations as simple as possible. Then, we will analyze the different formats available for the data collected and how to move data in and out of MATLAB. We will also explore datatypes to work with grouping variables and categorical data and how to export data from the workspace, including cell array, structure array, and tabular data, and save it in a MATLAB-supported file format. Finally, we will understand how to organize data in the correct format for the next phase of data analysis.
Chapter 3, Prediction Using Classification and Regression, shows us how to classify an object using nearest neighbors and how to perform an accurate regression analysis in a MATLAB environment. Classification algorithms return accurate predictions based on our observations. Starting from a set of predefined class labels, the classifier assigns each input data a class label, according to the training model. Regression relates a set of independent variables to a dependent variable. Through this technique, it is possible to understand how the value of the dependent variable changes as the independent variable varies.
Chapter 4, Clustering Analysis and Dimensionality Reduction, explores clustering methods, which are designed to find hidden patterns or groupings in a dataset. These algorithms identify a grouping without any label to learn from through the selection of clusters, based on the similarity between the elements. Dimensionality reduction is the process of converting a dataset with many variables into one with fewer dimensions while preserving similar information. Feature selection approaches try to find a subset of the original variables. Feature extraction reduces the dimensionality of the data by transforming it into new features. This chapter shows us how to divide the data into clusters, or groupings of similar items. We’ll also learn how to select a feature that best represents the dataset.
Chapter 5, Introducing Artificial Neural Networks Modeling, delves into artificial neural networks (ANNs), which include data structures and algorithms for the learning and classification of data. Through neural network techniques, a program can learn by example and create an internal structure of rules to classify different inputs. MATLAB provides algorithms, pretrained models, and apps to create, train, visualize, and simulate ANNs. In this chapter, we will see how to use MATLAB to build an ANN-based model to predict values and classify data.
Chapter 6, Deep Learning and Convolutional Neural Networks, examines deep learning, which is a machine learning technology based on multilayer ANNs and has allowed many applications to reach a high degree of accuracy. Deep neural networks are capable of modeling complex relationships between input and output data. Among the most successful applications is computer vision, with tasks that include classification, image regression, and object detection. For example, a deep neural network is able to generate a layered representation of objects in which each object is identified by a set of characteristics that has the form of visual primitives, such as particular edges, oriented lines, textures, and recurring patterns. Convolutional networks are characterized by convolutional layers, which use filters to analyze data in a local region and produce an activation map. These activation maps are then processed by pooling layers, which aggregate the low-resolution data to reduce the dimensionality of the representation and make processing more computationally efficient. The convolutional and pooling layers are then alternated several times until an image is represented by a low-resolution activation map. In this chapter, we will learn the basic concepts of deep learning and discover how to implement an algorithm based on convolutional networks in the MATLAB environment.
Chapter 7, Natural Language Processing Using MATLAB, explores natural language processing (NLP), which automatically processes information conveyed through spoken or written language. This task is fraught with difficulty and complexity, largely due to the innate ambiguity of human language. To enable machine learning and interaction with the world in ways typical of humans, it is essential not only to store data but also to teach machines how to translate it simultaneously into meaningful concepts. As natural language interacts with the environment, it generates predictive knowledge. In this chapter, we will learn the basic concepts of NLP and how to build a model to label sentences.
Chapter 8, MATLAB for Image Processing and Computer Vision, covers computer vision, which is a field that studies how to process, analyze, and understand the contents of visual data. In image content analysis, we use a lot of computer vision algorithms to build our understanding of the objects in an image. Computer vision covers various aspects of image analysis, such as object recognition, shape analysis, pose estimation, 3D modeling, and visual search. Humans are good at identifying and recognizing things around them! The goal of computer vision is to accurately model the human vision system using computers. In this chapter, we will understand the basic concepts of computer vision and how to implement a model for object recognition, using MATLAB.
Chapter 9, Time Series Analysis and Forecasting with MATLAB, delves into time series data, which is basically a sequence of measurements that are collected over time. These measurements are taken with respect to a predetermined variable and at regular time intervals. One of the main characteristics of time series data is that the ordering matters. The list of observations that we collect is ordered on a timeline, and the order in which they appear says a lot about underlying patterns. If you change the order, this will totally change the meaning of the data. Sequential data is a generalized notion that encompasses any data that comes in a sequential form, including time series data. In this chapter, we will learn the basic concepts of sequential data and how to build a model that describes the pattern of the time series or any sequence in general.
Chapter 10, MATLAB Tools for Recommender Systems, examines the recommendation engine, which is a model that can predict what a user may be interested in. When we apply this to the context of movies, for example, this becomes a movie recommendation engine. We filter items in our database by predicting how the current user might rate them. This helps us in connecting the user to the right content in our dataset. Why is this relevant? If you have a massive catalog, then the user may or may not find all the content that is relevant to them. By recommending the right content, you increase consumption. Companies such as Netflix heavily rely on recommendations to keep the user engaged. In this chapter, we will learn the basic concepts of recommender systems and how to build a movie recommendation system using MATLAB.
Chapter 11, Anomaly Detection in MATLAB, teaches you the basic concepts of an anomaly detection system and how to implement one in MATLAB. A physical system, in its life cycle, can be subject to failures or malfunctions that can compromise its normal operation. It is, therefore, necessary to introduce an anomaly detection system with the capability of preventing critical interruptions. This is called a fault diagnosis system and can identify the possible presence of a malfunction within the monitored system. The search for the fault is one of the most important and qualifying maintenance intervention phases, and it is necessary to act in a systematic and deterministic way. To carry out a complete search for the fault, it is necessary to analyze all the possible causes that may have determined it.
In this book, machine learning algorithms are implemented in the MATLAB environment. So, to reproduce the many examples in this book, you need a recent version of MATLAB (R2023b is recommended) and the following toolboxes: the Statistics and Machine Learning Toolbox, the Deep Learning Toolbox (formerly the Neural Network Toolbox), and the Fuzzy Logic Toolbox.
Software/hardware covered in the book: MATLAB
Operating system requirements: Windows, macOS, or Linux
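Before running the examples, you may want to confirm that these products are installed; a minimal check from the MATLAB command line could look like the following sketch (the license feature names used here are assumptions and may vary between releases):

% List every installed MathWorks product and its version
ver

% Check programmatically for the toolboxes used in this book
% (license feature names assumed; adjust them if your release differs)
required = {'Statistics_Toolbox', 'Neural_Network_Toolbox', 'Fuzzy_Toolbox'};
for k = 1:numel(required)
    if license('test', required{k})
        fprintf('%s is available\n', required{k});
    else
        fprintf('%s is NOT available\n', required{k});
    end
end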
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/MATLAB-for-Machine-Learning-second-edition. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “To create a classification tree, we can utilize the fitctree() function.”
A block of code is set as follows:
gscatter(meas(:,3), meas(:,4), species, 'rgb', 'osd');
xlabel('Petal length');
ylabel('Petal width');
Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Now, we can train the network just by clicking on the train button of the app. After a few seconds, the ANN will be trained and ready for use.”
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you’ve read MATLAB for Machine Learning, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below
https://packt.link/free-ebook/9781835087695
Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly.
This part provides background information and essential knowledge about MATLAB tools, along with an introduction to basic machine learning concepts. We will also focus on importing and organizing data in MATLAB, emphasizing familiarity with the MATLAB workspace for simplicity in operations. The discussion covers analyzing various data formats, moving data in and out of MATLAB, exploring datatypes for grouping variables and categorical data, exporting data in different formats such as cell arrays, structure arrays, and tabular data, and saving it in MATLAB-supported file formats. The ultimate goal is to prepare data in the right format for the subsequent phase of data analysis.
This part has the following chapters:
Chapter 1, Exploring MATLAB for Machine Learning
Chapter 2, Working with Data in MATLAB
Machine learning (ML) is a branch of artificial intelligence that is based on the development of algorithms and mathematical models capable of learning from data and autonomously adapting to improve their performance according to a set of objectives. Thanks to this learning ability, ML is used in a wide range of applications, such as data analysis, computer vision, language modeling, speech recognition, medical diagnosis, and financial risk prediction. ML is an ever-evolving area of research and is revolutionizing many fields of science and industry. The aim of this chapter is to provide you with an introduction, background information, and a basic knowledge of ML, as well as an understanding of how to apply these concepts using MATLAB tools.
In this chapter, we’re going to cover the following main topics:
Introducing ML
Discovering the different types of learning processes
Using ML techniques
Exploring MATLAB toolboxes for ML
ML applications in real life
In this chapter, we will introduce basic concepts relating to ML. To understand these topics, a basic knowledge of algebra and mathematical modeling is needed. A working knowledge of the MATLAB environment is also required.
ML is based on the idea of providing computers with a large amount of input data, together with the corresponding correct answers or labels, and allowing them to learn from this data, identifying patterns, relationships, and regularities within them. Unlike traditional programming approaches, in which computers follow precise instructions to perform specific tasks, ML allows machines to independently learn from data and make decisions based on statistical models and predictions.
One of the key concepts of ML is the ability to generalize. This means that a model trained on information in the training dataset should be able to make accurate predictions about new data that it has never seen before. This allows ML to be applied across a wide range of domains.
To better understand the basic concepts of ML, we can start from the definitions formulated by the pioneers in this field. According to Arthur L. Samuel (1959) – “ML is a field of study that gives computers the ability to learn without being explicitly programmed.”
The definition mentioned refers to the ability to learn from experience, which humans do in most cases.
ML is an interdisciplinary domain forged at the crossroads and harmonious blending of computer science, statistics, neurobiology, and control theory. Its practical applications have successfully surmounted various challenges across diverse fields, fundamentally altering the paradigm of software development. In essence, ML constitutes a foundational approach that empowers computers with a degree of autonomy. It is evident that ML draws its inspiration from the study of human learning; just as the human brain and its neurons underlie intuition, artificial neural networks serve as the bedrock for computer decision-making. ML facilitates the creation of models that can accurately depict data patterns through a meticulous analysis of datasets.
As an illustration, we can establish a connection between input variables and output variables within a given system. One approach to achieving this is by assuming the existence of a mechanism for generating parametric data, albeit without precise knowledge of the parameter values. This procedure is commonly referred to as employing statistical methods.
Logical reasoning is based on the concepts of induction, deduction, and inference. Here are the differences between them:
Induction is a reasoning process that starts from specific observations or data to arrive at general conclusions. In other words, it is about extracting a general rule or principle based on specific examples. Induction can be useful for making predictions and generalizations, but the conclusions obtained are not necessarily certain.
Deduction is a reasoning process that starts from general premises or rules and arrives at specific conclusions. In other words, it is about applying general principles to obtain specific information. Deduction is based on formal logic and the use of valid inference rules, and it produces logically correct conclusions from the given premises.
Finally, inference is a reasoning process that leads us to draw conclusions, deductions, or judgments based on available evidence or information. Inference can involve both induction and deduction, as well as other forms of reasoning, such as abduction. Inference can be viewed as the application of reasoning rules or strategies to obtain new information or arrive at conclusions based on existing ones.
In summary, induction is based on extracting general principles from specific examples, deduction is based on applying general principles to obtain specific conclusions, while inference is a broader term that encompasses both induction and deduction, as well as other reasoning processes.
In the realm of ML, the teacher and the learner are two pivotal entities that play significant roles. The teacher possesses the necessary knowledge to execute a given task, while the learner’s objective is to acquire that knowledge to perform the task. The strategies employed for learning can be distinguished based on the level of inference carried out by the learner, considering two extremes: no inference and substantial inference.
When a computer system (the learner) is directly programmed, it acquires knowledge without engaging in any inference, as all cognitive processes are handled by the programmer (the teacher). Quite the opposite, when a system finds new solutions autonomously, it necessitates a substantial amount of inference. In this scenario, organized knowledge is derived through experiments and observations. In the following figure, the difference between induction and deduction for inference tasks is shown.
Figure 1.1 – Logical reasoning keys
In between induction and deduction is a midpoint called inference – for example, say a student seeks to work out a math challenge by drawing analogies to solutions provided in a workbook. This activity demands inference, albeit to a lesser extent than the discovery of a new mathematical theorem.
By increasing the learner’s capacity for inference, the burden on the teacher diminishes. The following taxonomy of ML strategies endeavors to portray this trade-off in terms of the work demanded of both the learner and the teacher.
Based on the inference performed, we can identify different mechanisms that a learning system can adopt. Let’s see some of them:
Rote learning: Let’s start with a basic one, rote learning, which involves the use of content without any transformation being carried out on it. No inference process is activated, and the knowledge is not processed in any way; it is essentially a memorization process, programmed with considerable effort on the part of the teacher.
Learning from instruction: When acquiring knowledge from a teacher, the learner needs to convert the information presented into a format that can be internally processed. Additionally, it is crucial for the learner to integrate the new knowledge with their existing understanding to effectively utilize it. This process involves some level of inference on the learner’s part, but a significant portion of the responsibility lies with the teacher. The teacher must present and organize the knowledge in a manner that progressively enhances the student’s existing knowledge. This aligns with the approach used in most formal education methods. Consequently, the task of ML involves developing a system that can receive instruction or advice, store it, and apply the acquired knowledge effectively.
Figure 1.2 – Learning strategy typologies based on the inference performed
Learning by analogy: When acquiring new statements or skills, the process involves converting and enhancing existing knowledge that shares a strong resemblance to the new setting. This allows the transformed knowledge to be effectively applied in the new context. For example, a learning-by-analogy system could be utilized to transform code into something that implements a closely related function, even if it wasn’t originally designed for that purpose. Learning by analogy needs more inference on the part of the learner compared to the previous learning mechanism. The learner must retrieve a relevant fact or skill from their memory that is analogous in terms of relevant parameters. Then, the retrieved knowledge needs to be converted, related to the new context, and archived for future use.
Learning from examples: When confronted with a set of instances, including both examples that support an idea and counterexamples that contradict it, the learner infers a generalized description of the idea. This description encompasses all the positive examples while systematically removing the negative ones. Learning from examples has been extensively studied in the field of artificial intelligence. In this method, the learner engages in a higher degree of inference compared to learning from instruction, as there are no general ideas provided by a teacher. It also involves a slightly greater level of inference than learning by analogy since there are no related ideas provided as starting points for developing the new concept.
Learning from observation: This form of inductive learning is highly versatile and encompasses various tasks, such as discovery systems, theory formation, and establishing grouping principles for creating taxonomic hierarchies. Unlike other approaches discussed, this form of learning does not rely on an external teacher. Instead, the learner is required to engage in extensive inference. They are not presented with a specific set of examples for a specific idea, nor do they have access to a prediction that can group instances produced in an automated way as positive or negative examples of any given concept. Furthermore, instead of focusing on a single idea, the learner must handle multiple concepts simultaneously, which introduces a significant challenge in terms of attention allocation.
Learning by chunking: This is a cognitive strategy that involves breaking down information into smaller, manageable units or “chunks” to enhance memory and comprehension. This technique capitalizes on the brain’s natural tendency to organize and process information in meaningful clusters. By grouping related concepts together, learners can more easily absorb and retain complex material. Chunking is particularly effective in various educational settings, from memorizing lists and sequences to mastering intricate subjects. When information is organized into cohesive chunks, it becomes easier to grasp the relationships between different components, facilitating a deeper understanding of the overall content. This approach is especially beneficial in fields such as language acquisition, where breaking down phrases or words into manageable chunks aids in faster and more efficient learning. Moreover, chunking promotes efficient recall and problem-solving. Instead of struggling to remember individual pieces of information, learners can access broader chunks of knowledge, leading to quicker and more accurate retrieval. This cognitive strategy aligns with the brain’s capacity to process information in parallel, optimizing the learning process and enhancing overall cognitive performance. In essence, learning by chunking empowers individuals to navigate the complexities of information with greater ease and effectiveness.
Analyzing the learning mechanisms that a system can adopt will help us better understand the different types of learning processes that can be adopted in an ML-based system.
Learning is based on the idea that perceptions should not only guide actions but also enhance the agent’s ability to automatically learn from interactions with the world and the decision-making processes themselves. A system is considered capable of learning when it has an executive component for making decisions and a learning component for modifying the executive component to improve decisions. Learning is influenced by the components learned from the system, by the feedback received after the actions are performed, and by the type of representation used.
ML offers several ways of allowing algorithms to learn from data, which are classified into categories based on the type of feedback on which the learning system is based. Choosing which learning category to use for a specific problem must be done in advance to find the best solution. It is useful to evaluate the robustness of the algorithm, such as its ability to make correct predictions even with missing data, its scalability and efficiency with small or large datasets, and its interpretability, that is, how easily its results can be understood and explained.
Figure 1.3 – Different types of learning processes
ML algorithms can be categorized according to the type of experience they are subjected to during the learning process. It is common to categorize the paradigms of ML as follows:
Supervised learning: The algorithm generates a function that links input values to a desired output, through the observation of a set of examples in which each data input has its relative output data that is used to construct predictive models.
Unsupervised learning: The algorithm tries to derive knowledge from a general input, without the help of a set of pre-classified examples, which is used to build descriptive models. A typical example of the application of these algorithms is search engines.
Reinforcement learning: The algorithm can learn depending on the changes that occur in the environment in which it is performed. The agent receives feedback in the form of rewards or punishments based on the actions it takes, allowing it to learn optimal strategies over time. In fact, since every action has some effect in the environment concerned, the algorithm is driven by the feedback from the environment itself. Some of these algorithms are used in self-driving cars and in game-playing systems such as AlphaGo.
Let’s see these categories in detail, analyzing their characteristics and trying to understand how to choose the most suitable paradigm for our problem.
Supervised learning is an ML technique designed to enable a computer system to automatically solve tasks. The process involves providing input data, typically in the form of vectors, which form a set, I. The output data is defined as a set, O, and a function, f, is established to associate each input with the correct answer. This set of data used for training is referred to as the training set.
O = f(I)
The underlying principle of all supervised learning algorithms is that with enough examples, an algorithm can create a derived function, B, that approximates the desired function, A. If the approximation is accurate enough, the derived function should provide output answers like those of the desired function, making them acceptable. These algorithms rely on the assumption that similar inputs correspond to similar outputs. This assumption is often not perfectly met. However, there are situations where this approximation is acceptable.
With this type of learning, it is possible, for example, to use correctly labeled images of animals as input, to make the algorithm learn the correlation between the characteristics of a specific animal and the image containing it. This is done in such a way that, subsequently, the algorithm can recognize it, given the input of an image containing that type of animal. The algorithm learns to recognize a certain pattern to understand the correlation between the image and the assigned label, to then deduce a general rule with which to recognize whether the same correlation exists in the subsequent data.
To use supervised learning, it is therefore necessary to have data with known outputs. In the case of fully observable environments, the system can see what effects its actions have and can use the supervised learning method to learn to predict them. If the environment is not completely observable, the immediate effects could be invisible. The performance of these algorithms is highly dependent on the input data. If only a few training inputs are provided, the algorithm may lack sufficient experience to produce correct outputs. On the other hand, an excessive number of inputs can make the algorithm excessively slow, as the derived function becomes more complex.
Furthermore, supervised learning algorithms are sensitive to noise. Even a small number of incorrect data points can render the entire system unreliable, leading to erroneous decisions.
When the output value is categorical, such as membership/non-membership in a certain class, it is considered a classification problem. On the other hand, if the output is a continuous real value within a certain range, it is classified as a regression problem.
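To make the distinction concrete, here is a minimal MATLAB sketch (assuming the Statistics and Machine Learning Toolbox is installed) that fits a classification model to the built-in Fisher iris data and a regression model to synthetic data; the data and variable names are illustrative only:

% Classification: the output is a categorical label (the iris species)
load fisheriris                               % built-in dataset: meas (150x4), species (labels)
classModel = fitctree(meas, species);         % decision tree classifier
predictedLabel = predict(classModel, [5.1 3.5 1.4 0.2]);   % classify a new flower

% Regression: the output is a continuous value
x = (1:100)';                                 % synthetic predictor
y = 2*x + 5 + randn(100,1)*10;                % noisy linear response
regModel = fitlm(x, y);                       % linear regression model
predictedValue = predict(regModel, 120);      % predict the response for a new input

In both cases, predict applies the trained model to new inputs; the difference lies in whether the returned value is a class label or a continuous number.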
The objective of unsupervised learning is to automatically extract information from databases without prior knowledge of the content to be analyzed. Unlike supervised learning, there is no information available regarding membership classes or the output corresponding to a given input. The goal is to obtain a model capable of discovering interesting properties, such as groups (clusters) of examples with similar characteristics (clustering). An example of the application of these algorithms is seen in search engines. By providing one or more keywords, they can generate a list of links related to our search. The differences between supervised learning and unsupervised learning are highlighted in the following figure.
Figure 1.4 – Supervised learning versus unsupervised learning
The objective of unsupervised learning is to autonomously learn the underlying structure of the input data. The outcomes are influenced by decisions regarding which data to input and the order in which they are presented. This approach reorganizes the data in a different manner, creating more meaningful data clusters for subsequent analyses and enabling the discovery of previously unnoticed patterns in the data. A system capable of pure unsupervised learning lacks the ability to learn what actions to take, as it lacks information about what constitutes a correct action or a desirable state to achieve.
Unsupervised learning is commonly employed in clustering problems, where it identifies groups of data based on shared characteristics, resulting in the creation of data clusters. Additionally, this approach is utilized in association learning, which involves identifying associative rules among the input data and in recommendation systems.
The effectiveness of these algorithms hinges on the value of the information they can derive from databases. These algorithms function by scrutinizing data and actively seeking out resemblances or disparities among them. The data at hand pertains exclusively to the array of attributes that characterize each individual example.
Unsupervised learning algorithms demonstrate high efficiency when working with numeric elements, but their accuracy decreases when dealing with non-numeric data. Typically, these algorithms perform well when the data exhibits a clear order or distinct grouping that can be easily identified.
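As a brief illustration of clustering in MATLAB (again assuming the Statistics and Machine Learning Toolbox), the sketch below groups the iris measurements with k-means without ever looking at the species labels; the choice of three clusters is purely illustrative:

% Unsupervised clustering of unlabeled measurements with k-means
load fisheriris                        % use only the measurements, ignore the labels
k = 3;                                 % illustrative choice of the number of clusters
[idx, centroids] = kmeans(meas, k);    % idx assigns each row of meas to a cluster

% Visualize the grouping found by the algorithm
gscatter(meas(:,3), meas(:,4), idx);
xlabel('Petal length'); ylabel('Petal width');
title('Clusters discovered without using the species labels');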
Reinforcement learning aims to develop algorithms that can learn and adapt to changes in their environment. This programming technique is based on the concept of receiving external stimuli based on the choices made by the algorithm. Making a correct choice results in a reward, while an incorrect choice leads to a penalty. The ultimate objective of the system is to achieve the best possible outcome.
Unlike supervised learning, where a teacher provides the system with correct outputs (learning with guidance), this is not always feasible. Often, only qualitative information is available, sometimes in a binary form (such as right/wrong or success/failure). These available pieces of information are referred to as reinforcement signals. However, the reinforcement signal does not, by itself, tell the agent how it should update its behavior. The goal of the system is to create “intelligent” agents that possess the ability to learn from their experiences.
Reinforcement learning algorithms operate based on rewards that are assigned according to achieved objectives. These rewards are essential for understanding what actions are desirable and should be pursued, as well as what actions should be avoided. Without rewards, the algorithm would struggle to make decisions. Reinforcement learning is employed when algorithms need to make decisions that have consequences. It goes beyond being purely descriptive and becomes prescriptive, providing guidance on what actions to take. This type of learning is highly innovative in the field of ML and is well suited for understanding how environments work. The flow of the reinforcement learning algorithm is shown in the following figure.
Figure 1.5 – Reinforcement learning mechanism
A crucial aspect is the algorithm’s ability to interpret feedback as reinforcement rather than new input. The goal of reinforcement learning is to utilize received feedback to construct an optimal policy, maximizing the total expected reward. A policy specifies which action the agent should take in each state, so learning an optimal policy amounts to learning a good rule for choosing actions. This approach allows training an algorithm without explicitly defining rules deduced from feedback. It is particularly useful in complex domains where defining rules can be challenging, and performance can be greatly improved.
Reinforcement learning encompasses both passive learning and active learning. In passive learning, the agent follows a fixed policy and aims to learn the utilities of state-action pairs. In some cases, the agent may also need to learn the model of the environment. In active learning, the agent must learn what actions to take and, through exploration, gain experience on how to behave within the environment. One way to implement this strategy is by having the algorithm play a game without providing it with the rules. Positive feedback is given when the agent takes allowed or beneficial actions, while negative feedback is provided for undesirable actions. This allows the algorithm to learn optimal moves during the game without explicit knowledge of the game rules. The agent learns to behave successfully in its environment solely based on the received feedback and without prior knowledge.
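The reward-driven update at the core of many reinforcement learning methods can be sketched in a few lines of plain MATLAB. The following toy Q-learning loop on an invented five-state chain uses no toolbox; the states, rewards, and parameter values are made up purely for illustration:

% Toy Q-learning on a 5-state chain: move left (action 1) or right (action 2);
% reaching state 5 yields a reward of 1, every other step yields 0.
numStates = 5; numActions = 2;
Q = zeros(numStates, numActions);          % action-value estimates
learnRate = 0.1; discount = 0.9; epsilon = 0.2;   % learning rate, discount, exploration

for episode = 1:500
    s = 1;                                 % start in the leftmost state
    while s ~= numStates
        if rand < epsilon
            a = randi(numActions);         % explore: random action
        else
            [~, a] = max(Q(s, :));         % exploit: best known action
        end
        sNext = max(1, min(numStates, s + (2*a - 3)));   % action 1 -> left, 2 -> right
        r = double(sNext == numStates);                  % reward only at the goal
        % Q-learning update: move the estimate toward reward + discounted future value
        Q(s, a) = Q(s, a) + learnRate * (r + discount * max(Q(sNext, :)) - Q(s, a));
        s = sNext;
    end
end
disp(Q)   % the learned values favor moving right in every state

After training, the learned table favors the action that moves the agent toward the rewarded state, which is exactly the optimal policy for this toy environment.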
Alongside the three pillars of the learning paradigms just seen (supervised learning, unsupervised learning, and reinforcement learning) are some new typologies derived from these approaches. Let’s see some of them.
Semi-supervised learning combines the strengths of both supervised and unsupervised learning. Initially, the supervised approach is employed by providing inputs along with their corresponding outputs. Then, additional similar inputs are introduced without their associated output references. These inserted inputs and outputs contribute to the creation of a general model, which can be utilized to extrapolate outputs for the remaining inputs.
The process begins with the supervised phase, where the labeled data is used to train a model. This model learns to map the inputs to their corresponding outputs based on the provided labels. Once the initial model is trained, it can be used to make predictions on the unlabeled data.
During the unsupervised phase, the model uses the unlabeled data to extract patterns, structures, or relationships within the data. This can be done through techniques such as clustering, dimensionality reduction, or generative modeling. By leveraging the unlabeled data, the model can gain a better understanding of the underlying distribution of the data and potentially improve its performance.
The goal of semi-supervised learning is to use the knowledge gained from the labeled and unlabeled data to create a more robust and accurate model. By combining the labeled data’s explicit guidance with the unsupervised learning’s ability to capture hidden patterns, semi-supervised learning can be a powerful approach, especially in scenarios where obtaining labeled data is expensive or time-consuming.
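One simple way to realize this idea is self-training, sketched below in MATLAB under the assumption that only a small fraction of the iris labels are known; the classifier choice and the confidence threshold are illustrative, not prescriptive:

% Self-training sketch: train on the few labeled points, pseudo-label the
% unlabeled ones where the classifier is confident, and retrain.
load fisheriris
labeled = false(150,1); labeled(1:10:end) = true;   % pretend only 15 labels are known
X = meas; y = species;

model = fitcknn(X(labeled,:), y(labeled), 'NumNeighbors', 3);
[pred, score] = predict(model, X(~labeled,:));      % scores approximate class posteriors

confident = max(score, [], 2) > 0.9;                % keep only confident pseudo-labels
unlabIdx = find(~labeled);
Xaug = [X(labeled,:); X(unlabIdx(confident),:)];    % labeled data plus pseudo-labeled data
yaug = [y(labeled); pred(confident)];
finalModel = fitcknn(Xaug, yaug, 'NumNeighbors', 3);   % retrain on the enlarged set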
Transfer learning offers the ability to transfer the acquired knowledge from addressing one problem to effectively tackling a similar problem. The significant advantage of reusing knowledge is evident, although it may not always be feasible due to the need to adapt many ML algorithms to the specific case at hand.
Transfer learning involves leveraging knowledge gained from solving one problem and applying it to a different but related problem. In transfer learning, a pre-trained model that has been trained on a large dataset is used as a starting point for solving a new task or problem.
The basic idea behind transfer learning is that the features learned by a model on one task can be useful for another task. Instead of starting the learning process from scratch, transfer learning allows us to transfer the knowledge and representations acquired by the pre-trained model to the new task. Using such an approach can lead to a substantial reduction in the required amount of training data and computational resources for the new task.
There are typically two main approaches to transfer learning:
Fine-tuning: In this approach, the pre-trained model is taken and further trained on the new task with a smaller dataset specific to the new task. The idea is to adjust the parameters of the pre-trained model to make it more relevant to the new problem while retaining the knowledge it has already learned.
Feature extraction: With this approach, the pre-trained model is used as a fixed feature extractor. The earlier layers of the pre-trained model are frozen, and only the later layers are replaced or modified to adapt to the new task. The new data is passed through the pre-trained model, and the output features from the last few layers are extracted and used as input for a new classifier or model specific to the new task.
Transfer learning can be particularly effective when the pre-trained model has been trained on a large and diverse dataset, as it tends to learn generic features that are useful for a wide range of tasks. By leveraging pre-existing knowledge, transfer learning enables faster training, better performance, and improved generalization on new tasks, especially when labeled data for the new task is limited or unavailable.
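As an illustrative sketch of the feature-extraction approach in MATLAB, the following assumes the Deep Learning Toolbox, the ResNet-18 pretrained-network support package, and an imageDatastore named imds of labeled images that you have prepared yourself; the layer name and classifier choice are assumptions for this sketch:

% Feature extraction with a frozen pretrained network (sketch, not a full recipe).
% Assumes: Deep Learning Toolbox, the ResNet-18 support package, and an
% imageDatastore 'imds' of labeled images prepared beforehand.
net = resnet18;                                   % pretrained network used as a fixed feature extractor
inputSize = net.Layers(1).InputSize;

augimds = augmentedImageDatastore(inputSize(1:2), imds);   % resize images to the network input
features = activations(net, augimds, 'pool5', ...          % deep features from a late layer (name assumed)
                       'OutputAs', 'rows');
labels = imds.Labels;

classifier = fitcecoc(features, labels);          % train a simple multiclass classifier on the features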
After introducing the different types of learning paradigms, we can move on to analyzing how to approach a problem using techniques based on ML.
In the previous section, we explored the various types of ML paradigms in detail. So, we have understood the basic principles that underlie the different approaches. At this point, it is necessary to understand which elements allow us to discriminate between the different approaches; in other words, in this section, we will understand how to adequately choose the learning approach necessary to obtain our results.
Selecting the appropriate ML algorithm can feel overwhelming given the numerous options available, including both supervised and unsupervised approaches, each employing different learning strategies.
There is no universally superior method, nor one that fits all situations. In large part, the search for the right algorithm involves trial and error; even seasoned data scientists cannot determine whether an algorithm will work without testing it. Nonetheless, the algorithm choice is also influenced by factors such as the data format and type, the desired information to be extracted, and how that information will be utilized.
Here are some guidelines to assist in selecting the most suitable approach:
Opt for supervised learning when the objective is to train a model for making predictions, such as determining future values of a continuous variable such as barometric pressure or stock prices, or performing classification tasks such as identifying vehicle types from webcam footage.
Choose unsupervised learning when the aim is to analyze data and develop a model that discovers meaningful internal representations, such as clustering the data into distinct groups.
Select reinforcement learning when the objective is to achieve a goal within an uncertain environment where all variables cannot be predicted. This approach is valuable when there are multiple ways to accomplish a task, but certain rules need to be followed. An example is autonomous driving, where adherence to traffic regulations is essential.
By considering these factors and aligning them with the specific goals and characteristics of the data, one can make a more informed choice when selecting an appropriate ML paradigm.
The selection of the learning paradigm and the specific algorithm is clearly dependent on the characteristics of the data we are working with. This includes factors such as data size, quality, and nature, as well as the desired outcome and the implementation details of the algorithm.
There is no universally superior method or one-size-fits-all solution. The only way to ascertain the suitability of an algorithm is to experiment and evaluate its performance using appropriate metrics.
However, we can conduct a preliminary analysis to better understand the approach that aligns with our requirements. We start by considering what we have (the data), the available tools (algorithms), and the goals we aim to achieve (results). Through this analysis, we can gather valuable information to guide our decision-making process. The following figure shows the different learning paradigms and the activities that can be addressed.
Figure 1.6 – ML algorithm classification
Let’s begin with the data and classify its characteristics. This classification helps us determine the following options:
Classification based on input: If we have labeled input data, it indicates a supervised learning problem. If labeling is not available but we seek to uncover the system’s structure, it indicates an unsupervised learning problem. Finally, if our goal is to optimize an objective function through interactions with the environment, it indicates a reinforcement learning problem.
Classification based on output: If the model’s output is a numerical value, it suggests a regression problem. If the output is categorical, it indicates a classification problem. If the output involves grouping the input data, it indicates a clustering problem.
Once we have classified the problem, we can explore the available tools to solve it. This involves identifying applicable algorithms and focusing our study on the methods required to implement these tools for our specific problem.
After identifying the tools, we need to evaluate their performance. This can be accomplished by applying the selected algorithms to our datasets. By carefully selecting evaluation metrics, we can compare the performance of each algorithm.
By following this process of data analysis, problem classification, tool identification, and performance evaluation, we can make informed decisions and choose the most suitable algorithm for our needs.
Once the algorithm being applied to our data has been chosen, it is essential to establish a well-defined workflow before diving into the task at hand. Before embarking on the actual implementation, it is crucial to allocate some time to set up the workflow. When developing an ML-based application, this procedure typically involves the following steps:
Define the problem: Clearly articulate the problem you want to solve with ML. Determine the specific task, such as classification, regression, or clustering, and understand the objectives and requirements.
Collect and preprocess data: Collect relevant data for your problem. Ensure the data is of high quality, representative, and covers a wide range of scenarios. Preprocess the data by handling missing values and outliers, and by performing data normalization, feature scaling, and data cleaning. The foundation of any data-driven process lies in the data itself, and it is natural to question the origin of such vast amounts of data. Data collection involves a variety of methods, often through extensive procedures such as measurement campaigns or face-to-face interviews. Regardless of the specific method employed, the collected data is typically stored in a database, ready to be analyzed and transformed into valuable insights and knowledge.
Split the data: Data splitting is a crucial step in ML to effectively evaluate and validate the performance of models. It involves dividing the available data into distinct subsets for training, validation, and testing purposes. A widely used data splitting technique, the train-validation-test split, divides the data into the following sets:
Training set: The largest portion of the data, used for model training. It is utilized to optimize the model’s parameters and learn patterns from the data.
Validation set: A smaller subset of the data, used to fine-tune the model’s hyperparameters and assess its performance during development. It helps in avoiding overfitting and selecting the best-performing model.
Test set: A separate portion of the data, used as a final evaluation measure to assess the model’s generalization and performance on unseen examples. It provides an unbiased estimate of the model’s capabilities in real-world scenarios.
The following flow chart shows the entire workflow of an ML model.
Figure 1.7 – ML model implementation workflow
Feature engineering: Extract or create meaningful features from the available data that can help the model learn patterns and make accurate predictions. This may involve transforming or combining existing features, encoding categorical variables, or generating new features based on domain knowledge. Essentially, there are two main approaches: feature extraction and selection. Feature extraction involves transforming raw data into a set of meaningful features that can effectively represent the underlying patterns and relationships in the data. It aims to capture the most relevant information and discard irrelevant or redundant data. This process can be particularly useful when dealing with high-dimensional or unstructured data. Feature selection involves selecting a subset of the most informative and relevant features from the available set of features. It helps in reducing dimensionality, improving model interpretability, and avoiding overfitting.
Select a model: Choose an appropriate ML model based on the type of problem, data characteristics, and available resources. We have seen which considerations are necessary for the choice of the learning paradigm; an adequate analysis is also necessary for the choice of the type of algorithm to adopt. In fact, there is not just one way to go; there are many, and each solution has strengths and weaknesses. Consider algorithms such as decision trees, support vector machines, neural networks, or ensemble methods. Research and compare different models to identify the most suitable one, that is, the one that best fits the available data and that best allows us to make an adequate inference and extract adequate knowledge.
Train the model: Now, the process becomes more substantial. At this stage, ML takes center stage as we define the model and proceed with training. The selected model begins the task of extracting valuable insights from the extensive volume of available data, uncovering new knowledge that was previously unknown to us. This marks the beginning of the learning journey, where the model dives deep into the data to uncover patterns, relationships, and hidden information that has yet to be revealed. Train the selected model using the training data. This involves feeding the input features and their corresponding target values to the model and adjusting its internal parameters through an optimization algorithm. The model learns from the training data to make accurate predictions.
Validate and tune the model: In this phase, we use the knowledge gained from the previous step to determine the effectiveness of the model. Evaluating an algorithm involves assessing how closely the model approximates the real-world system. In supervised learning, we have known values that allow us to evaluate the algorithm’s performance. In unsupervised learning, alternative metrics are employed to gauge success. If the results are unsatisfactory, we can revisit the preceding steps, make necessary adjustments, and retest the model until we achieve the desired outcome. This iterative process allows for refinement and improvement as we strive to develop a more accurate and reliable model. Evaluate the model’s performance on the validation set. Use suitable evaluation metrics for your specific problem, such as accuracy, precision, recall, or mean squared error. Fine-tune the model by adjusting hyperparameters, such as the learning rate, regularization, or number of layers, to optimize its performance.
Evaluate the model: Now, we have reached the critical juncture where we can put our work into action. It’s time to apply the model we have built to real-world data and evaluate its ability to approximate the desired outcomes. The model, having undergone thorough training and testing, is now assessed and valued in this phase. By deploying the model on real data, we can observe its performance, analyze its predictions, and assess how well it aligns with our expectations. This stage allows us to gauge the practicality and effectiveness of the model in real-world scenarios, providing valuable insights for further improvements and optimizations. Once the model is tuned, assess its performance on the testing set, as this offers an impartial estimation of the model’s performance on fresh, unseen data. Evaluate the model’s metrics and analyze its strengths and weaknesses. Consider additional techniques such as cross-validation for a more robust evaluation.
Deploy the model: If the model meets the desired performance criteria, deploy it for real-world use. This involves integrating the model into your application or system and ensuring its compatibility with the production environment. Implement mechanisms to monitor and maintain the model’s performance over time. The ultimate objective of building an ML application is to solve a problem, and this can only be achieved when the ML model is actively utilized in a production setting. Consequently, ML model deployment holds equal importance to the development phase. Deployment entails transitioning a trained ML model from an offline environment to integration within an existing production system, such as a live application. It represents a critical stage that must be completed to enable the model to fulfill its intended purpose and address the challenges it was designed for. Deployments establish an online learning framework where the model is continuously updated with new data, enabling ongoing improvement. The precise process of deploying an ML model varies depending on factors such as the system environment, model type, and organizational DevOps practices. Nonetheless, the general deployment process, particularly in a containerized environment, can be summarized into four key steps, which will be elaborated upon later.
Iterate and improve
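To make the workflow just listed concrete, the following minimal MATLAB sketch walks through the split, train, validate, and test steps on the built-in Fisher iris data; the split proportions, model type, and metric are illustrative choices rather than prescriptions:

% Minimal end-to-end sketch: split, train, validate, test (illustrative choices).
load fisheriris
rng(1)                                            % for a reproducible split

% Hold out 30% of the data, then split that portion into validation and test halves
cvHoldout = cvpartition(species, 'HoldOut', 0.3);
XTrain = meas(training(cvHoldout), :);  yTrain = species(training(cvHoldout));
XRest  = meas(test(cvHoldout), :);      yRest  = species(test(cvHoldout));
cvValTest = cvpartition(yRest, 'HoldOut', 0.5);
XVal  = XRest(training(cvValTest), :);  yVal  = yRest(training(cvValTest));
XTest = XRest(test(cvValTest), :);      yTest = yRest(test(cvValTest));

% Train a candidate model and check it on the validation set
model = fitctree(XTrain, yTrain);
valAccuracy = mean(strcmp(predict(model, XVal), yVal));

% Only after model selection is finished, estimate generalization on the test set
testAccuracy = mean(strcmp(predict(model, XTest), yTest));
fprintf('Validation accuracy: %.2f, test accuracy: %.2f\n', valAccuracy, testAccuracy);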