This book introduces data science to professionals in engineering, physics, mathematics, and related fields. It serves as a workbook with MATLAB code, linking subject knowledge to data science, machine learning, and analytics, with applications in IoT. Part One integrates machine learning, systems theory, linear algebra, digital signal processing, and probability theory. Part Two develops a nonlinear, time-varying machine learning solution for modeling real-life business problems.
Understanding data science is crucial for modern applications, particularly in IoT. This book presents a dynamic machine learning solution to handle these complexities. Topics include machine learning, systems theory, linear algebra, digital signal processing, probability theory, state-space formulation, Bayesian estimation, Kalman filter, causality, and digital twins.
The journey begins with data science and machine learning, covering systems theory and linear algebra. Advanced concepts like the Kalman filter and Bayesian estimation lead to developing a dynamic machine learning model. The book ends with practical applications using digital twins.
Page count: 190
Publication year: 2024
A Systems Analytics Approach
P. G. Madhavan, Ph.D.
Copyright ©2022 by MERCURY LEARNING AND INFORMATION LLC. All rights reserved.
This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.
Publisher: David Pallai
MERCURY LEARNING AND INFORMATION
22841 Quicksilver Drive
Dulles, VA 20166
www.merclearning.com
1-800-232-0223
P. G. Madhavan. Data Science for IoT Engineers.
ISBN: 978-1-68392-642-9
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Library of Congress Control Number: 2021942159
Printed on acid-free paper in the United States of America.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).
All of our titles are available in digital format at academiccourseware.com and other digital vendors. The sole obligation of Mercury Learning and Information to the purchaser is to replace the book, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.
To my wife, Ann
Preface
About the Author
Part 1 Machine Learning from Multiple Perspectives
Chapter 1 Overview of Data Science
Canonical Business Problem
A Basic ML Solution
Systems Analytics
Digital Twins
References
Chapter 2 Introduction to Machine Learning
Basic Machine Learning
Normalization
Data Exploration
Parallel Coordinate Systems
Feature Extraction
Multiple Linear Regression
Decision Tree
Naïve Bayes
Ensemble Method
Unsupervised Learning
K-Means Clustering
Self-Organizing Map (SOM) Clustering
Conclusion
Chapter 3 Systems Theory, Linear Algebra, and Analytics Basics
Digital Signal Processing (DSP) and Machine Learning (ML)
Linear Time Invariant (LTI) System
Linear Algebra
Conclusion
Chapter 4 “Modern” Machine Learning
ML Formalism
Bayes
Generalization, the Hoeffding Inequality, and VC Dimension
Formal Learning Methods
Regularization & Recursive Least Squares
Revisiting the Iris Problem
Kernel Methods: Nonlinear Regression, Bayesian Learning, and Kernel Regression
Random Projection Machine Learning
Random Projection Recursive Least Squares (RP-RLS)
ML Ontology
Conditional Expectation and Big Data
Big Data Estimation
Conclusion
Adaptive Machine Learning
What is Dynamics?
References
Part 2 Systems Analytics
Chapter 5 Systems Theory Foundations of Machine Learning
Introduction: In-Stream Analytics
Basics for Adaptive ML
Exact Recursive Algorithms
Chapter 6 State Space Model and Bayes Filter
State-Space Model of Dynamical Systems
Kalman Filter for the State-Space Model
Special Combination of the Bayes Filter and Neural Networks
References
Chapter 7 The Kalman Filter for Adaptive Machine Learning
Kernel Projection Kalman Filter
Optimized Operation of the KP-Kalman Filter
Reference
Chapter 8 The Need for Dynamical Machine Learning: The Bayesian Exact Recursive Estimation
Need for Dynamical ML
States for Decision Making
Summary of Kalman Filtering and Dynamical Machine Learning
Chapter 9 Digital Twins
Causality
Inverse Digital Twin
Inverse Model Framework
Graph Causal Model
Causality Insights
Inverse Digital Twin Algorithm
Simulation
Conclusion
References
Epilogue A New Random Field Theory
References
Index
This book is the third iteration of the book I originally published in 2016 as “Systems Analytics.” The title reflected a new development effort in the field of machine learning, grounded firmly in systems theory. My intention in writing the first edition was to bring mathematically trained graduates in engineering, physics, mathematics, and allied fields into data science.
The objective of this edition, Data Science for IoT Engineers, remains the same. Part I, where I develop machine learning (ML) algorithms from the background of engineering courses such as control theory, signal processing, etc., is largely unchanged. However, the dynamical systems-based Part II now takes a more detailed Multi-Input-Multi-Output (MIMO) systems approach and develops a new and important form of digital twin called the "causal" digital twin. This topic is significant because, on the one hand, the digital twin is the seat of ML and AI in IoT solutions; more importantly, causality is a critical factor in enabling "prescriptive analytics," which is the real promise of the Internet of Things (IoT). An epilogue has been added that introduces a new theory of random fields; it shows that a new second-order property might have significant practical applications, some of which are discussed.
In this part, we bring together machine learning, systems theory, linear algebra and digital signal processing. The intention is to make clear the similarity of basic theory and algorithms among these disparate fields. Hands-on exposure to machine learning is provided. This part concludes with a complete description of modern machine learning and a new ontology grounded in probability theory.
With the realization that business solutions are not "one and done" and require ongoing measurement, tracking, and fine-tuning, we embed machine learning in a closed-loop, real-time systems framework: adaptive machine learning. This naturally leads to the formal development of the state-space formulation, Bayesian estimation, and the Kalman filter. We develop a "universal" nonlinear, time-varying, dynamical machine learning solution that can faithfully model all the essential complexities of real-life business problems, and show how to apply it.
Developing the systems theme into the framework of digital twins as the action center for IoT-related machine learning, we explore three types of digital twins: (1) display, (2) forward, and (3) inverse. The inverse digital twin is an example of a powerful form of "causal" modeling that captures the "dynamics" or "kinetics" of machinery in industrial applications. In the epilogue, we introduce some future development possibilities for data science from a "complexity" point of view.
This book is neither a hard-core university text nor a popular science read. STEM enthusiasts of every ilk are introduced to data science in ways that leverage their engineering science background. Newly minted data scientists can see the larger framework beyond the bag of tricks they learned in their coursework and glimpse the future of "adaptive" machine learning (what we call "systems analytics"). In addition, readers can expect to cut through the confusion of multiplying digital twin names and claims while developing a full understanding of "next-gen" causal digital twins.
P. G. Madhavan, Ph.D.
October 2021
P. G. Madhavan, Ph.D. has an extensive background in IoT, machine learning, digital twin, and wireless technologies in roles such as Chief IoT Officer, Chief Acceleration Officer, IoT Startup Founder, IoT Product Manager at large corporations (Rockwell Automation, GE Aviation, NEC), and small firms and startups. After obtaining his Ph.D. in electrical and computer engineering from McMaster University, Canada, and a Master’s degree in biomedical engineering from IIT, Madras, Dr. Madhavan pursued original research in random field theory and computational neuroscience as a professor at the University of Michigan, Ann Arbor and Waterloo University, Canada, among others. His next career in corporate technology saw him assume product leadership roles at Microsoft, Bell Labs, Rockwell Automation, GE Aviation, and lastly at NEC. He has founded and was CEO at two startups (and CTO at two others) leading all aspects of startup life. Currently, he champions digital twins as the seat of AI/ML for IoT applications with an emphasis on causality.
This chapter includes insight into machine learning, data science, systems theory, digital twins, and artificial intelligence (AI), as well as their business relevance to the reader. In the following chapters, we will discuss the models and methods of these topics more formally. Machine learning (ML) using Big Data generates business value by predicting what profitable actions to take and when.
Successful businesses find a way to understand and manage complexity. Managing complex systems requires effectively using a significant amount of data at the right time. Big Data provides the data we need. To put the data to work, we have to anticipate what is about to happen and react when it happens in a closed loop manner. Predictive analytics allows us to push our system to the edge (without “falling over”) in a managed fashion. Now, an increase in prediction accuracy from 82% to 83% may not seem like much, but the business effects of that 1% improvement can be disproportionate (for example, a consumer conversion rate of 2% can jump to 20%). Businesses embrace predictive analytics to manage their business at a high level of performance and achieve excellent business results at the edge of complexity overload [CJ13].
Machine learning began in earnest in 1973 with the publication of Duda and Hart's classic textbook Pattern Classification and Scene Analysis [DR73]. In the early days, when computer processing power and memory were severely limited, the focus was on using as little data as possible to extract as much information as possible. Admittedly, the results were not very good, and data specialists were not involved in the process. The advent of Big Data and the strong focus on developing algorithms that extract as much information as possible means that data analysts are now an important part of business success.
One of the best methods to minimize the amount of data needed was to use each data point as it arrived, without storing all the past data and redoing all the calculations: the past result was simply updated based on the new data point. These are what we refer to as "adaptive" or "recursive" methods; the process is also known as "learning." Another early approach was to try to understand and replicate learning in living organisms via perceptrons and cybernetics (which encompasses control and communication in the animal and the machine). These threads led to algorithmic learning, which is the technical basis of today's popular technology, machine learning.
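The recursive update idea can be sketched with a toy running-mean estimator (a hypothetical example in plain Python; the book's own examples use MATLAB). Each new data point nudges the previous estimate by a correction term, and no past data is stored:

```python
def recursive_mean(stream):
    """Update a running estimate one sample at a time,
    without storing past data or redoing all the calculations."""
    estimate = 0.0
    for n, x in enumerate(stream, start=1):
        # new estimate = old estimate + correction driven by the new point
        estimate = estimate + (x - estimate) / n
    return estimate

print(recursive_mean([2.0, 4.0, 6.0, 8.0]))  # 5.0, same as the batch mean
```

The same update-with-correction pattern reappears later in the book in a far more general form as the Kalman filter's predict/correct cycle.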
What do we need to experience success with data? To start with, we need both machine learning and analytics. We also require pattern recognition, statistical modeling, predictive analytics, data science, adaptive systems, and self-organizing systems. We won’t worry too much about nuanced meanings (there are differences, but not in our present context).
Note: AI is not included in this list. Creating AI by mimicking the human brain seems to be a fool's errand. If you think about the neuronal axons along which electrical spikes travel, they are like billions of conducting wires with the insulation scraped off every few millimeters. These billions of wires are also immersed in a salt solution. Try sending your TCP/IP packets along such a network.
A better field to pursue is that of Intelligence Augmentation (IA). When Doug Engelbart wrote to ARPA about IA in the 1960s, he could not have foreseen what Big Data could do. Engelbart's "Mother of All Demos" was really about communication augmentation using a computer mouse, video conferencing, and hypertext. With Big Data and analytics, we can truly perform intelligence augmentation.
Prediction is the foundational requirement in ML business use cases. Let us explore in some detail what prediction entails. Wanting to know the future has always been a human preoccupation. You cannot truly know the future, but in some cases, predictions are possible.
We should consider short-term and long-term predictions separately. Long-term prediction is nearly impossible. In the 1980s and 1990s, chaos and complexity theorists showed us that things can become uncontrollable even when we have perfect past and present information (for example, predicting the weather beyond three weeks is a major challenge, if not impossible). Stochastic process theory tells us that “non-stationarity,” where statistics evolve (slowly or fast), can render longer term predictions unreliable.
If the underlying systems do not evolve quickly or suddenly, there is some hope. Causal systems (in systems theory, no future information of any kind is available in the current state of the system) indicate that outcomes are predictable in the sense that, as long as certain conditions are met, we can be somewhat confident in predicting a few steps ahead. This may be quite useful in some data science applications (such as in fintech).
Another type of prediction involves not the actual path of future events (or the "state space trajectories"), but the occurrence of a "black swan" or an "X-event" (for an elegant in-depth discussion, see [CJ13]). Any unwanted event can be good to know about in advance. Consider unwanted destructive vibrations (called "chatter") in machine tools, as an example; early warning may be possible and very useful in saving expensive work pieces [MP97]. We find that sometimes the underlying system does undergo pre-event changes (such as approaching complexity overload, state-space volume inflation, and an increase in degrees of freedom) which may be detectable and trackable. However, there is no way to prevent false positives (and the associated waste of resources preparing for an event that never comes) or false negatives (being blind-sided by an event we were told was not going to happen).
We will use an explicit systems theory approach to analytics. In our systems analytics formulation, the parameters of the system and their variation over time are tracked adaptively in real time, so the formulation can tell us how far into the future we can predict safely. If the parameters evolve slowly or cyclically, we can have higher confidence in our predictive analytics solutions.
Machine learning has to do with learning, i.e., the ability to generalize from experience. A necessary feature of learning is feedback, either explicit, as in the case of supervised learning, or implicit, as in the cases of unsupervised or reinforcement learning.
There is a class of algorithms that is very useful but does not exhibit the learning and feedback behaviors. We will exclude them from our current discussion of analytics and ML. Algorithms such as decision trees are more associated with data mining than ML. Such approaches are useful but do not involve learning or feedback about the information itself.
We will select a business context to explore ML. There is rarely a business solution that requires a one-time answer that will completely solve the problem. Almost all business problems you’ll typically encounter require some type of regular attention. Analysts need to monitor the outcomes of the first solution, tweak the approach, and apply it again after a while.
Data scientists have a duty to educate business clients to subscribe to this view, which we call the "goal-seeking" or tracking solution concept. The first solution may provide only an 80% answer to the original problem, but with tracking, it can improve over time. With such realistic expectations, your customer will be delighted if the trajectory of improvement is good and fast.
With this preamble, let us define a canonical or prototypical business problem that will help crystallize our ML approach and the analytics roadmap ahead of us.
Let us consider a retail commerce business, a brick-and-mortar grocery store chain, and the CPG (consumer product goods) manufacturers who supply them with fast-moving consumer goods (FMCG).
Consider a grocery retail chain with 100 stores. Assume that each store has 5 departments, 20 product categories per department, 50 brands per category, and 70 SKUs per brand. The product category is the level in the product hierarchy to focus on, since shoppers' choices are made within a category. Each store then has 100 categories, 5,000 brands, and 350,000 SKUs.
Consider a neighborhood and stores in the vicinity; the activities near each store can be dramatically different (for example, one may have a school nearby, and another may have offices near it). How does the store manager at one of these stores decide what products to carry? This is called the Optimal Product Assortment problem in retail merchandising. This is important not only to the retailer, but also the CPGs that supply products to the store.
The store manager has several possible options: they can (1) make educated guesses (not very reliable), (2) ask the shoppers what they want (does not scale), or (3) rely on market survey data (usually based on sampling and not specific to SKU and store). Using Big Data and analytics, can we do better?
Recommendation engines are commonplace today – such as Amazon or Netflix. They provide recommendations to an individual shopper (for goods or movies) based on a particular shopper’s likes and dislikes as well as the sellers’ business priorities (push out movies in the “long tail” for Netflix and sell goods in the “fat front” from Amazon warehouses).
Figure 1.1
Product density in stores
Optimal product assortment is a slightly more difficult problem. We need a recommendation engine for a group of shoppers, one for each store. Note that the optimization problem arises because shelf space is limited; this constraint applies equally to large ecommerce warehouses, where shelves are numerous but the number of candidate products is larger still. The challenge is the product density, which varies, as shown in Figure 1.1.
The natural approach to such a problem is to group the hundreds or thousands of shoppers at a store via segmentation or clustering. Once grouped into a few segments, the recommendation engine can be optimized for each segment, and the optimum product assortment can be derived from the proportion of these few segments that shop at the store.
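As an illustration of this grouping step, here is a minimal k-means clustering sketch in Python with NumPy (the shopper vectors, feature count, and number of segments are all invented for illustration; the book's own code is in MATLAB):

```python
import numpy as np

# Toy purchase-pattern vectors for six shoppers (rows) over two product
# features (columns); all numbers are illustrative, not from the book.
shoppers = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
                     [0.90, 0.80], [0.80, 0.90], [0.85, 0.75]])

def kmeans(X, k, iters=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = X[:k].copy()  # simple deterministic initialization
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(shoppers, k=2)
print(labels)  # the first three shoppers share one segment, the last three the other
```

Once each shopper carries a segment label like this, the per-segment recommendation and the per-store segment proportions follow directly.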
The state-of-the-art in commerce is behavioral segmentation, where the market is divided into segments based on pre-selected characteristics that apply to all product categories in a store.
Clearly, placing shoppers into convenient segments such as “Price sensitive” or “Families,” as shown in Figure 1.2, allows one level of meaningful abstraction. Instead of addressing millions of shoppers individually, one can tailor marketing, merchandising, and loyalty efforts to a handful of labelled groups.
Figure 1.2
Behavioral segmentation example
However, what is helpful at one level can be a flawed approach for some applications. Consider a shopper who, per behavioral segmentation, ended up in the "Price sensitive" group. While this may be true of them in general, they may have specific preferences in certain product categories: for example, though "Price sensitive" overall, their wine choice may be an expensive Châteauneuf-du-Pape. Such misallocations, multiplied across millions of shoppers, lead to flawed product assortment decisions when behavioral segmentation is applied to merchandising. Let us consider a better approach that uses shopper data and machine learning to create and identify segments.
In ML segmentation, the shoppers (whatever their behavioral characteristics may be) fall into N Preference Groups (Figure 1.3) based on what they actually buy (actual purchase patterns are a great proxy for true product preference). In essence, each product category is its own unique market.
Figure 1.3
Preference groups
Traditional behavioral segmentation would have predicted that Brand X will sell more in Store 123 because “Price sensitive” shoppers prefer more of Brand X and Store 123 has more “Price sensitive” shoppers.
In the ML method, we realize that since Store 123 shoppers are well-represented by N preference groups for a particular product category, the proportion of the N groups that shop at Store 123 determines the assortment for that product category at Store 123. Making such fine distinctions with the aid of shopper data avoids the pitfall of employing the same behavioral groups across all product categories because shoppers’ purchase propensities can vary across categories.
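The mixing logic above can be made concrete with a small numerical sketch (all shares and group proportions below are invented for illustration): the store's target assortment for a category is the blend of the N group preference profiles, weighted by the fraction of each group shopping at the store.

```python
import numpy as np

# Illustrative shares of four SKUs preferred by each of N = 3 preference
# groups (each row sums to 1); these numbers are not from the book.
group_prefs = np.array([[0.6, 0.2, 0.1, 0.1],
                        [0.1, 0.5, 0.3, 0.1],
                        [0.2, 0.2, 0.2, 0.4]])

# Fraction of Store 123's shoppers falling in each group (sums to 1).
store_mix = np.array([0.5, 0.3, 0.2])

# The store's target assortment is the mix-weighted blend of group profiles.
assortment = store_mix @ group_prefs
print(assortment)  # per-SKU target shares; the blend still sums to 1
```

Because the group profiles are rebuilt per product category, the same store can get a very different blend for yogurt than for wine, which is exactly the distinction behavioral segmentation misses.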
Figure 1.4
Revenue opportunity gap
By comparing behavioral segmentation and the ML method to optimize product assortments, we obtained the results shown in Figure 1.4. Consider the Revenue Opportunity Gap (ROG) as an overall performance measure, which indicates a better product assortment optimization when the value is high. Assume that the overall revenue for a category (yogurt, in this example) was $100. While behavioral segmentation shows an average of 1% or $1 of ROG (improvement possibility), Projometry™, which is an ML segmentation method, shows a 5% or $5 improvement possibility. In other words, the improvement due to our data-driven method is five times higher than that due to behavioral segmentation.
Why is the ML-based approach better than behavioral segmentation? Table 1.1 shows the reasons. Looking at more dimensions than just the head-to-head outcome comparison above, it is clear that the ML method has several advantages when it comes to merchandising. Whenever the data determine the groups, rather than the groups being externally imposed, the results will be superior to those of other methods. Another nice feature is that human labor and subjectivity in the segmentation process can be avoided, which makes the analysis fast, inexpensive, and repeatable. The facts that separate preference groups are generated for every product category and that the ML method acts on what people purchase rather than why have led to breakout applications of this ML method in retail merchandising product assortment optimization.
Table 1.1
Differences between segmentation approaches
In developing this group recommendation engine, you may have noticed that we added an aspect of "modeling" to the solution: we modeled the shoppers as N preference groups. Alternatively, it can be said that any shopper can be approximated as a weighted linear combination of these N groups. Let us keep this perspective in mind for systems analytics, which explicitly uses model-based thinking and algorithms.
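The "weighted linear combination" view can be written as a least-squares fit: with the N group profiles as columns of a matrix G, a shopper's observed purchase profile s is approximated by G @ w, and the weights w give that shopper's mix of the groups. The matrix and profile below are hypothetical:

```python
import numpy as np

# Hypothetical group profiles: each column is one of N = 3 preference
# groups' shares over four products (numbers invented for illustration).
G = np.array([[0.6, 0.1, 0.2],
              [0.2, 0.5, 0.2],
              [0.1, 0.3, 0.2],
              [0.1, 0.1, 0.4]])

# One shopper's observed purchase profile over the same four products.
s = np.array([0.37, 0.29, 0.18, 0.16])

# Least-squares weights w such that G @ w best approximates the shopper.
w, *_ = np.linalg.lstsq(G, s, rcond=None)
print(np.round(w, 2))  # the shopper is roughly a 0.5 / 0.3 / 0.2 mix of the groups
```

This model-based reading (shopper = weighted combination of learned basis profiles) is the bridge to the state-space and Kalman filter machinery developed in Part 2, where such weights are tracked as they drift over time.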
Remember the concept of "goal-seeking and tracking solutions"? This is the context of systems analytics. The Product Assortment Optimization problem and its solution can be reformulated as follows, keeping in mind that we have to track the solution over time. In the retail case, shopper preferences change, product attributes change, and the local population around a store changes, some quickly and some slowly. We want to adapt to these changing circumstances and provide frequent updates of the product assortment solution to the store manager.
A complete characterization of retail dynamics is captured in the canonical diagram in Figure 1.5. In systems theory, this is called a MIMO system (Multi-Input Multi-Output system). This basic framework informs all of the development described in this book.
Figure 1.5
Systems analytics model
Retail business is about increasing customer acquisition and retention. Business owners have three areas in which to effect change: marketing, loyalty, and merchandising. Our prototypical business problem, laid out earlier, is in retail merchandising, so we will continue to work with that example.
For our basic solution, instead of behavioral segmentation, we used ML to discover N preference groups (or models) to create a group recommendation engine for each store.
Figure 1.6
Group recommendation engine
Here is a specific instance of such a solution from Syzen Analytics (Figure 1.6), where an ML algorithm called Projometry™ discovers the preference groups (using unsupervised learning) and creates an optimal product assortment for each store (the SKU shares output). The overall group recommendation engine for each store generates feedback information (ROG) for supervised learning, which tracks the optimal product assortments over many quarters.
From this example, we can formalize ML within the systems theory context of systems analytics.