Discover data analytics methodologies for the diagnosis and prognosis of industrial systems under a unified random effects model.
In Industrial Data Analytics for Diagnosis and Prognosis: A Random Effects Modelling Approach, distinguished engineers Shiyu Zhou and Yong Chen deliver a rigorous and practical introduction to the random effects modeling approach for industrial system diagnosis and prognosis. The book's two parts describe and explain general statistical concepts and useful theory, followed by industrial diagnosis and prognosis methods. The authors describe and model fixed effects, random effects, and variation in univariate and multivariate datasets, and cover the application of the random effects approach to the diagnosis of variation sources in industrial processes. They offer a detailed performance comparison of different diagnosis methods before moving on to the application of the random effects approach to failure prognosis in industrial processes and systems.
In addition to presenting the joint prognosis model, which integrates the survival regression model with the mixed effects regression model, the book also offers readers:
* A thorough introduction to describing variation of industrial data, including univariate and multivariate random variables and probability distributions
* Rigorous treatments of the diagnosis of variation sources using PCA pattern matching and the random effects model
* An exploration of the extended mixed effects model, including the mixture prior and Kalman filtering approaches, for real-time prognosis
* A detailed presentation of the Gaussian process model as a flexible approach for the prediction of temporal degradation signals
Ideal for senior-year undergraduate students and postgraduate students in industrial, manufacturing, mechanical, and electrical engineering, Industrial Data Analytics for Diagnosis and Prognosis is also an indispensable guide for researchers and engineers interested in data analytics methods for system diagnosis and prognosis.
Number of pages: 537
Year of publication: 2021
Shiyu Zhou
University of Wisconsin – Madison
Yong Chen
University of Iowa
Copyright © 2021 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.
Limit of Liability/Disclaimer of Warranty:
While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
Names: Zhou, Shiyu, 1970- author. | Chen, Yong (Professor of industrial and systems engineering), author.
Title: Industrial data analytics for diagnosis and prognosis : a random effects modelling approach / Shiyu Zhou, Yong Chen.
Description: Hoboken, NJ : John Wiley & Sons, Inc., 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2021000379 (print) | LCCN 2021000380 (ebook) | ISBN 9781119666288 (hardback) | ISBN 9781119666295 (pdf) | ISBN 9781119666301 (epub) | ISBN 9781119666271 (ebook)
Subjects: LCSH: Industrial engineering--Statistical methods. | Industrial management--Mathematics. | Random data (Statistics) | Estimation theory.
Classification: LCC T57.35 .Z56 2021 (print) | LCC T57.35 (ebook) | DDC 658.0072/7--dc23
LC record available at https://lccn.loc.gov/2021000379
LC ebook record available at https://lccn.loc.gov/2021000380
Cover image: © monsitj/ iStock/Getty Images
Cover design by Wiley
Set in 9.5/12.5pt STIX Two Text by Integra Software Services, Pondicherry, India.
To our families:
Yifan and Laura
Jinghui, Jonathan, and Nathan
Cover
Title page
Copyright
Dedication
Preface
Acknowledgments
Acronyms
Table of Notation
Chapter 1: Introduction
1.1 Background and Motivation
1.2 Scope and Organization of the Book
1.3 How to Use This Book
Bibliographic Notes
Part 1 Statistical Methods and Foundation for Industrial Data Analytics
Chapter 2: Introduction to Data Visualization and Characterization
2.1 Data Visualization
2.1.1 Distribution Plots for a Single Variable
2.1.2 Plots for Relationship Between Two Variables
2.1.3 Plots for More than Two Variables
2.2 Summary Statistics
2.2.1 Sample Mean, Variance, and Covariance
2.2.2 Sample Mean Vector and Sample Covariance Matrix
2.2.3 Linear Combination of Variables
Bibliographic Notes
Exercises
Chapter 3: Random Vectors and the Multivariate Normal Distribution
3.1 Random Vectors
3.2 Density Function and Properties of Multivariate Normal Distribution
3.3 Maximum Likelihood Estimation for Multivariate Normal Distribution
3.4 Hypothesis Testing on Mean Vectors
3.5 Bayesian Inference for Normal Distribution
Bibliographic Notes
Exercises
Chapter 4: Explaining Covariance Structure: Principal Components
4.1 Introduction to Principal Component Analysis
4.1.1 Principal Components for More Than Two Variables
4.1.2 PCA with Data Normalization
4.1.3 Visualization of Principal Components
4.1.4 Number of Principal Components to Retain
4.2 Mathematical Formulation of Principal Components
4.2.1 Proportion of Variance Explained
4.2.2 Principal Components Obtained from the Correlation Matrix
4.3 Geometric Interpretation of Principal Components
4.3.1 Interpretation Based on Rotation
4.3.2 Interpretation Based on Low-Dimensional Approximation
Bibliographic Notes
Exercises
Chapter 5: Linear Model for Numerical and Categorical Response Variables
5.1 Numerical Response – Linear Regression Models
5.1.1 General Formulation of Linear Regression Model
5.1.2 Significance and Interpretation of Regression Coefficients
5.1.3 Other Types of Predictors in Linear Models
5.2 Estimation and Inferences of Model Parameters for Linear Regression
5.2.1 Least Squares Estimation
5.2.2 Maximum Likelihood Estimation
5.2.3 Variable Selection in Linear Regression
5.2.4 Hypothesis Testing
5.3 Categorical Response – Logistic Regression Model
5.3.1 General Formulation of Logistic Regression Model
5.3.2 Significance and Interpretation of Model Coefficients
5.3.3 Maximum Likelihood Estimation for Logistic Regression
Bibliographic Notes
Exercises
Chapter 6: Linear Mixed Effects Model
6.1 Model Structure
6.2 Parameter Estimation for LME Model
6.2.1 Maximum Likelihood Estimation Method
6.2.2 Distribution-Free Estimation Methods
6.3 Hypothesis Testing
6.3.1 Testing for Fixed Effects
6.3.2 Testing for Variance–Covariance Parameters
Bibliographic Notes
Exercises
Part 2 Random Effects Approaches for Diagnosis and Prognosis
Chapter 7: Diagnosis of Variation Source Using PCA
7.1 Linking Variation Sources to PCA
7.2 Diagnosis of Single Variation Source
7.3 Diagnosis of Multiple Variation Sources
7.4 Data Driven Method for Diagnosing Variation Sources
Bibliographic Notes
Exercises
Chapter 8: Diagnosis of Variation Sources Through Random Effects Estimation
8.1 Estimation of Variance Components
8.2 Properties of Variation Source Estimators
8.3 Performance Comparison of Variance Component Estimators
Bibliographic Notes
Exercises
Chapter 9: Analysis of System Diagnosability
9.1 Diagnosability of Linear Mixed Effects Model
9.2 Minimal Diagnosable Class
9.3 Measurement System Evaluation Based on System Diagnosability
Bibliographic Notes
Exercises
Appendix
Chapter 10: Prognosis Through Mixed Effects Models for Longitudinal Data
10.1 Mixed Effects Model for Longitudinal Data
10.2 Random Effects Estimation and Prediction for an Individual Unit
10.3 Estimation of Time-to-Failure Distribution
10.4 Mixed Effects Model with Mixture Prior Distribution
10.4.1 Mixture Distribution
10.4.2 Mixed Effects Model with Mixture Prior for Longitudinal Data
10.5 Recursive Estimation of Random Effects Using Kalman Filter
10.5.1 Introduction to the Kalman Filter
10.5.2 Random Effects Estimation Using the Kalman Filter
Bibliographic Notes
Exercises
Appendix
Chapter 11: Prognosis Using Gaussian Process Model
11.1 Introduction to Gaussian Process Model
11.2 GP Parameter Estimation and GP Based Prediction
11.3 Pairwise Gaussian Process Model
11.3.1 Introduction to Multi-output Gaussian Process
11.3.2 Pairwise GP Modeling Through Convolution Process
11.4 Multiple Output Gaussian Process for Multiple Signals
11.4.1 Model Structure
11.4.2 Model Parameter Estimation and Prediction
11.4.3 Time-to-Failure Distribution Based on GP Predictions
Bibliographical Notes
Exercises
Chapter 12: Prognosis Through Mixed Effects Models for Time-to-Event Data
12.1 Models for Time-to-Event Data Without Covariates
12.1.1 Parametric Models for Time-to-Event Data
12.1.2 Non-parametric Models for Time-to-Event Data
12.2 Survival Regression Models
12.2.1 Cox PH Model with Fixed Covariates
12.2.2 Cox PH Model with Time Varying Covariates
12.2.3 Assessing Goodness of Fit
12.3 Joint Modeling of Time-to-Event Data and Longitudinal Data
12.3.1 Structure of Joint Model and Parameter Estimation
12.3.2 Online Event Prediction for a New Unit
12.4 Cox PH Model with Frailty Term for Recurrent Events
Bibliographical Notes
Exercises
Appendix
Appendix: Basics of Vectors, Matrices, and Linear Vector Space
References
Index
Today, we are facing a data rich world that is changing faster than ever before. The ubiquitous availability of data provides great opportunities for industrial enterprises to improve their process quality and productivity. Industrial data analytics is the process of collecting, exploring, and analyzing data generated from industrial operations and throughout the product life cycle in order to gain insights and improve decision-making. This book describes industrial data analytics approaches with an emphasis on diagnosis and prognosis of industrial processes and systems.
A large number of textbooks and research monographs exist on diagnosis and prognosis in the engineering field. Most of these engineering books focus on model-based diagnosis and prognosis problems in dynamic systems. The model-based approaches adopt a dynamic model for the system, often in the form of a state space model, as the basis for diagnosis and prognosis. Unlike these existing books, this book focuses on the concept of random effects and its applications in system diagnosis and prognosis. The impetus for this book arose from the current digital revolution. In this digital age, the essential feature of a modern engineering system is that large amounts of data are collected in real time from multiple similar units/machines during their operation. This feature poses significant intellectual opportunities and challenges. As for opportunities, since we have observations from potentially a very large number of similar units, we can compare their operations, share the information, and extract common knowledge to enable accurate and tailored prediction and control at the individual level. As for challenges, because the data are collected in the field and not in a controlled environment, the data contain significant variation and heterogeneity due to the large variations in working/usage conditions for different units. This calls for analytics approaches that are not only general (so that the common information can be learned and shared) but also flexible (so that the behavior of an individual unit can be captured and controlled). Random effects modeling approaches address exactly these opportunities and challenges.
Random effects, as the name implies, refer to the underlying random factors in an industrial process or system that affect the outcome of the process. In diagnosis and prognosis applications, random effects can be used to model the sources of variation in a process and the variation among individual characteristics of multiple heterogeneous units. Some excellent books are available in the industrial statistics area. However, these existing books mainly focus on population level behavior and fixed effects models. The goal of this book is to adapt and bring the theory and techniques of random effects to the application area of industrial system diagnosis and prognosis.
The book contains two main parts. The first part covers general statistical concepts and theory useful for describing and modeling variation, fixed effects, and random effects for both univariate and multivariate data, which provides the necessary background for the second part of the book. The second part covers advanced statistical methods for variation source diagnosis and system failure prognosis based on the random effects modeling approach. An appendix summarizing the basic results in linear spaces and matrix theory is also included at the end of the book for the sake of completeness.
This book is intended for students, engineers, and researchers who are interested in using modern statistical methods for variation modeling, analysis, and prediction in industrial systems. It can be used as a textbook for a graduate level or advanced undergraduate level course on industrial data analytics and/or quality and reliability engineering. We also include “Bibliographic Notes” at the end of each chapter that highlight relevant additional reading materials for interested readers. These bibliographic notes are not intended to provide a complete review of the topic. We apologize for missing literature that is relevant but not included in these notes.
Much of the material in this book comes from the authors’ recent research on variation modeling and analysis, variation source diagnosis, and system condition and failure prognosis for manufacturing systems and beyond. We hope this book can stimulate some new research and serve as a reference book for researchers in this area.
Shiyu Zhou
Madison, Wisconsin, USA
Yong Chen
Iowa City, Iowa, USA
We would like to thank the many people whose collaboration with us led up to the writing of this book. In particular, we would like to thank Jianjun Shi, our Ph.D. advisor at the University of Michigan (now at Georgia Tech), for his continuous advice and encouragement. We are grateful to our colleagues Daniel Apley, Darek Ceglarek, Yu Ding, Jionghua Jin, Dharmaraj Veeramani, and Yilu Zhang for their collaborations with us on the related research topics. Grateful thanks also go to Raed Kontar, Junbo Son, and Chao Wang, who have helped with the book, including writing computational code to create some of the illustrations and designing the exercise problems. Many students, including Akash Deep, Salman Jahani, Jaesung Lee, and Congfang Huang, read parts of the manuscript and helped with the exercise problems. We thank the National Science Foundation for the support of our research work related to the book.
Finally, a very special note of appreciation is extended to our families who have provided continuous support over the past years.
S.Z. and Y.C.
AIC       Akaike Information Criterion
BIC       Bayesian Information Criterion
CDF       Cumulative Distribution Function
EM        Expectation–Maximization
GP        Gaussian Process
i.i.d.    Independent and identically distributed
IoT       Internet of Things
IQR       Interquartile Range
KCC       Key Control Characteristic
KQC       Key Quality Characteristic
LME       Linear Mixed Effects
LRT       Likelihood Ratio Test
MLE       Maximum Likelihood Estimation
MINQUE    Minimum Norm Quadratic Unbiased Estimation
MOGP      Multiple Output Gaussian Process
PCA       Principal Component Analysis
pdf       probability density function
PH        Proportional Hazards
RREF      Reduced Row Echelon Form
REML      Restricted Maximum Likelihood
RUL       Remaining Useful Life
r.v.      random variable(s)
SNR       Signal-to-Noise Ratio
Today, we are facing a data rich world that is changing faster than ever before. The ubiquitous availability of data provides great opportunities for industrial enterprises to improve their process quality and productivity. Indeed, the fast development of sensing, communication, and information technology has turned modern industrial systems into a data rich environment. For example, in a modern manufacturing process, it is now common to conduct a 100% inspection of product quality through automatic inspection stations. In addition, many modern manufacturing machines are numerically-controlled and equipped with many sensors and can provide various sensing data of the working conditions to the outside world.
One particularly important enabling technology in this trend is the Internet of Things (IoT) technology. IoT represents a network of physical devices, which enables ubiquitous data collection, communication, and sharing. One typical application of the IoT technology is the remote condition monitoring, diagnosis, and failure prognosis system for after-sales services. Such a system typically consists of three major components as shown in Figure 1.1: (i) the in-field units (e.g., cars on the road), (ii) the communication network, and (iii) the back-office/cloud data processing center. The sensors embedded in the in-field unit continuously generate data, which are transmitted through the communication network to the back office. The aggregated data are then processed and analyzed at the back-office to assess system status and produce prognosis. The analytics results and the service alerts are passed individually to the in-field unit. Such a remote monitoring system can effectively improve the user experience, enhance the product safety, lower the ownership cost, and eventually gain competitive advantage for the manufacturer. Driven by the rapid development of information technology and the critical needs of providing fast and effective after-sales services to the products in a globalized market, the remote monitoring systems are becoming increasingly available.
Figure 1.1 A diagram of an IoT enabled remote condition monitoring system.
The unprecedented data availability provides great opportunities for more precise and contextualized system condition monitoring, diagnosis, and prognosis, which are very challenging to achieve if only scarce data are available. Industrial data analytics is the process of collecting, exploring, and analyzing data generated from industrial operations throughout the product life cycle in order to gain insights and improve decision-making. Industrial data analytics encompasses a vast set of applied statistics and machine learning tools and techniques, including data visualization, data-driven process modeling, statistical process monitoring, root cause identification and diagnosis, predictive analytics, system reliability and robustness, and design of experiments, to name just a few. The focus of this book is industrial data analytics approaches that can take advantage of the unprecedented data availability. Particularly, we focus on the concept of random effects and its applications in system diagnosis and prognosis.
The terms diagnosis and prognosis were originally used in the medical field. Diagnosis is the identification of the disease that is responsible for the symptoms of the patient’s illness, and prognosis is a forecast of the likely course of the disease. In the field of engineering, these terms have similar meanings: for an industrial system, diagnosis is the identification of the root cause of a system failure or abnormal working condition, and prognosis is the prediction of the system degradation status and of future failure or breakdown. Diagnosis and prognosis therefore play a critical role in assuring smooth, efficient, and safe system operations, and they have attracted ever-growing interest in recent years. This trend has been driven by the increasing pressure on capital goods manufacturers to open up new sources of revenue and profit. Maintenance service costs constitute around 60–90% of the life-cycle costs of industrial machinery and installations. Systematic extension of the after-sales service business will be an increasingly important driver of profitable growth.
Due to the importance of diagnosis and prognosis in industrial system operations, a relatively large number of books/research monographs exist on this topic [Lewis et al., 2011, Niu, 2017, Wu et al., 2006, Talebi et al., 2009, Gertler, 1998, Chen and Patton, 2012, Witczak, 2007, Isermann, 2011, Ding, 2008, Si et al., 2017]. As implied by their titles, many of these books focus on model-based diagnosis and prognosis problems in dynamic systems. A model-based approach adopts a dynamic model, often in the form of a state space model, as the basis for diagnosis and prognosis. The differences between the observations and the model predictions, called residuals, are then examined to achieve fault identification and diagnosis. For prognosis, data-driven dynamic forecasting methods, such as time series modeling methods, are used to predict the future values of the system signals of interest. The modeling and analysis of the system dynamics are the focus of the existing literature.
Unlike the existing literature, this book focuses on the concept of random effects and its applications in system diagnosis and prognosis. Random effects, as the name implies, refer to the underlying random factors in an industrial process that affect the outcome of the process. In diagnosis and prognosis applications, random effects can be used to model the sources of variation in a process and the variation among individual characteristics of multiple heterogeneous units. The following two examples illustrate the random effects in industrial processes.
Example 1.1 Random effects in automotive body sheet metal assembly processes
The concept of variation source is illustrated for an assembly operation in which two parts are welded together. In an automotive sheet metal assembly process, the sheet metals will be positioned and clamped on the fixture system through the matching of the locators (also called pins) on the fixture system and the holes on the sheet metals. Then the sheet metals will be welded together. Clearly, the accuracy of the positions of the locating pins and the tightness of the matching between the pins and the holes significantly influence the dimensional accuracy of the final assembly. Figure 1.2(a) shows the final product as designed. The assembly process is as follows: Part 1 is first located on the fixture and constrained by 4-way Pin L1 and 2-way Pin L2. A 4-way pin constrains the movement in two directions, while a 2-way pin only constrains the movement in one direction. Then, Part 2 is located by 4-way Pin L3 and 2-way Pin L4. The two parts are then welded together in a joining operation and released from the fixture.
Figure 1.2 Random effects in an assembly operation.
If the position or diameter of Pin L1 deviates from the design nominal, then Part 1 will consequently not be in its designed nominal position, as shown in Figure 1.2(b). After Part 1 and Part 2 are joined, the dimensions of the final assembly will deviate from the designed nominal values. One critical point that needs to be emphasized is that Figure 1.2(b) only shows one possible realization of the produced assemblies. If we produce another assembly, the deviation of the position of Part 1 could be different. For instance, if the diameter of a pin is reduced due to pin wear, then the matching between the pin and the corresponding hole will be loose, which will lead to random wobble in the final position of the part. This will in turn cause increased variation in the dimensions of the produced final assemblies. As a result, mislocation of a pin can manifest as either a mean shift or a variance change in the dimensional quality measurements, such as M1 and M2 in the figure. In the case of a mean shift error (for example, due to a fixed position shift of the pin), the error can be compensated by process adjustment such as realignment of the locators. Variance change errors (for example, due to a worn-out pin or excessive looseness of a pin) cannot be easily compensated for in most cases. Also, note that each locator in the process is a potential source of variance change errors and is referred to as a variation source. The variation sources are random effects in the process that affect the final assembly quality. In most assembly processes, pin wear is difficult to measure, so the random effects are not directly observed. In a modern automotive body assembly process, hundreds of locators are used to position a large number of parts and sub-assemblies. An important and challenging diagnosis problem is to estimate and identify the variation sources in the process based on the observed quality measurements.
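The distinction between a mean shift and a variance change can be made concrete with a small simulation. The sketch below is only an illustration under assumed conditions: the function `simulate_deviation`, the normal distribution for the wobble, and every numeric value (a 0.5 mm pin shift, wobble standard deviations of 0.05 mm and 0.25 mm) are hypothetical and are not taken from the assembly process described above.

```python
import random
import statistics


def simulate_deviation(n, pin_shift=0.0, wobble_sd=0.05, seed=0):
    """Simulate n deviations from nominal (mm) of one quality measurement.

    pin_shift : fixed position error of a locating pin (mean-shift fault).
    wobble_sd : standard deviation of the random wobble of the part; a worn
                or loose pin is represented by a larger value (variance fault).
    All values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    return [pin_shift + rng.gauss(0.0, wobble_sd) for _ in range(n)]


def summarize(data):
    """Return the sample mean and sample standard deviation."""
    return statistics.mean(data), statistics.stdev(data)


healthy = simulate_deviation(2000, seed=1)
shifted = simulate_deviation(2000, pin_shift=0.5, seed=2)   # pin moved by a fixed amount
worn = simulate_deviation(2000, wobble_sd=0.25, seed=3)     # pin worn, loose fit

# "Realignment" is modeled as subtracting the estimated mean offset.
shifted_mean = statistics.mean(shifted)
worn_mean = statistics.mean(worn)
realigned_shifted = [x - shifted_mean for x in shifted]
realigned_worn = [x - worn_mean for x in worn]

cases = [
    ("healthy", healthy),
    ("shifted pin", shifted),
    ("worn pin", worn),
    ("shifted, realigned", realigned_shifted),
    ("worn, realigned", realigned_worn),
]
for label, data in cases:
    mean, sd = summarize(data)
    print(f"{label:20s} mean = {mean:+.3f}  sd = {sd:.3f}")
```

In this toy setting, realignment restores the shifted-pin case to the healthy spread, while the worn-pin case keeps its inflated standard deviation, which mirrors the point that variance change errors cannot be removed by a simple process adjustment.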
Example 1.2 Random effects in battery degradation processes
In industrial applications, the reliability of a critical unit is crucial to guarantee the overall functional capabilities of the entire system. Failure of such a unit can be catastrophic. Turbine engines of airplanes, power supplies of computers, and batteries of automobiles are typical examples where failure of the unit would lead to breakdown of the entire system. For these reasons, the working condition of such critical units must be monitored and the remaining useful life (RUL) of such units should be predicted so that we can take preventive actions before catastrophic failure occurs. Many system failure mechanisms can be traced back to some underlying degradation processes. An important prognosis problem is to predict RUL based on the degradation signals collected, which are often strongly associated with the failure of the unit. For example, Figure 1.3 shows the evolution of the internal resistance signals of multiple automotive lead-acid batteries. The internal resistance measurement is known to be one of the best condition monitoring signals for the battery life prognosis [Eddahech et al., 2012]. As we can see from Figure 1.3, the internal resistance measurement generally increases with the service time of the battery, which indicates that the health status of the battery is deteriorating.
Figure 1.3 Internal resistance measures from multiple batteries over time.
We can clearly see from Figure 1.3 that, although similar, the progression paths of the internal resistance over time of different batteries are not identical. The difference is certainly expected due to many random factors in the material, manufacturing processes, and the working environment that vary from unit to unit. The random characteristics of degradation paths are random effects, which impact the observed degradation signals of multiple batteries.
The available data from multiple similar units/machines pose interesting intellectual opportunities and challenges for prognosis. As for opportunities, since we have observations from potentially a very large number of similar units, we can compare their operations/conditions, share the information, and extract common knowledge to enable accurate prediction and control at the individual level. As for challenges, because the data are collected in the field and not in a controlled environment, the data contain significant variation and heterogeneity due to the large variations in working conditions for different units. The data analytics approaches should not only be general (so that the common information can be learned and shared), but also flexible (so that the behavior of an individual unit can be captured and controlled).
Random effects always exist in industrial processes. The process variation caused by random effects is detrimental, and thus random effects should be modeled, analyzed, and controlled, particularly in system diagnosis and prognosis. However, owing to limited data availability, data analytics approaches that consider random effects have not been widely adopted in industrial practice. Indeed, before the significant advancement in communication and information technology, data collection in industry often occurred locally in very similar environments. With such limited data, the impact of random effects cannot be exposed and modeled easily. This situation has changed significantly in recent years due to the digital revolution, as mentioned at the beginning of the section.
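A short simulation can show what unit-to-unit random effects look like in degradation signals. The sketch below assumes, purely for illustration, that each unit follows a linear path whose intercept and slope are a population value plus a unit-specific normal deviation. The function names and all numbers are hypothetical; they are not the battery data of Figure 1.3, and the linear form is an assumption of this sketch.

```python
import random
import statistics


def simulate_unit(rng, times, beta0=5.0, beta1=0.02,
                  sd_u0=0.3, sd_u1=0.005, sd_noise=0.05):
    """Simulate one unit's degradation signal at the given times.

    beta0, beta1 : population-level intercept and slope (fixed effects).
    sd_u0, sd_u1 : spread of the unit-specific deviations (random effects).
    sd_noise     : measurement noise. All values are hypothetical.
    """
    u0 = rng.gauss(0.0, sd_u0)   # this unit's intercept deviation
    u1 = rng.gauss(0.0, sd_u1)   # this unit's slope deviation
    return [(beta0 + u0) + (beta1 + u1) * t + rng.gauss(0.0, sd_noise)
            for t in times]


def least_squares_slope(times, values):
    """Ordinary least squares slope of values against times."""
    t_bar = statistics.mean(times)
    v_bar = statistics.mean(values)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


rng = random.Random(42)
times = list(range(0, 100, 5))                      # 20 inspection times
signals = [simulate_unit(rng, times) for _ in range(200)]
slopes = [least_squares_slope(times, s) for s in signals]

print(f"mean of per-unit slopes: {statistics.mean(slopes):.4f}")
print(f"sd of per-unit slopes:   {statistics.stdev(slopes):.4f}")
```

The mean of the fitted slopes reflects the behavior shared by the population, while their spread reflects the random effects that make each unit's path different.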
The statistical methods for random effects provide a powerful set of tools for us to model and analyze the random variation in an industrial process. The goal of this book is to provide a textbook for engineering students and a reference book for researchers and industrial practitioners to adapt and bring the theory and techniques of random effects to the application area of industrial system diagnosis and prognosis. The detailed scope of the book is summarized in the next section.
This book focuses on industrial data analytics methods for system diagnosis and prognosis with an emphasis on random effects in the system. Diagnosis concerns identification of the root cause of a failure or an abnormal working condition. In the context of random effects, the goal of diagnosis is to identify the variation sources in the system. Prognosis concerns using data to predict what will happen in the future. Regarding random effects, prognosis focuses on addressing unit-to-unit variation and making degradation/failure predictions for each individual unit considering the unique characteristic of the unit.
The book contains two main parts:
Part I: Statistical Methods and Foundation for Industrial Data Analytics
This part covers general statistical concepts, methods, and theory useful for describing and modeling the variation, the fixed effects, and the random effects in both univariate and multivariate data. It provides the necessary background for the later chapters in Part II. In Part I, Chapter 2 introduces the basic statistical methods for visualizing and describing data variation. Chapter 3 introduces the concepts of random vectors and the multivariate normal distribution, along with basic concepts in statistical modeling and inference. Chapter 4 focuses on the principal component analysis (PCA) method. PCA is a powerful method for exposing and describing the variation in multivariate data, and it has broad applications in variation source identification. Chapter 5 focuses on linear regression models, which are useful for modeling the fixed effects in a dataset. Statistical inference in linear regression, including parameter estimation and hypothesis testing approaches, is also discussed. Chapter 6 focuses on the basic theory of the linear mixed effects model, which captures both the fixed effects and the random effects in the data.
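As a rough illustration of how a linear mixed effects model separates the two kinds of effects, one common textbook form is sketched below. The notation is an assumption chosen for illustration and may differ from the notation used in Chapter 6: $y_{ij}$ denotes the $j$th observation from unit $i$, $x_{ij}$ and $z_{ij}$ are known covariate vectors, and $b_i$ is assumed independent of $\varepsilon_{ij}$.

```latex
\begin{align}
  y_{ij} &= x_{ij}^{\top}\beta + z_{ij}^{\top} b_i + \varepsilon_{ij},
  \qquad i = 1,\dots,N,\quad j = 1,\dots,n_i, \\
  b_i &\sim \mathcal{N}(0, D), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
% Stacking the n_i observations of unit i, with Z_i the matrix whose rows are z_{ij}^T:
\begin{equation}
  \operatorname{Cov}(y_i) = Z_i D Z_i^{\top} + \sigma^2 I_{n_i}.
\end{equation}
```

Under these assumptions, $\beta$ is shared by all units and represents the fixed effects, while $b_i$ differs from unit to unit and represents the random effects. The covariance expression shows that the observed variation splits into a unit-to-unit component, $Z_i D Z_i^{\top}$, and a measurement noise component, $\sigma^2 I_{n_i}$.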
Part II: Random Effects Approaches for Diagnosis and Prognosis
This part covers the applications of the random effects modeling approach to the diagnosis of variation sources and to failure prognosis in industrial processes/systems. Through industrial application examples, Chapter 7 presents variation source identification based on variation patterns. Chapter 8 introduces variation source estimation methods based on the linear mixed effects model, together with a detailed performance comparison of different methods for practical applications. Chapter 9 studies the diagnosability issue for the variation source diagnosis problem. Chapter 10 introduces the mixed effects longitudinal modeling approach for forecasting system degradation and predicting remaining useful life based on the first time hitting probability. Some variations of the basic method, such as the method that uses a mixture prior for unbalanced data in remaining useful life prediction, are also presented. Chapter 11 introduces the concept of Gaussian processes as a nonparametric way of modeling and analyzing multiple longitudinal signals, and presents the application of the multi-output Gaussian process to failure prognosis. Chapter 12 introduces a method for failure prognosis that combines degradation signals with time-to-event data, and presents the advanced joint prognosis model, which integrates the survival regression model and the mixed effects regression model.
This book is intended for students, engineers, and researchers who are interested in using modern statistical methods for variation modeling, diagnosis, and prediction in industrial systems.
This book can be used as a textbook for a graduate-level or advanced undergraduate-level course on industrial data analytics. The book is fairly self-contained, although a background in basic probability and statistics (such as the concepts of random variables, probability distributions, and moments) and basic knowledge of linear algebra (such as matrix operations and matrix decomposition) would be useful. The appendix at the end of the book provides a summary of the necessary concepts and results in linear space and matrix theory. The chapters in Part II of the book are relatively independent of one another, so the instructor can combine selected chapters in Part II with Part I as the basic materials for different courses. For example, the topics in Part I can be used for an advanced undergraduate-level course introducing industrial data analytics. The materials in Part I and some selected chapters in Part II (e.g., Chapters 7, 8, and 9) can be used in a master’s-level statistical quality control course. Similarly, the materials in Part I and selected later chapters in Part II (e.g., Chapters 10, 11, and 12) can be used in a master’s-level course with an emphasis on prognosis and reliability applications. Finally, Part II alone can be used as the textbook for an advanced graduate-level course on diagnosis and prognosis.
One important feature of this book is that we provide detailed descriptions of the software implementation for most of the methods and algorithms. We adopt the statistical programming language R in this book. The R language is versatile and has a very large number of up-to-date packages implementing various statistical methods [R Core Team, 2020]. This feature makes the book well suited to practitioners in engineering fields who wish to study and implement the statistical modeling and analysis methods on their own. All the R code and data sets used in this book can be found at the book companion website.
Some examples of good books on system diagnosis and prognosis in the engineering area are Lewis et al. [2011], Niu [2017], Wu et al. [2006], Talebi et al. [2009], Gertler [1998], Chen and Patton [2012], Witczak [2007], Isermann [2011], Ding [2008], and Si et al. [2017]. Many good textbooks are also available on industrial statistics. For example, Montgomery [2009], DeVor et al. [2007], Colosimo and Del Castillo [2006], and Wu and Hamada [2011] cover statistical monitoring and design. On failure event analysis and prognosis, Meeker and Escobar [2014], Rausand et al. [2004], and Elsayed [2012] are commonly cited references.
