Introduces machine learning techniques and tools and provides guidance on how to implement machine learning in chemical safety and health-related model development.

There is growing interest in the application of machine learning algorithms in chemical safety and health-related model development, with applications in areas including property and toxicity prediction, consequence prediction, and fault detection. This book is the first to review the current status of machine learning implementation in chemical safety and health research and to provide guidance for implementing machine learning techniques and algorithms into chemical safety and health research. Written by an international team of authors and edited by renowned experts in the areas of process safety and occupational and environmental health, sample topics covered within the work include:

* An introduction to the fundamentals of machine learning, including regression, classification, and cross-validation, and an overview of software and tools
* Detailed reviews of various applications in the areas of chemical safety and health, including flammability prediction, consequence prediction, asset integrity management, predictive nanotoxicity and environmental exposure assessment, and more
* Perspectives on the possible future development of this field

Machine Learning in Chemical Safety and Health serves as an essential guide on both the fundamentals and applications of machine learning for industry professionals and researchers in the fields of process safety, chemical safety, occupational and environmental health, and industrial hygiene.
Page count: 613
Publication year: 2022
Cover
Title Page
Copyright Page
List of Contributors
Preface
1 Introduction
1.1 Background
1.2 Current State
1.3 Software and Tools
References
2 Machine Learning Fundamentals
2.1 What Is Learning?
2.2 Concepts of Machine Learning
2.3 Machine Learning Paradigms
2.4 Probably Approximately Correct Learning
2.5 Estimation and Approximation
2.6 Empirical Risk Minimization
2.7 Regularization
2.8 Maximum Likelihood Principle
2.9 Optimization
References
3 Flammability Characteristics Prediction Using QSPR Modeling
3.1 Introduction
3.2 Flowchart for Flammability Characteristics Prediction
3.3 QSPR Review for Flammability Characteristics
3.4 Limitations
3.5 Conclusions and Future Prospects
References
4 Consequence Prediction Using Quantitative Property–Consequence Relationship Models
4.1 Introduction
4.2 Conventional Consequence Prediction Methods
4.3 Machine Learning and Deep Learning‐Based Consequence Prediction Models
4.4 Quantitative Property–Consequence Relationship Models
4.5 Challenges and Future Directions
References
5 Machine Learning in Process Safety and Asset Integrity Management
5.1 Opportunities and Threats
5.2 State‐of‐the‐Art Reviews
5.3 Case Study of Asset Integrity Assessment
5.4 Data‐Driven Model of Asset Integrity Assessment
5.5 Conclusion
References
6 Machine Learning for Process Fault Detection and Diagnosis
6.1 Background
6.2 Machine Learning Approaches in Fault Detection and Diagnosis
6.3 Supervised Methods for Fault Detection and Diagnosis
6.4 Unsupervised Learning Models for Fault Detection and Diagnosis
6.5 Intelligent FDD Using Machine Learning
6.6 Concluding Remarks
References
7 Intelligent Method for Chemical Emission Source Identification
7.1 Introduction
7.2 Intelligent Methods for Recognizing Gas Emission
7.3 Intelligent Methods for Identifying Emission Sources
7.4 Conclusions and Future Work
References
8 Machine Learning and Deep Learning Applications in Medical Image Analysis
8.1 Introduction
8.2 CNN‐Based Models for Classification
8.3 Case Study
8.4 Limitations and Future Work
References
9 Predictive Nanotoxicology: Nanoinformatics Approach to Toxicity Analysis of Nanomaterials
9.1 Predictive Nanotoxicology
9.2 Machine Learning Modeling for Predictive Nanotoxicology
9.3 Development of Machine Learning Based Models for Nano‐(Q)SARs
9.4 Nanoinformatics Approaches to Predictive Nanotoxicology
9.5 Summary
Acronyms
References
10 Machine Learning in Environmental Exposure Assessment
10.1 Introduction
10.2 Environmental Exposure Modeling
10.3 Machine Learning Exposure Models
10.4 Model Evaluation
10.5 Case Study
10.6 Other Topics
10.7 Conclusion
References
11 Air Quality Prediction Using Machine Learning
11.1 Introduction
11.2 Air Quality and Climate Data Acquisition
11.3 Applications of Machine Learning in Air Quality Study
11.4 An Application Practice Example
References
12 Current Challenges and Perspectives
12.1 Current Challenges
12.2 Perspectives
References
Index
End User License Agreement
Chapter 3
Table 3.1 The categories and numbers of descriptors calculated by DRAGON 7....
Table 3.2 Features of the three model types.
Table 3.3 Data available from DIPPR‐801 dataset.
Chapter 4
Table 4.1 Summary of machine learning‐based consequence analysis.
Table 4.2 Summary of consequence database for QPCR studies.
Table 4.3 Summary of property descriptors for QPCR studies.
Table 4.4 Summary of algorithms used for QPCR studies.
Chapter 5
Table 5.1 Summary of the ML‐related literature on process systems.
Table 5.2 Examples of damage mechanisms.
Chapter 6
Table 6.1 Neural network activation functions.
Table 6.2 Different Kernel functions for SVM.
Table 6.3 TE process continuous process variables.
Table 6.4 Selected TE process fault condition for testing.
Table 6.5 TE fault condition for automated test.
Table 6.6 Data samples and obtained results using OCSVM and NN.
Table 6.7 Fault detection result comparison.
Table 6.8 Comparison of fault classification results.
Chapter 7
Table 7.1 Results of source term estimation by MRE‐PSO method.
Table 7.2 Results of source term parameters by two‐step nonlinear and linear...
Table 7.3 Comparison results of different common dispersion models.
Table 7.4 Skill scores of the MCMC method combined with different forward mo...
Chapter 8
Table 8.1 Evaluation metrics of modified Resnet50 and YOLOv4 models.
Chapter 9
Table 9.1 Examples of biomedical areas focusing on computational models/ana...
Table 9.2 Summary of model types and their descriptions.
Table 9.3 Summary of recently published nano‐(Q)SARs.
Table 9.4 Feature selection approaches utilized for nano‐(Q)SAR model devel...
Table 9.5 Symptoms, potential causes, and possible solutions for common iss...
Chapter 10
Table 10.1 Features used to predict ozone and their data sources.
Table 10.2 Estimates of exposure model predictive accuracy on the Californi...
Chapter 11
Table 11.1 Satellite datasets for air quality and climate variables.
Table 11.2 Ground‐based in situ air pollutants and meteorological observati...
Chapter 1
Figure 1.1 Structure of a single hidden layer neural network.
Figure 1.2 Calculation of a deep neural network with four hidden layers.
Figure 1.3 Structures of some typical neural networks.
Chapter 2
Figure 2.1 Trade‐off between estimation and approximation error. Left: we co...
Chapter 3
Figure 3.1 Flowchart of solution for QSPR study.
Figure 3.2 Flowchart of QSPR study process based on combinatorial algorithms...
Chapter 4
Figure 4.1 Three different types of consequence prediction methods.
Figure 4.2 QPCR development procedure diagram.
Figure 4.3 Linkage between QSPR and QPCR.
Chapter 5
Figure 5.1 PoFs by the major damage type groups at different data availabili...
Figure 5.2 Estimated risk values ($/year) for each individual failure mode i...
Figure 5.3 Stochastic regression‐based versus integrated model performance o...
Figure 5.4 The conceptual framework.
Figure 5.5 An example of steps for joint probability of failure estimation....
Figure 5.6 Technical layers of CM‐based AIM.
Chapter 6
Figure 6.1 Neural network model.
Figure 6.2 One‐class neural network model.
Figure 6.3 Intelligent fault detection and diagnosis.
Figure 6.4 Tennessee Eastman process flow diagram.
Figure 6.5 Proposed NN model accuracy with a number of iterations.
Figure 6.6 Autonomous and model self‐update test (a) model trained with non‐...
Chapter 7
Figure 7.1 Illustration of inverse gas emission process.
Figure 7.2 A recurrent NARX network model.
Figure 7.3 A fitting network model.
Figure 7.4 Monitoring results of CO2 variation in the atmosphere without CO2 ...
Figure 7.5 Monitoring results of CO2 variation in atmosphere with CO2 releas...
Figure 7.6 The prediction and residual results of recurrent NARX network for...
Figure 7.7 The prediction and residual results of recurrent NARX network for...
Figure 7.8 The prediction and residual results of recurrent NARX network for...
Figure 7.9 Peak fit results of approximation errors with recurrent NARX net ...
Figure 7.10 The responses of sensor array of PEN 3 for different substances ...
Figure 7.11 The response map of different substances with the concentration ...
Figure 7.12 The response map of sensor array for furan (left) and ethyl acet...
Figure 7.13 The confusion matrix of the prediction by SVM with HOG features....
Figure 7.14 The confusion matrix of the prediction by simple deep learning n...
Figure 7.15 The training process curves of gas classification with simple DL...
Figure 7.16 The training process curves of gas classification with transferr...
Figure 7.17 The confusion matrix of the prediction by transferred VGG‐19 mod...
Figure 7.18 Comparison of different optimization methods for source paramete...
Figure 7.19 The location estimation with MRE‐PSO method for run 33 case.
Figure 7.20 Computation flow of one‐step nonlinear PSO‐Tikhonov nonlinear me...
Figure 7.21 L‐curve for Release 17 experiment with one‐step nonlinear PSO‐Ti...
Figure 7.22 Structure of Gaussian–SVM model.
Figure 7.23 The results of predicted concentrations with SVM‐MLA dispersion ...
Figure 7.24 Markov chains using the proposed MCMC–MLA model for different pa...
Figure 7.25 Histograms of the Markov chains for different source parameters ...
Figure 7.26 Distribution of real and estimated positions in the Markov chain...
Figure 7.27 95%CI of different parameters estimation for Run 33 (a and b) an...
Chapter 8
Figure 8.1 Modified ResNet50 architecture.
Figure 8.2 The residual learning building blocks: (a) regular block and (b) ...
Figure 8.3 Workflows of CNN development to diagnose obstructed locations in ...
Figure 8.4 Data structure of the training and test images prepared using CFD...
Figure 8.5 Prediction scores for (a) left lung obstructions, (b) right lung ...
Figure 8.6 Confusion matrix heat map for prediction.
Figure 8.7 HSV thresholding technique procedure: (a) CFD model output, (b) G...
Figure 8.8 Blended highlight regions of construction for (a) left lung, (b) ...
Chapter 9
Figure 9.1 Adverse outcome pathway (AOP) modeling for multiwall and single‐w...
Figure 9.2 ML models categorized into classification, regression models, and...
Figure 9.3 SOM‐based consensus cluster analysis of the NP HTS bioactivity pr...
Figure 9.4 (a) The left figure portrays SOM clusters obtained from HTS data ...
Figure 9.5 Nonredundant association rules derived from data on cell signalin...
Figure 9.6 The conditional dependence, quantified via the distribution of th...
Figure 9.7 SVM classification of two‐class problems (data points on the left...
Figure 9.8 (a) BN representing non‐descendants’ property where a variable X ...
Figure 9.9 QD similarity network based on the RF model for cell viability us...
Figure 9.10 Cell viability (%) as a function of exposure time (hr), concentr...
Figure 9.11 The nano‐(Q)SAR model development workflow for: (a) data‐driven,...
Figure 9.12 Wrapper approach with exhaustive (or partially exhaustive) searc...
Figure 9.13 Bias‐variance‐decomposition. Diagram of the relationship of mode...
Figure 9.14 Nanoinformatics elements for the environmental and health impact...
Chapter 10
Figure 10.1 Ground‐level ozone monitors in California during June 2008 wildf...
Chapter 11
Figure 11.1 Flowchart of supervised machine learning applications in air qua...
Figure 11.2 EPA Air Quality System (AQS) O3 ground‐based monitoring sites in...
Edited by
Qingsheng Wang
Department of Chemical Engineering
Texas A&M University
College Station, TX, USA
Changjie Cai
Department of Occupational and Environmental Health
Hudson College of Public Health, The University of Oklahoma
Oklahoma City, OK, USA
This edition first published 2023
© 2023 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Qingsheng Wang and Changjie Cai to be identified as the authors of the editorial material in this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data Applied for:
HB ISBN: 9781119817482
Cover Image: © kessudap/Shutterstock
Cover Design by Wiley
Rustam Abubarkirov, Department of Civil, Chemical, Environmental, and Materials Engineering, University of Bologna, Bologna, Italy
Salim Ahmed, Centre for Risk, Integrity, and Safety Engineering (C-RISE), Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NL, Canada
Rajeevan Arunthavanathan, Centre for Risk, Integrity, and Safety Engineering (C-RISE), Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NL, Canada
Changjie Cai, Department of Occupational and Environmental Health, Hudson College of Public Health, The University of Oklahoma, Oklahoma City, OK, USA
Yoram Cohen, Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA, USA; and Institute of the Environment and Sustainability, University of California, Los Angeles, CA, USA
Yu Feng, School of Chemical Engineering, Oklahoma State University, Stillwater, OK, USA
Lan Gao, School of Meteorology, The University of Oklahoma, Norman, OK, USA
Pingfan Hu, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
Xiao‑Ming Hu, School of Meteorology, The University of Oklahoma, Norman, OK, USA; and Center for Analysis and Prediction of Storms, The University of Oklahoma, Norman, OK, USA
Syed Imtiaz, Centre for Risk, Integrity, and Safety Engineering (C-RISE), Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NL, Canada
Juncheng Jiang, College of Safety Science and Engineering, Nanjing Tech University, Nanjing, China
Zeren Jiao, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
Faisal Khan, Mary Kay O'Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
Bilal M. Khan, Department of Computer Science and Engineering, California State University San Bernardino, San Bernardino, CA, USA; and Institute of the Environment and Sustainability, University of California, Los Angeles, CA, USA
Denglong Ma, School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
Yong Pan, College of Safety Science and Engineering, Nanjing Tech University, Nanjing, China
Hao Sun, Safety and Security Science Section, Department of Values, Technology, and Innovation, Faculty of Technology, Policy, and Management, Delft University of Technology, The Netherlands; and College of Mechanical and Electronic Engineering, China University of Petroleum (East China), Qingdao, China
Qingsheng Wang, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
Gregory L. Watson, Department of Biostatistics, University of California at Los Angeles, Los Angeles, CA, USA
Yan Yan, School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
Ming Yang, Safety and Security Science Section, Department of Values, Technology, and Innovation, Faculty of Technology, Policy, and Management, Delft University of Technology, The Netherlands; and Australian Maritime College, University of Tasmania, Launceston, Tasmania, Australia
We are very pleased to offer the first edition of our book on machine learning in chemical safety and health. Chemical safety and health remain critical in professional education and industry, since professionals have a legal and ethical responsibility to prevent incidents and protect the public. Machine learning, a core subset of artificial intelligence, has developed tremendously and has been implemented in many fields of scientific research and professional practice. Professionals must understand the fundamentals and implementations of machine learning in chemical safety and health if they are to apply them in their own work. The present book is rooted in an invited review paper from our research group at Texas A&M University, published in ACS Chemical Health & Safety, and draws on experience gained from numerous courses, references, and research publications. It can thus provide guidance for professionals, including students, engineers, and scientists, who are interested in studying and applying machine learning methods in their studies, work, and research.
The objective of the book is to give professionals a broad overview of machine learning and its applications in chemical safety and health. It also guides readers in understanding machine learning and identifying useful associated resources, such as commonly used machine learning toolkits and public databases. One significant trend in chemical safety and health is the continually increasing implementation of both shallow and deep learning, and many researchers have achieved significant improvements by adopting machine learning methods in their professional activities. Because this field is ever-changing, extensive effort went into developing this book so that it matches current technology and industrial practice. The book is also intended to develop a common, understandable language between specialists and non-specialists.
In addition, interdisciplinary study has gained increasing interest among professionals from different fields. Several decades ago, chemical process safety, fire safety, industrial hygiene, occupational safety and health, environmental science, chemical emission, and other similar areas of practice were often isolated from one another. Now, many of these topics have been combined into a more comprehensive field in both scientific research and professional practice. This book therefore covers the machine learning applications involved in a wide variety of chemical safety and health issues while combining related topics into a broader whole. This approach may serve as a basis for the further application of machine learning in various industries.
However, technologies, standards, regulations, and laws continuously change and are updated. Machine learning and chemical safety differ from traditional engineering in that they remain dynamically evolving fields. Thus, information and guidance on these topics are likely to become out of date relatively quickly, particularly in certain areas. Readers should recognize these types of changes and consult relevant professionals and organizations to ensure compliance with current requirements. It is our intention to update this book periodically.
We hope that this book will act as a catalyst for the development of deeper synergies among research areas, which include machine learning and chemical safety and health. We also hope that this book will serve as an important reference for solving current challenges involving chemical safety and health and contribute to a much safer and healthier future.
‐ Qingsheng Wang
College Station, TX, USA
‐ Changjie Cai
Oklahoma City, OK, USA
Pingfan Hu and Qingsheng Wang
Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, USA
Machine learning (ML) is a field spanning a broad array of disciplines, involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and others. Furthermore, it is the core subset of artificial intelligence (AI). The term “machine learning” was first proposed in 1959 by Arthur Samuel (Samuel 1959). Machine learning algorithms build mathematical models from training data to make predictions or decisions without being explicitly programmed to do so. Foundations such as Bayes’ theorem, Laplace’s derivation of least squares, and Markov chains, some dating back to the eighteenth century, constituted tools widely used in ML (Andrieu et al. 2003). Since then, ML algorithms have developed tremendously and have been widely applied in many aspects of scientific research and everyday life. These include data mining (Mitchell 1999), computer vision (Voulodimos et al. 2018), natural language processing (Cambria and White 2014), biometric recognition (Chaki et al. 2019), medical diagnosis (Bakator and Radosav 2018), detection of credit card fraud (Modi and Dayma 2017), stock market analysis (Chong et al. 2017), speech and handwriting recognition (Nassif et al. 2019), strategy games (Robertson and Watson 2015), and robotics (Pierson and Gashler 2017).
Deep learning (DL) is a relatively new branch of ML that uses artificial neural networks (ANNs) as the architecture to characterize and learn from data. The concept of DL originates from research on ANNs, and a multilayer perceptron with multiple hidden layers is a DL structure (Lecun et al. 2015). DL combines low-level features into more abstract, high-level categories or features, discovering distributed feature representations of the data. Several DL frameworks have been utilized, including deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).
The applications of ML algorithms in chemical safety and health studies date back to the mid-1990s (Lee et al. 1995), when some researchers used basic ML algorithms in toxicity classification and prediction studies. In other areas, such as hazardous property prediction and consequence analysis, the implementation of ML/DL algorithms did not emerge until the late 2000s (Pan et al. 2008; Pan et al. 2009). Although chemical safety and health is an important field, it has rarely been investigated through interdisciplinary research with applied ML, for two reasons. First, at the early development stage of ML/DL, the algorithms were relatively primitive, and their predictive capabilities and accuracy had not been widely verified and proven. Second, the lack of relatively simple, easy-to-use toolkits and the high skill requirements for algorithms and programming limited the application of ML/DL algorithms in chemical safety and health research. As a result, studies implementing ML were relatively rare in this field in the late twentieth century and the first decade of the twenty-first century.
However, with the rapid advancement of AI and computer science over the past 10 years, ML/DL have drawn increasing attention for their unparalleled advantages over traditional statistical methods and labor-intensive work, and they have developed significantly. There is also growing interest in academia in expanding the application of ML/DL in chemical safety and health research.
In this book, ML fundamentals as well as popular ML/DL tools for implementing ML/DL in chemical safety and health research are introduced (Jiao et al. 2020a). On the applications side, the book describes flammability characteristics prediction using quantitative structure–property relationship modeling (Chapter 3), consequence prediction using quantitative property–consequence relationship modeling (Chapter 4), ML in process safety and asset integrity management (Chapter 5), and ML for process fault detection and diagnosis (Chapter 6). It then covers intelligent methods for chemical emission source identification (Chapter 7), ML and DL applications in medical image analysis (Chapter 8), predictive nanotoxicology as a nanoinformatics approach to toxicity analysis of nanomaterials (Chapter 9), ML in environmental exposure assessment (Chapter 10), and air quality prediction using ML (Chapter 11). The book thus provides useful guidance for researchers and practitioners interested in implementing ML/DL in chemical safety and health, and serves as a reference for novel ML/DL tools and algorithms.
Tom Mitchell provides a modern definition of ML as follows: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E” (Jordan and Mitchell 2015). In general, there are three types of ML: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning learns a function from a given training data set; when new data (validation/test data) arrive, the function predicts their outputs. The training set for supervised learning includes both inputs (features) and outputs (targets), and the targets are already labeled (with specific experimental/simulation values). Common supervised learning algorithms include regression and classification algorithms. While some algorithms are only capable of classification analysis (e.g. linear discriminant analysis, naive Bayes classification), most of them (e.g. k‐nearest neighbors, random forests) are able to conduct both classification analysis and regression analysis (James et al. 2017; Witten et al. 2017).
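As a toy sketch of this distinction (not taken from the book; the function name, data, and labels are invented for illustration), a single k-nearest neighbors predictor can do classification by majority vote and regression by averaging the neighbors' values:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3, task="classification"):
    """Predict a label (classification) or value (regression) for point x
    from its k nearest labeled training points (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    neighbors = [yi for _, yi in dists[:k]]
    if task == "classification":
        return Counter(neighbors).most_common(1)[0][0]  # majority vote
    return sum(neighbors) / k                           # mean of neighbor targets

# Toy labeled training set: inputs (features) and outputs (targets).
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y_class = ["safe", "safe", "hazard", "hazard"]
y_reg = [10.0, 12.0, 95.0, 99.0]

label = knn_predict(X, y_class, (0.05, 0.1))                 # -> "safe"
value = knn_predict(X, y_reg, (0.95, 1.05), task="regression")
```

The same neighbor search serves both tasks; only the aggregation step (vote vs. mean) changes, which is why many algorithms in this family handle both classification and regression.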
The difference between supervised learning and unsupervised learning is whether or not the target of the training set is labeled. Compared with supervised learning, the training set of unsupervised learning has no artificially labeled results. Common unsupervised learning algorithms can be used for clustering (James et al. 2017; Witten et al. 2017). There is also semi‐supervised learning, which combines elements of supervised learning and unsupervised learning. The algorithm for semi‐supervised learning gradually adjusts its behavior as the environment changes.
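To make the contrast concrete, here is a minimal (toy) clustering sketch in the spirit of k-means, where no labels are used at all; the function name, initialization choice, and data are illustrative assumptions:

```python
import math

def kmeans(points, k, iters=20):
    """Naive k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points. No labels are used."""
    centroids = points[:k]  # simple deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of unlabeled points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(points, k=2)
```

The algorithm recovers the two groups purely from the geometry of the inputs, which is exactly what "no artificially labeled results" means in practice.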
For DL, the original work on neural networks was published by Warren McCulloch and Walter Pitts in 1943 (McCulloch and Pitts 1943). They introduced the McCulloch–Pitts neural model, also known as the “linear threshold gate.” As the first computational model of a neuron, the McCulloch–Pitts neural model is very simplistic, generating only a binary output. The weights and threshold require hand‐tuning. In the 1950s, the perceptron became the first model with the capability to autonomously learn the optimal weight coefficients, allowing the training of a single neuron (Rosenblatt 1958). With the help of the backpropagation algorithm, neural networks began to be trained with one or two hidden layers (Rumelhart et al. 1986).
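The two historical models above can be sketched in a few lines of Python. This is a hedged illustration, not the original formulations: the function names, learning rate, and training data are invented, and the perceptron is shown on the linearly separable AND function:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch–Pitts unit: binary output; weights and threshold are hand-set."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def train_perceptron(data, lr=0.1, epochs=20):
    """Rosenblatt's perceptron rule: the weights (and bias) are learned from
    labeled examples instead of being hand-tuned."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            err = target - out                       # 0 when prediction is right
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Hand-tuned MCP unit implementing logical AND:
and_out = mcp_neuron([1, 1], weights=[1, 1], threshold=2)  # -> 1

# Learned perceptron for the same AND function:
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
```

The contrast is the historical point: the MCP unit computes AND only because its weights and threshold were chosen by hand, while the perceptron arrives at an equivalent decision boundary by iterating over labeled examples.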
A single hidden layer neural network consists of three layers: an input layer, a hidden layer, and an output layer. In a neural network trained with supervised learning, the training set contains values for the inputs x and target outputs y. The layer is called “hidden” because the true values of its nodes are not observed in the training set. As shown in Figure 1.1, the values of the input features are denoted a[0], where “a” stands for activation: the values that each layer of the neural network passes on to the subsequent layer. After the input layer passes the values x to the hidden layer, the hidden layer in turn generates a set of activations, a[1]. Finally, the output layer generates a value a[2], a real number that equals ŷ. The hidden layer and output layer are associated with the parameters w and b. Computing an output a of the neural network, a sigmoid function σ(z) of z, is similar to performing repeated logistic regression. The calculations are shown in Eqs. 1.1 through 1.4. Besides the sigmoid function, other activation functions can be used to compute the hidden layer values; in modern neural networks, the default recommendation is the hyperbolic tangent (tanh) or the rectified linear unit (ReLU).
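The forward pass just described can be sketched directly in the a[0]/a[1]/a[2] notation. This is a toy illustration: the layer sizes, weight values, and helper names below are arbitrary assumptions, not values from the book:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2, hidden_act=math.tanh):
    """Forward pass of a single-hidden-layer network in the text's notation:
    a[0] = x, z[1] = W[1]a[0] + b[1], a[1] = act(z[1]),
    z[2] = W[2]a[1] + b[2], a[2] = sigmoid(z[2]) = y-hat."""
    a0 = x
    z1 = [sum(w * xi for w, xi in zip(row, a0)) + bi
          for row, bi in zip(W1, b1)]
    a1 = [hidden_act(z) for z in z1]              # tanh hidden activations
    z2 = sum(w * ai for w, ai in zip(W2, a1)) + b2  # single output unit
    a2 = sigmoid(z2)                              # y-hat in (0, 1)
    return a2

# Toy parameters: 2 inputs, 3 hidden units, 1 output (values are illustrative).
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [0.7, -0.5, 0.2]
b2 = 0.05
y_hat = forward([1.0, 2.0], W1, b1, W2, b2)  # a probability-like score
```

Note how the output unit is literally a logistic regression applied to a[1], which is the sense in which the whole computation is "repeated logistic regression."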
Figure 1.1 Structure of a single hidden layer neural network.
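As a sketch only, the forward computation described above can be written in a few lines of NumPy, assuming the usual layer-wise form of Eqs. 1.1–1.4 (the layer sizes, random weights, and tanh hidden activation below are illustrative choices, not values from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = W1 @ x + b1   # hidden pre-activation: z[1] = W[1] a[0] + b[1]
    a1 = np.tanh(z1)   # hidden activations a[1], using tanh
    z2 = W2 @ a1 + b2  # output pre-activation: z[2] = W[2] a[1] + b[2]
    a2 = sigmoid(z2)   # output a[2] = sigma(z[2]), i.e. y-hat
    return a2

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # a[0]: three input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer with 4 units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # single output unit
y_hat = forward(x, W1, b1, W2, b2)
print(y_hat)
```

Because the output unit is a sigmoid, ŷ always lies between 0 and 1, which is why this final step reads like a logistic regression applied to the hidden activations.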
In recent years, the ML community has found that some functions can only be learned efficiently using DNNs rather than single hidden layer neural networks (Hinton et al. 2006). DNNs with multiple hidden layers can use the earlier layers to learn simple, low‐level features and the later, deeper layers to detect more complex features. Compared with shallower neural networks, DNNs can require significantly fewer hidden units to compute the same functions. Although for any given problem it can be hard to predict in advance exactly how deep a neural network should be, the number of hidden layers can be treated as a hyperparameter and evaluated on held‐out cross‐validation data. A DNN relies on both forward propagation and backpropagation: forward propagation lets the input provide the initial information and propagate through the subsequent layers, while backpropagation lets information flow backward from the cost to compute the gradient efficiently. Figure 1.2 summarizes the calculation of a DNN with four hidden layers using both forward and backward propagation. Figure 1.3 shows the structures of some typical neural networks that are useful in the chemical engineering field.
Figure 1.2 Calculation of a deep neural network with four hidden layers.
Figure 1.3 Structures of some typical neural networks.
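Treating depth as a hyperparameter evaluated on held-out data, as described above, can be sketched with scikit-learn's MLPRegressor; the synthetic dataset, layer width of 16 units, and candidate depths below are invented for the example:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic regression data standing in for a real dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=0)

# Score each candidate number of hidden layers on the held-out split.
scores = {}
for depth in (1, 2, 3):
    net = MLPRegressor(hidden_layer_sizes=(16,) * depth,
                       max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    scores[depth] = net.score(X_val, y_val)  # R^2 on held-out data

best_depth = max(scores, key=scores.get)
print(scores, best_depth)
```

The depth with the best held-out score is then retained, exactly as one would tune any other hyperparameter.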
Accurate chemical property values are extremely important to process safety, industrial hygiene, and novel chemical development. Experimental measurement is the most commonly used method to determine property/toxicity values (Jiao et al. 2019a). However, the experimental setup for property measurement is costly, and many of the chemicals of interest are highly flammable or toxic, making the experiments dangerous to conduct. For chemical mixture property prediction, the many possible combination patterns make measuring all mixture combinations a highly time‐consuming task (Jiao et al. 2019b).
Quantitative structure–activity/property relationship (QSAR/QSPR) analysis involves regression and classification models and is widely used in biological and pharmaceutical science and engineering (Verma et al. 2010; Quintero et al. 2012). It has also been used extensively in chemical health and safety research in recent years due to its high prediction accuracy and reliability (Wang et al. 2017; Zhou et al. 2017; Wang et al. 2019). In addition, it is the area in which ML tools have been most extensively applied to assist in model development, because QSAR/QSPR has a well‐developed data pipeline that facilitates the development of ML‐based QSAR/QSPR studies. For property predictions, the main target properties are the lower flammability limit (LFL), upper flammability limit (UFL), auto‐ignition temperature (AIT), and flash point (FP). Other properties, such as minimum ignition energy (MIE) and self‐accelerating decomposition temperature (SADT), have also been investigated, but studies of these remain limited by data availability. More details will be discussed in Chapter 3.
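A minimal QSPR-style workflow of the kind described above can be sketched as follows; the descriptor matrix, the surrogate "flash point" target, and the random-forest choice are all hypothetical stand-ins, since a real study would compute molecular descriptors from chemical structures:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical descriptor matrix: one row per compound, one column per
# molecular descriptor; the target stands in for a property such as
# flash point. Everything here is synthetic, for illustration only.
rng = np.random.default_rng(42)
n_compounds, n_descriptors = 200, 8
X = rng.normal(size=(n_compounds, n_descriptors))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.2 * rng.normal(size=n_compounds)

model = RandomForestRegressor(n_estimators=200, random_state=0)
# 5-fold cross-validated R^2, a common way to report QSPR model quality.
r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
print(round(r2, 3))
```

Cross-validated R² is used here because QSPR models are judged on how well they predict compounds outside the training set.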
For consequence analysis, the current mainstream method is computational fluid dynamics (CFD) modeling (Jiao et al. 2019c; Mi et al. 2020; Shen et al. 2020). With the development of ML algorithms, the ANN has been widely used in consequence prediction, including gas dispersion and source term estimation. It can also be integrated with other dispersion models such as PHAST or FLACS to overcome the limitation of missing source information in emergency response cases (Ma et al. 2020).
Since 2010, DL has gained much popularity due to its excellent accuracy when trained with large amounts of data from dispersion models. Ni et al. (2019) introduced deep belief networks (DBNs) and convolutional neural networks (CNNs) to resolve the conflict between the accuracy and efficiency requirements of gas dispersion models. Qian et al. (2019) proposed a specially designed long short‐term memory (LSTM) model for gas dispersion in the real environment; the dropout technique was used to prevent overfitting and to improve the generalization ability of the model. Other studies have implemented the QSAR/QSPR concept in consequence analysis modeling by using the parameters of consequence models as property descriptors and treating the consequence values as target variables. Sun et al. (2019) and Jiao et al. (2020b, 2021) used PHAST simulations to construct consequence databases for fire radiation distance and flammable dispersion and used them to train quantitative property–consequence relationship (QPCR) models, which can efficiently predict the corresponding consequence results. More details will be discussed in Chapter 4.
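The QPCR idea of treating consequence-model parameters as descriptors can be sketched as below. The "consequence database" here is a toy surrogate function of invented source and weather parameters; a real study would populate it from PHAST or CFD runs, and the variable names and value ranges are assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical consequence database: release rate, wind speed, stability
# class, and temperature as "property descriptors", with a dispersion
# distance as the target (a real study would use PHAST/CFD outputs).
rng = np.random.default_rng(7)
n = 500
release_rate = rng.uniform(0.1, 10.0, n)     # kg/s
wind_speed   = rng.uniform(1.0, 10.0, n)     # m/s
stability    = rng.integers(1, 7, n)         # Pasquill class A-F as 1-6
temperature  = rng.uniform(270.0, 310.0, n)  # K
X = np.column_stack([release_rate, wind_speed, stability, temperature])
# Toy surrogate: distance grows with release rate, shrinks with wind speed.
y = 100.0 * np.sqrt(release_rate) / wind_speed + 5.0 * stability

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
qpcr = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = qpcr.score(X_te, y_te)
print(round(r2, 3))
```

Once trained, such a model returns a consequence estimate in milliseconds rather than the minutes-to-hours of a full dispersion simulation, which is the efficiency argument made above.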
Asset integrity management (AIM) plays a crucial role in ensuring process safety and risk management. AIM covers both the structural health of the asset and its risk management (Khan et al. 2021). AIM integrates technology, process, and personnel, improving system operation, design, and reliability through a management system and effective system operation in order to avoid accidents. In other words, AIM combines several basic principles, such as systems theory, risk assessment, optimality, and sustainability. It aims to maximize economic benefit by ensuring the safety and reliability of assets, and its core goal is proactive maintenance and inherent safety. The risk‐based AIM method includes three major parts: risk‐based inspection (RBI), reliability‐centered maintenance (RCM), and safety integrity levels (SILs). RBI is a systematic risk‐based evaluation method, which assesses risk by identifying the damage mechanisms of an asset and the consequences caused by its failure (Vinod et al. 2014; Rachman and Ratnayake 2019). Risk is then managed through targeted material selection, corrosion management, preventive testing, and process monitoring, after which an inspection plan (e.g. maintenance method, maintenance position, maintenance time) can be determined, helping a company minimize both risk and maintenance costs. RCM refers to maintaining the inherent reliability and safety of equipment using the fewest resources, applying decision methods to determine the preventive maintenance requirements of the equipment. Its main tasks are to analyze the functions and failures of the system; in addition, on‐site failure data statistics, expert evaluation, and quantitative modeling are used to optimize the system's maintenance strategy. The safety integrity level (SIL), defined in IEC 61508, measures the confidence that a system (e.g. a safety instrumented system (SIS)) can be expected to perform its safety functions (Deshpande and Modak 2002).
It is an index that measures the importance of safety instrumented functions (SIFs). These three components (RBI, RCM, and SIL) facilitate safety evaluations for different types of equipment in process systems and rank the safety of each piece of equipment to generate a targeted maintenance plan. AIM can help reduce the overall operating costs of equipment and improve equipment efficiency and productivity. More details will be discussed in Chapter 5.
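The core RBI idea of ranking equipment by risk can be sketched in a few lines; the equipment names, likelihoods, and consequence scores below are purely illustrative, not from the text or any standard:

```python
# Minimal risk-based inspection (RBI) ranking sketch: risk is scored as
# likelihood x consequence, and equipment is inspected in descending
# risk order. All names and numbers are invented for illustration.
equipment = {
    # name: (failure likelihood per year, consequence score)
    "pressure_vessel_V101": (0.02, 9),
    "pump_P201":            (0.10, 3),
    "pipeline_L301":        (0.05, 7),
    "heat_exchanger_E401":  (0.01, 5),
}

risk = {name: p * c for name, (p, c) in equipment.items()}
inspection_order = sorted(risk, key=risk.get, reverse=True)
print(inspection_order)
```

A real RBI program would derive the likelihood from damage-mechanism models and the consequence from release modeling, but the prioritization step is the same: spend inspection resources where risk is highest.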
Fault detection and diagnosis are crucial for the safe operation of process systems. Continuous operation, and thus the economic and production objectives of a plant, can be achieved by accurately detecting fault conditions and properly diagnosing them before faults escalate into failures. Many fault detection and diagnosis approaches have been developed over the years, and these methods are generally classified as analytical model‐based, knowledge‐based, and data‐driven approaches. More details will be discussed in Chapter 6.
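A common data-driven variant can be sketched as follows: fit PCA on normal operating data and alarm on samples whose reconstruction error (the SPE/Q statistic) exceeds a control limit estimated from the normal data. The process data, the two-latent-variable structure, and the injected sensor-bias fault are all synthetic assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# Normal operation: 6 measured variables driven by 2 latent process
# states plus small sensor noise (a stand-in for historical plant data).
rng = np.random.default_rng(3)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 6))
normal = latent @ loadings + 0.1 * rng.normal(size=(500, 6))

pca = PCA(n_components=2).fit(normal)

def spe(X):
    # Squared prediction error: distance from the PCA model subspace.
    recon = pca.inverse_transform(pca.transform(X))
    return ((X - recon) ** 2).sum(axis=1)

threshold = np.percentile(spe(normal), 99)  # empirical 99% control limit
fault = normal[:5].copy()
fault[:, 2] += 2.0                          # bias fault on sensor 3
alarms = spe(fault) > threshold
print(alarms)
```

Because the fault pushes the measurements off the low-dimensional subspace learned from normal data, the SPE statistic rises above the control limit and the samples are flagged.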
Inspired by animal and human olfaction, researchers proposed the concept of an "electronic nose" (EN) or "artificial olfactory system" (AOS). An AOS consists of a multidimensional sensor array and signal processing modules coupled with pattern recognition algorithms. A single gas sensor may respond to various gases, but its response to a particular gas has specific features. The multidimensional response signals captured from a sensor array therefore provide more information for gas recognition than a single sensor, and can identify gases both qualitatively and quantitatively; the cross‐response of a single metal oxide semiconductor (MOS) sensor can thus be overcome with an AOS. Many methods have been applied to AOS data, such as principal component analysis (PCA), linear discriminant analysis (LDA), discriminant factor analysis (DFA), partial least squares (PLS), principal component regression (PCR), support vector machines (SVM), and cluster analysis (CA). Some ANN and deep learning models (DLM) have also been used for gas recognition. More details will be discussed in Chapter 7.
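The pattern-recognition step can be sketched with one of the methods listed above (an SVM); the 8-sensor array, the per-gas response profiles, and the noise level are invented for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each gas is assumed to leave a characteristic response pattern across
# an 8-sensor MOS array; profiles and noise level are invented.
rng = np.random.default_rng(0)
profiles = {gas: rng.uniform(0.2, 1.0, 8) for gas in ("A", "B", "C")}

X, y = [], []
for gas, profile in profiles.items():
    for _ in range(60):  # 60 noisy "sniffs" per gas
        X.append(profile + 0.05 * rng.normal(size=8))
        y.append(gas)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
clf = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```

The point of the example is the one made in the text: no single sensor separates the gases, but the joint pattern across the array does.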
The tremendous success of ML algorithms for image recognition tasks led to a rapid rise in the potential use of ML in various medical imaging tasks, such as risk assessment, detection, diagnosis, prognosis, and therapy response, as well as in multi‐omics disease discovery (Giger 2018). Specifically, traditional methods to diagnose pulmonary diseases involve costly and invasive procedures such as X‐ray screening and bronchoscopy. Thus, it is imperative and beneficial to precisely detect the obstruction locations of peripheral lung lesions with noninvasive diagnostic methods. More details will be discussed in Chapter 8.
Engineered nanomaterials (ENMs) have been utilized in a variety of industrial applications such as cosmetics, therapeutics, electronics, manufacturing, and healthcare (Bao et al. 2013; Thiruvengadam et al. 2018; Siddiquee et al. 2019; Sahoo et al. 2021). The current body of toxicity knowledge for each ENM typically spans a multitude of studies, each examining a limited cross‐section of attributes in a given experimental system. Nanotechnology is therefore in need of predictive methods (e.g. fundamental first‐principle models and QSARs) to identify and quantify the physicochemical properties of ENMs. More details will be discussed in Chapter 9.
Environmental exposure assessment seeks to quantify exposure to potentially toxic environmental stressors, especially airborne pollutants. Air pollution kills millions of people each year and is a substantial threat to animals and the environment (Chuang et al. 2011; Kim et al. 2015). The long‐term health consequences of pollution exposure can be difficult to infer, but understanding its impact on population health is essential for mitigating negative outcomes. The harmful and even fatal consequences of human exposure to potentially toxic stressors render experimental studies unethical, so the health effects of exposure are usually assessed retrospectively using observational data. More details will be discussed in Chapter 10.
Air pollution is a global concern due to its significant effects on human health, agriculture and ecosystems, and even climate (Myhre et al. 2013; Fuhrer et al. 2016; Cohen et al. 2017). Over the past decades, observing atmospheric compositions and pollutants using various instrumental platforms has become an increasingly powerful tool for understanding atmospheric processes and air quality. The platforms include earth observation satellite instruments, ground‐based in situ and remote sensing stations, and instrumented aircraft (Laj et al. 2009). Many species, such as particulate matter (PM), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), lead, and volatile organic compounds (VOCs), can be directly used in air quality studies. These studies include near‐real‐time monitoring, source apportionment, dispersion modeling, and air quality prediction (Maykut et al. 2003; Engel‐Cox et al. 2004; Cai et al. 2010; Li et al. 2011; Iskandaryan et al. 2020). More details will be discussed in Chapter 11.
R is a language and environment for statistical analysis and ML. R is a dialect of the S language, which was created around 1980 and is widely used in the statistical field. R is free, open‐source software released under the GNU General Public License and is an excellent tool for statistical computation and ML model construction (R Core Team 2013).
The use of the R language is largely aided by various R packages. To some extent, R packages are plug‐ins for R, with different packages meeting different needs. CRAN (the Comprehensive R Archive Network) includes more than 17,000 packages of various types, including many popular ML/DL packages. Some of these packages are summarized as follows:
Machine Learning

* arules (Hahsler et al. 2021). Mining Association Rules and Frequent Itemsets: provides the infrastructure for representing, manipulating, and analyzing transaction data and patterns (frequent itemsets and association rules), along with C implementations of the association mining algorithms Apriori and Eclat.
* caret (Kuhn 2021). Classification and Regression Training: miscellaneous functions for training and plotting classification and regression models.
* CORElearn (Robnik‐Sikonja and Savicky 2021). Classification, Regression, and Feature Evaluation: several learning techniques for classification and regression. Predictive models include classification and regression trees with optional constructive induction and models in the leaves, random forests, the k‐nearest neighbors algorithm (kNN), naive Bayes, and locally weighted regression.
* DataExplorer (Cui 2020). Automate Data Exploration and Treatment: automates data exploration processes for analytic tasks and predictive modeling.
* dplyr (Wickham et al. 2021a). A Grammar of Data Manipulation: a fast, consistent tool for working with data frame‐like objects, both in memory and out of memory.
* e1071 (Meyer et al. 2021). Misc Functions of the Department of Statistics, Probability Theory Group: functions for latent class analysis, short‐time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, the naive Bayes classifier, and generalized k‐nearest neighbors.
* gbm (Greenwell et al. 2020). Generalized Boosted Regression Models: an implementation of extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine.
* ggplot2 (Wickham 2016). Create Elegant Data Visualizations Using the Grammar of Graphics: a system for "declaratively" creating graphics based on "The Grammar of Graphics."
* glmnet (Friedman et al. 2010). Lasso and Elastic‐Net Regularized Generalized Linear Models: efficient procedures for fitting the entire lasso or elastic‐net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, the Cox model, multiple‐response Gaussian, and grouped multinomial regression.
* kernlab (Karatzoglou et al. 2004). Kernel‐Based Machine Learning Lab: kernel‐based machine learning methods for classification, regression, clustering, novelty detection, quantile regression, and dimensionality reduction.
* mboost (Hothorn et al. 2021). Model‐Based Boosting: a functional gradient descent algorithm (boosting) for optimizing general risk functions, utilizing component‐wise (penalized) least‐squares estimates or regression trees as base learners for fitting generalized linear, additive, and interaction models to potentially high‐dimensional data.
* mice (van Buuren and Groothuis-Oudshoorn 2011). Multivariate Imputation by Chained Equations: multivariate imputation using fully conditional specification (FCS) implemented by the MICE algorithm. Built‐in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression), and ordered categorical data (proportional odds).
* mlr (Casalicchio et al. 2019). Machine Learning in R: an interface to a large number of classification and regression techniques, including machine‐readable parameter descriptions.
* party (Hothorn et al. 2006). A Laboratory for Recursive Partitioning: a computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees that embed tree‐structured regression models into a well‐defined theory of conditional inference procedures.
* randomForest (Liaw and Wiener 2002). Breiman and Cutler's Random Forests for Classification and Regression: classification and regression based on a forest of trees using random inputs.
* ROCR (Sing et al. 2005). Visualizing the Performance of Scoring Classifiers: ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade‐off visualizations for specific pairs of performance measures.
* rpart (Therneau and Atkinson 2019). Recursive Partitioning and Regression Trees: recursive partitioning for classification, regression, and survival trees, implementing most of the functionality of the 1984 book by Breiman, Friedman, Olshen, and Stone.
* tidyr (Wickham 2021b). Tidy Messy Data: tools to help create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value.
* tm (Feinerer and Hornik 2020). Text Mining Package: a framework for text mining applications within R.
* xgboost (Chen et al. 2021). Extreme Gradient Boosting: an efficient implementation of the gradient boosting framework. The package includes an efficient linear model solver and tree learning algorithms, and supports various objective functions, including regression, classification, and ranking.
Deep Learning

* deepnet (Rong 2014). Deep Learning Toolkit in R: implements some deep learning architectures and neural network algorithms, including backpropagation (BP), the restricted Boltzmann machine (RBM), the deep belief network (DBN), and the deep autoencoder.
* h2o (LeDell et al. 2021). R Interface for the "H2O" Scalable Machine Learning Platform: H2O is a scalable open‐source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms, including generalized linear models (GLM), gradient boosting machines (including XGBoost), random forests, deep neural networks (deep learning), stacked ensembles, naive Bayes, generalized additive models (GAM), Cox proportional hazards, K‐means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
* keras (Allaire and Chollet 2021). R Interface to the Keras Deep Learning Library: provides a consistent interface to the "Keras" deep learning library directly from within R.
* nnet (Venables and Ripley 2002). Feed‐Forward Neural Networks and Multinomial Log‐Linear Models: software for feed‐forward neural networks with a single hidden layer and for multinomial log‐linear models.
* neuralnet (Fritsch et al. 2019). Training of Neural Networks: training of neural networks using backpropagation, resilient backpropagation with or without weight backtracking, or the modified globally convergent version.
* rnn (Quast 2016). Recurrent Neural Network: implementations of recurrent neural network architectures in native R, including long short‐term memory (Hochreiter and Schmidhuber), the gated recurrent unit (Chung et al.), and the vanilla RNN.
* tensorflow (Allaire and Tang 2021). R Interface to "TensorFlow": an interface to "TensorFlow" (https://www.tensorflow.org/), an open‐source software library for numerical computation using data flow graphs.
Python is a cross‐platform, general‐purpose programming language that was developed by Guido van Rossum and first released in 1991. Python's design philosophy emphasizes code readability, with its notable use of significant whitespace. It is a high‐level scripting language that combines interpreted, compiled, interactive, and object‐oriented features. With continuous version updates and the addition of new language features, it is increasingly used for the development of independent and large‐scale projects. It is also the default tool for beginners as well as professionals learning and using ML/DL algorithms (Jiao et al. 2019c).
Compared to proprietary software such as MATLAB, using an open‐source programming language such as Python for ML/DL model development has some important advantages. First, MATLAB is costly proprietary software, whereas Python is free, and many open‐source ML/DL and scientific computing libraries provide Python interfaces. Apart from a few highly specialized MATLAB toolboxes that cannot be replaced, most of MATLAB's commonly used functionality can be found in Python. Users can install Python and most of its extension libraries on any computer for free, and Python also provides state‐of‐the‐art ML/DL libraries that can easily complete various advanced tasks and achieve superior performance. In addition, compared to MATLAB, Python is easier to learn and more rigorous, making code easier for users to write, read, and maintain.
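As a small, hedged illustration of the Python ML workflow discussed above, the sketch below fits and evaluates a classifier with scikit-learn on one of its bundled datasets; the model choice and split ratio are arbitrary examples:

```python
# Minimal end-to-end scikit-learn workflow: load a bundled dataset,
# split it, fit a model, and evaluate held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
print(round(accuracy, 3))
```

The same fit/score pattern applies across most of the packages in the table that follows, which is one reason the Python ecosystem lowers the entry barrier for ML/DL work.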
Some popular Python ML/DL packages are summarized as follows:

* Caffe (Jia et al. 2014): a deep learning framework made with expression, speed, and modularity in mind, developed by Berkeley AI Research (BAIR) and community contributors.
* Keras (Chollet 2015): a deep learning API written in Python, running on top of the machine learning platform TensorFlow, developed with a focus on enabling fast experimentation.
* Matplotlib (Hunter 2007): a comprehensive library for creating static, animated, and interactive visualizations in Python.
* NumPy (Harris et al. 2020): a powerful N‐dimensional array object as well as useful linear algebra, Fourier transform, and random number capabilities.
* pandas (McKinney 2010): a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both simple and intuitive.
* PyTorch (Paszke et al. 2019): a Python package that provides two high‐level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape‐based autograd system.
* scikit‐learn (Pedregosa et al. 2011): a Python module for machine learning built on top of SciPy and distributed under the 3‐Clause BSD license.
* SciPy (Virtanen et al. 2020): open‐source software for mathematics, science, and engineering. The SciPy library depends on NumPy, which provides convenient and fast N‐dimensional array manipulation, and is built to work with NumPy arrays, providing many user‐friendly and efficient numerical routines, including for numerical integration and optimization.
* TensorFlow (Abadi et al. 2016): an open‐source software library for high‐performance numerical computation whose flexible architecture allows easy deployment of computation across a variety of platforms.
Abadi, M., Barham, P., Chen, J. et al. (2016). Tensorflow: a system for large‐scale machine learning. 12th Symposium on Operating Systems Design and Implementation, 16, pp. 265–283.
Allaire, J.J. and Chollet, F. (2021a). keras: R Interface to 'Keras'. R package version 2.4.0. https://CRAN.R-project.org/package=keras.
Allaire, J.J. and Tang, Y. (2021). tensorflow: R Interface to 'TensorFlow'. R package version 2.5.0. https://CRAN.R-project.org/package=tensorflow.
Andrieu, C., Freitas, N., Doucet, A., and Jordan, M.I. (2003). An introduction to MCMC for machine learning. Mach. Learn. 50: 5–43.
Bakator, M. and Radosav, D. (2018). Deep learning and medical diagnosis: a review of literature. Multimodal Technol. Interact. 2 (3): 47.
Bao, G., Mitragotri, S., and Tong, S. (2013). Multifunctional nanoparticles for drug delivery and molecular imaging. Annu. Rev. Biomed. Eng. 15: 253–282.
Cai, C., Geng, F., Tie, X. et al. (2010). Characteristics and source apportionment of VOCs measured in Shanghai, China. Atmos. Environ. 44 (38): 5005–5014.
Cambria, E. and White, B. (2014). Jumping NLP curves: a review of natural language processing research. IEEE Comput. Intell. Mag. 9 (2): 48–57.
Casalicchio, G., Bossek, J., Lang, M. et al. (2019). OpenML: an R package to connect to the machine learning platform OpenML. Comput. Stat. 34: 977–991.
Chaki, J., Dey, N., Shi, F., and Sherratt, R.S. (2019). Pattern mining approaches used in sensor‐based biometric recognition: a review. IEEE Sensors J. 19 (10): 3569–3580.
Chen, T., He, T., Benesty, M. et al. (2021). xgboost: Extreme Gradient Boosting. R package version 1.4.1.1.
Chollet, F. (2015). Keras. GitHub. https://github.com/fchollet/keras.
Chong, E., Han, C., and Park, F.C. (2017). Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies. Expert Syst. Appl. 83: 187–205.
Chuang, K.J., Yan, Y.H., Chiu, S.Y. et al. (2011). Long‐term air pollution exposure and risk factors for cardiovascular diseases among the elderly in Taiwan. Occup. Environ. Med. 68 (1): 64–68.
Cohen, A.J., Brauer, M., Burnett, R. et al. (2017). Estimates and 25‐year trends of the global burden of disease attributable to ambient air pollution: an analysis of data from the Global Burden of Diseases Study 2015. Lancet 389 (10082): 1907–1918.
Cui, B. (2020). DataExplorer: Automate Data Exploration and Treatment. R package version 0.8.2. https://CRAN.R-project.org/package=DataExplorer.
Deshpande, V.S. and Modak, J.P. (2002). Application of RCM to a medium scale industry. Reliab. Eng. Syst. Saf. 77: 31–43.
Engel‐Cox, J.A., Hoff, R.M., and Haymet, A.D. (2004). Recommendations on the use of satellite remote‐sensing data for urban air quality. J. Air Waste Manage. Assoc. 54: 1360–1371.
Feinerer, I. and Hornik, K. (2020). tm: Text Mining Package. R package version 0.7-8. https://CRAN.R-project.org/package=tm.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1): 1–22. https://www.jstatsoft.org/v33/i01.
Fritsch, S., Guenther, F., and Wright, M. (2019). neuralnet: Training of Neural Networks. R package version 1.44.2. https://CRAN.R-project.org/package=neuralnet.
Fuhrer, J., Val Martin, M., Mills, G. et al. (2016). Current and future ozone risks to global terrestrial biodiversity and ecosystem processes. Ecol. Evol. 6 (24): 8785–8799.
Giger, M. (2018). Machine learning in medical imaging. J. Am. Coll. Radiol. 15: 512–520.
Greenwell, B., Boehmke, B., Cunningham, J., and GBM developers (2020). gbm: Generalized Boosted Regression Models. R package version 2.1.8. https://CRAN.R-project.org/package=gbm.
Hahsler, M., Buchta, C., Gruen, B., and Hornik, K. (2021). arules: Mining Association Rules and Frequent Itemsets. R package version 1.6-8. https://CRAN.R-project.org/package=arules.
Harris, C.R., Millman, K.J., van der Walt, S.J. et al. (2020). Array programming with NumPy. Nature 585: 357–362.
Hinton, G.E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Comput. 18: 1527–1554.
Hothorn, T., Hornik, K., and Zeileis, A. (2006). Unbiased recursive partitioning: a conditional inference framework. J. Comput. Graph. Stat. 15 (3): 651–674.
Hothorn, T., Buehlmann, P., Kneib, T. et al. (2021). mboost: Model‐Based Boosting. R package version 2.9-5. https://CRAN.R-project.org/package=mboost.
Hunter, J.D. (2007). Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9 (3): 90–95.
Iskandaryan, D., Ramos, F., and Trilles, S. (2020). Air quality prediction in smart cities using machine learning technologies based on sensor data: a review. Appl. Sci. 10 (7): 2401.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2017). An Introduction to Statistical Learning: with Applications in R. New York: Springer.
Jia, Y., Shelhamer, E., Donahue, J. et al. (2014). Caffe: convolutional architecture for fast feature embedding. Proceedings of ACM Multimedia, pp. 675–678.
Jiao, Z., Escobar‐Hernandez, H.U., Parker, T., and Wang, Q. (2019a). Review of recent developments of quantitative structure‐property relationship models on fire and explosion‐related properties. Process Saf. Environ. Prot. 129: 280–290.
Jiao, Z., Yuan, S., Zhang, Z., and Wang, Q. (2019b). Machine learning prediction of hydrocarbon mixture lower flammability limits using quantitative structure‐property relationship models. Process. Saf. Prog. 39 (2): e12103.
Jiao, Z., Yuan, S., Ji, C. et al. (2019c). Optimization of dilution ventilation layout design in confined environments using Computational Fluid Dynamics (CFD). J. Loss Prev. Process Ind. 60: 195–202.
Jiao, Z., Hu, P., Xu, H., and Wang, Q. (2020a). Machine learning and deep learning in chemical health and safety: a systematic review of techniques and applications. ACS J. Chem. Health Saf. 27 (6): 316–334.
Jiao, Z., Sun, Y., Hong, Y. et al. (2020b). Development of flammable dispersion quantitative property‐consequence relationship (QPCR) models using extreme gradient boosting. Ind. Eng. Chem. Res. 59 (33): 15109–15118.
Jiao, Z., Ji, C., Sun, Y. et al. (2021). Deep learning based quantitative property‐consequence relationship (QPCR) models for toxic dispersion prediction. Process Saf. Environ. Prot. 152: 352–360.
Jordan, M.I. and Mitchell, T.M. (2015). Machine learning: trends, perspectives, and prospects. Science 349 (6245): 255–260.
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004). kernlab - an S4 package for kernel methods in R. J. Stat. Softw. 11 (9): 1–20. http://www.jstatsoft.org/v11/i09.
Khan, F., Yarveisy, R., and Abbassi, R. (2021). Risk‐based pipeline integrity management: a road map for the resilient pipelines. J. Pipeline Sci. Eng. 1 (1): 74–87.
Kim, K.H., Kabir, E., and Kabir, S. (2015). A review on the human health impact of airborne particulate matter. Environ. Int. 74: 136–143.
Kuhn, M. (2021). caret: Classification and Regression Training. R package version 6.0-88. https://CRAN.R-project.org/package=caret.
Laj, P., Klausen, J., Bilde, M. et al. (2009). Measuring atmospheric composition change. Atmos. Environ. 43: 5351–5414.
Lecun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521 (7553): 436–444.
LeDell, E., Gill, N., Aiello, S. et al. (2021). h2o: R Interface for the 'H2O' Scalable Machine Learning Platform. R package version 3.32.1.3. https://CRAN.R-project.org/package=h2o.
Lee, Y., Buchanan, B.G., Mattison, D.M. et al. (1995). Learning rules to predict rodent carcinogenicity of non‐genotoxic chemicals. Mutat. Res. Fundam. Mol. Mech. Mutagen. 328: 127–149.
Li, C., Hsu, N.C., and Tsay, S.‐C. (2011). A study on the potential applications of satellite data in air quality monitoring and forecasting. Atmos. Environ. 45: 3663–3675.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News 2 (3): 18–22.
Ma, D., Gao, J., Zhang, Z. et al. (2020). Locating the gas leakage source in the atmosphere using the dispersion wave method. J. Loss Prev. Process Ind. 63: 104031.
Maykut, N.N., Lewtas, J., Kim, E., and Larson, T.V. (2003). Source apportionment of PM2.5 at an urban IMPROVE site in Seattle, Washington. Environ. Sci. Technol. 37 (22): 5135–5142.
McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 5: 115–133.
McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, Vol. 445, pp. 51–56.
Meyer, D., Dimitriadou, E., Hornik, K. et al. (2021). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-8. https://CRAN.R-project.org/package=e1071.
Mi, H., Liu, Y., Jiao, Z. et al. (2020). A numerical study on the optimization of ventilation mode during emergency of cable fire in utility tunnel. Tunn. Undergr. Space Technol. 100: 103403.
Mitchell, T.M. (1999). Machine learning and data mining. Commun. ACM 42 (11): 30–36.
Modi, K. and Dayma, R. (2017). Review on fraud detection methods in credit card transactions. 2017 International Conference on Intelligent Computing and Control (I2C2).
Myhre, G., Shindell, D., and Pongratz, J. (2013). Anthropogenic and natural radiative forcing. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (ed. T.F. Stocker, D. Qin, G.‐K. Plattner, et al.). Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press.
Nassif, A.B., Shahin, I., Attili, I. et al. (2019). Speech recognition using deep neural networks: a systematic review. IEEE Access 7: 19143–19165.
Ni, J., Yang, H., Yao, J. et al. (2019). Toxic gas dispersion prediction for point source emission using deep learning method. Hum. Ecol. Risk Assess. Int. J. 26: 1–14.