Biomedical Data Mining for Information Retrieval
Description

This book not only emphasizes traditional computational techniques but also discusses data mining, biomedical image processing and information retrieval, with broad coverage of basic scientific applications. Biomedical Data Mining for Information Retrieval comprehensively covers the topic of mining biomedical text, images and visual features towards information retrieval. Biomedical and health informatics is an emerging field of research at the intersection of information science, computer science and healthcare, and brings tremendous opportunities and challenges due to the easily available and abundant biomedical data for further analysis. The aim of healthcare informatics is to ensure high-quality, efficient healthcare, better treatment and quality of life by analyzing biomedical and healthcare data, including patients’ data, electronic health records (EHRs) and lifestyle. Previously, a domain expert was commonly required to develop a model for biomedical or healthcare data; however, recent advancements in representation learning algorithms allow such models to be developed automatically. Biomedical image mining is a novel research area brought about by the vast number of biomedical images increasingly being generated and stored digitally. These images are mainly in the form of computed tomography (CT), X-ray, nuclear medicine imaging (PET, SPECT), magnetic resonance imaging (MRI) and ultrasound. Patients’ biomedical images can be analyzed using data mining techniques and may help in answering several important and critical questions relating to healthcare. Image mining in medicine can help to uncover new relationships between data and reveal new useful information that can be helpful for doctors in treating their patients.

Audience: Researchers in various fields including computer science, medical informatics, healthcare IoT, artificial intelligence, machine learning, image processing and clinical big data analytics.

Page count: 659

Publication year: 2021




Table of Contents

Cover

Title Page

Copyright

Preface

Introduction

Organization of the Book

Concluding Remarks

1 Mortality Prediction of ICU Patients Using Machine Learning Techniques

1.1 Introduction

1.2 Review of Literature

1.3 Materials and Methods

1.4 Result and Discussion

1.5 Conclusion

1.6 Future Work

References

2 Artificial Intelligence in Bioinformatics

2.1 Introduction

2.2 Recent Trends in the Field of AI in Bioinformatics

2.3 Data Management and Information Extraction

2.4 Gene Expression Analysis

2.5 Role of Computation in Protein Structure Prediction

2.6 Application in Protein Folding Prediction

2.7 Role of Artificial Intelligence in Computer-Aided Drug Design

2.8 Conclusions

References

3 Predictive Analysis in Healthcare Using Feature Selection

3.1 Introduction

3.2 Literature Review

3.3 Dataset Description

3.4 Feature Selection

3.5 Feature Selection Methods

3.6 Methodology

3.7 Experimental Results and Analysis

3.8 Conclusion

References

4 Healthcare 4.0: An Insight of Architecture, Security Requirements, Pillars and Applications

4.1 Introduction

4.2 Basic Architecture and Components of e-Health Architecture

4.3 Security Requirements in Healthcare 4.0

4.4 ICT Pillar’s Associated With HC4.0

4.5 Healthcare 4.0’s Applications-Scenarios

4.6 Conclusion

References

5 Improved Social Media Data Mining for Analyzing Medical Trends

5.1 Introduction

5.2 Literature Survey

5.3 Basic Data Mining Clustering Technique

5.4 Research Methodology

5.5 Results and Discussion

5.6 Conclusion & Future Scope

References

6 Bioinformatics: An Important Tool in Oncology

6.1 Introduction

6.2 Cancer—A Brief Introduction

6.3 Bioinformatics—A Brief Introduction

6.4 Bioinformatics—A Boon for Cancer Research

6.5 Applications of Bioinformatics Approaches in Cancer

6.6 Bioinformatics: A New Hope for Cancer Therapeutics

6.7 Conclusion

References

7 Biomedical Big Data Analytics Using IoT in Health Informatics

7.1 Introduction

7.2 Biomedical Big Data

7.3 Healthcare Internet of Things (IoT)

7.4 Studies Related to Big Data Analytics in Healthcare IoT

7.5 Challenges for Medical IoT & Big Data in Healthcare

7.6 Conclusion

References

8 Statistical Image Analysis of Drying Bovine Serum Albumin Droplets in Phosphate Buffered Saline

8.1 Introduction

8.2 Experimental Methods

8.3 Results

8.4 Discussions

8.5 Conclusions

Acknowledgments

References

9 Introduction to Deep Learning in Health Informatics

9.1 Introduction

9.2 Deep Learning in Health Informatics

9.3 Medical Informatics

9.4 Bioinformatics

9.5 Pervasive Sensing

9.6 Public Health

9.7 Deep Learning Limitations and Challenges in Health Informatics

References

10 Data Mining Techniques and Algorithms in Psychiatric Health: A Systematic Review

10.1 Introduction

10.2 Techniques and Algorithms Applied

10.3 Analysis of Major Health Disorders Through Different Techniques

10.4 Conclusion

References

11 Deep Learning Applications in Medical Image Analysis

11.1 Introduction

11.2 Deep Learning Models and its Classification

11.3 Convolutional Neural Networks (CNN)—A Popular Supervised Deep Model

11.4 Deep Learning Advancements—A Biological Overview

11.5 Conclusion and Discussion

References

12 Role of Medical Image Analysis in Oncology

12.1 Introduction

12.2 Cancer

12.3 Medical Imaging

12.4 Diagnostic Approaches for Cancer

12.5 Conclusion

References

13 A Comparative Analysis of Classifiers Using Particle Swarm Optimization-Based Feature Selection

13.1 Introduction

13.2 Feature Selection for Classification

13.3 Use of WEKA Tool

13.4 Conclusion and Future Work

References

Index

End User License Agreement


List of Illustrations

Chapter 1

Figure 1.1 Step by step process for mortality prediction.

Figure 1.2 The FLANN based mortality prediction model.

Figure 1.3 Convergence characteristics of FA-FLANN based mortality prediction mo...

Chapter 2

Figure 2.1 The different level of organization of protein.

Chapter 3

Figure 3.1 Diabetes dataset class distribution.

Figure 3.2 Class distribution of hepatitis dataset.

Figure 3.3 Flow chart of the tasks carried out in this chapter.

Chapter 4

Figure 4.1 Basic architecture and components of e-health architecture.

Figure 4.2 Healthcare 4.0 protection and security necessities.

Chapter 5

Figure 5.1 Steps involved in Data mining process. http://www.lastnightstudy.com/...

Figure 5.2 Components of data mining system. https://www.ques10.com/p/9209/expla...

Figure 5.3 Major Social media sites. https://www.securitymagazine.com/articles/8...

Figure 5.4 Social network representation using graph. https://www.javatpoint.com...

Figure 5.5 An example of clustering. https://www.analyticsvidhya.com/blog/2013/1...

Figure 5.6 Partition clustering [18].

Figure 5.7 Hierarchical clustering [19].

Figure 5.8 Obstacle in Constraint-Based Clustering [42].

Figure 5.9 Decision tree [42].

Figure 5.10 Categorization of social media data [43].

Figure 5.11 Frame work for proposed system.

Figure 5.12 Basic steps of page rank algorithm. https://www.analyticsvidhya.com/...

Figure 5.13 Output generated after Pre processing step. Stop Word algorithm has ...

Figure 5.14 (a) Pre processing; (b) Graph clustering.

Figure 5.15 Apply k-mean algorithm.

Figure 5.16 Generation of clustering data.

Figure 5.17 Output of K-means algorithm.

Figure 5.18 Apply back propagation algorithm.

Figure 5.19 Result of clustering.

Figure 5.20 Classified data.

Figure 5.21 Performance comparison of proposed algorithm and other existing meth...

Figure 5.22 Execution time.

Chapter 6

Figure 6.1 Bioinformatics tools used in cancer research.

Figure 6.2 Cancer genomic databases based on bioinformatics.

Figure 6.3 Procedure of SELDI-MS-TOF based on bioinformatics.

Chapter 7

Figure 7.1 Big Data: 6 V’s.

Figure 7.2 Google research trend IoT Health and Big Data Health.

Figure 7.3 Big Data Flow from its sources to storage, analytics, and visualizati...

Figure 7.4 Healthcare IoT system architecture.

Figure 7.5 Healthcare IoT monitoring architecture [10].

Chapter 8

Figure 8.1 BSA saline droplets at different initial PBS concentrations (Ø): The ...

Figure 8.2 (I–IV) shows the time evolution of the first-order statistical (FOS) ...

Figure 8.3 (I–IV) show the comparison of the averaged gray level co-occurrence m...

Figure 8.4 (I–IV) show the time evolution of the gray level co-occurrence matrix...

Figure 8.5 The images of the dried films of BSA-saline are captured after 24 h a...

Figure 8.6 Histograms depicting the counts of the pixels along the y-axis, and t...

Chapter 9

Figure 9.1 Neural network architecture.

Figure 9.2 Basic architecture of DNN.

Figure 9.3 Basic structure of CNN.

Figure 9.4 Basic structure of DBN.

Figure 9.5 Basic architecture of RNN.

Figure 9.6 Basic architecture of DA.

Figure 9.7 Summary of the proposed model.

Chapter 10

Figure 10.1 Percentages of data mining techniques applied to Alzheimer’s studies...

Figure 10.2 Percentages of data mining techniques applied to Dementia’s studies ...

Figure 10.3 Percentages of data mining techniques applied to depression studies ...

Figure 10.4 Percentages of data mining techniques applied to Schizophrenia and b...

Chapter 11

Figure 11.1 Image depicting prostate cancer cell segmentation (Source—Alan Parti...

Figure 11.2 Basic neural network design.

Figure 11.3 Various neural network architectures. (a) Recurrent Neural network, ...

Figure 11.4 An illustration of a typical CNN architecture for pixel RGB images (...

Figure 11.5 (a) Input Image, (b) Convolutional filter and (c) Convolved output v...

Figure 11.6 Average versus max pooling with a stride of 2.

Figure 11.7 Basic CNN architecture for image classification [55].

Figure 11.8 Image denoising of brain MR image by DnCNN network [26].

Chapter 12

Figure 12.1 TNM system of staging.

Figure 12.2 Factors affecting cancer prognosis.

Figure 12.3 Levels of medical imaging.

Figure 12.4 General steps of image processing.

Figure 12.5 Artificial intelligence tools are used for detection, characterizati...

Chapter 13

Figure 13.1 Data mining process.

Figure 13.2 Generic model of classification.

Figure 13.3 Filter approach.

Figure 13.4 Wrapper approach.

Figure 13.5 Snapshot of classifiers in WEKA.

Figure 13.6 Classifiers performance.

List of Tables

Chapter 1

Table 1.1 Time series variables with description and physical units recorded in ...

Table 1.2 Time series variables with physical units [30].

Table 1.3 Comparison of different models during testing.

Chapter 2

Table 2.1 Summary of database sources of protein structure classification.

Chapter 3

Table 3.1 Comparison of Research paper for diabetes dataset.

Table 3.2 Comparison of Research papers on hepatitis dataset.

Table 3.3 PIMA diabetes dataset description.

Table 3.4 Description of Hepatitis dataset’s attributes.

Table 3.5 Difference between filter method and wrapper method.

Table 3.6 Accuracy obtained in Task 1 for diabetes dataset.

Table 3.7 Accuracy obtained in Task 1 for hepatitis dataset.

Table 3.8 Accuracy obtained in Task 2 in diabetes dataset.

Table 3.9 Accuracy obtained in Task 2 in hepatitis dataset.

Table 3.10 Accuracy obtained by filter feature selection methods in diabetes dat...

Table 3.11 Accuracy obtained by wrapper feature selection methods in diabetes da...

Table 3.12 Accuracy obtained by filter feature selection methods in hepatitis da...

Table 3.13 Accuracy obtained in wrapper feature selection methods in the hepatit...

Table 3.14 Accuracy obtained in Task 4 for diabetes dataset.

Table 3.15 Accuracy obtained in Task 4 for hepatitis dataset.

Table 3.16 Conclusion table for diabetes dataset.

Table 3.17 Conclusion table for hepatitis dataset.

Chapter 4

Table 4.1 Healthcare 4.0’s application scenarios.

Chapter 5

Table 5.1 Comparative analysis between proposed algorithm and other existing met...

Table 5.2 Comparative analysis between proposed algorithm and other existing met...

Chapter 6

Table 6.1 Types of biomarkers explained based on their utilization.

Table 6.2 List of the biomarkers using bioinformatics tools.

Table 6.3 The types of microarray explained with illustrations.

Chapter 7

Table 7.1 Reviewed IoT healthcare system.

Chapter 8

Table 8.1 Detailed report of Mann–Whitney U test for ASM (angular second moment)...

Table 8.2 Detailed report of Mann–Whitney U test for COR (correlation) at each c...

Table 8.3 Detailed report of Mann–Whitney U test for ENT (entropy) at each conce...

Table 8.4 Detailed report of Mann–Whitney U test for IDM (inverse difference mom...

Chapter 9

Table 9.1 Classification results for diabetic and non-diabetic patients and corr...

Chapter 10

Table 10.1 Several study researches review in context to data mining techniques ...

Table 10.2 Several study researches review in context to data mining techniques ...

Table 10.3 Several study researches review in context to data mining techniques ...

Table 10.4 Several study researches review in context to data mining techniques ...

Chapter 11

Table 11.1 Advancements in tools of medical imaging.

Table 11.2 Some supplementary advancements in the subcellular and cellular secti...

Table 11.3 Some miscellaneous advancements in the organ section and multidomain ...

Chapter 12

Table 12.1 Comparison between different imaging techniques.

Chapter 13

Table 13.1 WEKA names of selected classifiers.

Table 13.2 Feature section algorithms.

Table 13.3 Features in the dataset.

Table 13.4 Classification accuracy in % with original features.

Table 13.5 Classification accuracy in % after GA-based features selection.

Table 13.6 Classification accuracy in % after PSO-based features selection.


Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106

Artificial Intelligence and Soft Computing for Industrial Transformation

Series Editor: Dr S. Balamurugan ([email protected])

Scope: Artificial Intelligence and Soft Computing Techniques play an impeccable role in industrial transformation. The topics to be covered in this book series include Artificial Intelligence, Machine Learning, Deep Learning, Neural Networks, Fuzzy Logic, Genetic Algorithms, Particle Swarm Optimization, Evolutionary Algorithms, Nature Inspired Algorithms, Simulated Annealing, Metaheuristics, Cuckoo Search, Firefly Optimization, Bio-inspired Algorithms, Ant Colony Optimization, Heuristic Search Techniques, Reinforcement Learning, Inductive Learning, Statistical Learning, Supervised and Unsupervised Learning, Association Learning and Clustering, Reasoning, Support Vector Machine, Differential Evolution Algorithms, Expert Systems, Neuro Fuzzy Hybrid Systems, Genetic Neuro Hybrid Systems, Genetic Fuzzy Hybrid Systems and other Hybridized Soft Computing Techniques and their applications for Industrial Transformation. The book series aims to provide comprehensive handbooks and reference books for the benefit of scientists, research scholars, students and industry professionals working towards next generation industrial transformation.

Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])

Biomedical Data Mining for Information Retrieval

Methodologies, Techniques and Applications

Edited by

Sujata Dash,

Subhendu Kumar Pani,

S. Balamurugan

and

Ajith Abraham

This edition first published 2021 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA

© 2021 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 978-1-119-71124-7

Cover image: Pixabay.com
Cover design by Russell Richardson

Set in 11 pt Minion Pro by Manila Typesetting Company, Makati, Philippines

Printed in the USA

10 9 8 7 6 5 4 3 2 1

Preface

Introduction

Biomedical Data Mining for Information Retrieval comprehensively covers the topic of mining biomedical text, images and visual features towards information retrieval, which is an emerging research field at the intersection of information science and computer science. Biomedical and health informatics is another emerging field of research at the intersection of information science, computer science and healthcare. This new era of healthcare informatics and analytics brings with it tremendous opportunities and challenges based on the abundance of biomedical data easily available for further analysis. The aim of healthcare informatics is to ensure high-quality, efficient healthcare and better treatment and quality of life by efficiently analyzing biomedical and healthcare data, including patients’ data, electronic health records (EHRs) and lifestyle. Earlier, it was commonly required to have a domain expert develop a model for biomedical or healthcare data; however, recent advancements in representation learning algorithms allow automatic learning of the pattern and representation of given data for the development of such a model. Biomedical image mining is a novel research area brought about by the large number of biomedical images increasingly being generated and stored digitally. These images are mainly generated by computed tomography (CT), X-ray, nuclear medicine imaging (PET, SPECT), magnetic resonance imaging (MRI) and ultrasound. Patients’ biomedical images can be analyzed using data mining techniques and may help in answering several critical questions related to their healthcare. Image mining in medicine can help to uncover new relationships between data and reveal new useful information that can aid doctors in treating their patients.

Information retrieval (IR) methods operate over multiple levels of representation, with the system learning progressively more abstract representations of the raw data at each level. An essential issue in medical IR is the diversity of its users: in general, they have changing categories of information needs, varying levels of medical knowledge and varying language skills. Even within a single category of users of medical IR systems, medical knowledge varies greatly between individuals. This influences the way individuals present search queries to systems, the level of complexity of the information that should be returned to them, and the type of support that should accompany the retrieved material. Such representation learning methods have shown significant success in dealing with massive data across a large number of applications, owing to their capability of extracting complex hidden features and learning efficient representations in an unsupervised setting.

This book covers the latest advances and developments in health informatics, data mining, machine learning and artificial intelligence, fields which will play a vital role in improving human life. It also covers the IR-based models for biomedical and health informatics which have recently emerged in the still-developing field of research in biomedicine and healthcare. All researchers and practitioners working in the fields of biomedicine, health informatics and information retrieval will find the book highly beneficial. Since it is a good collection of state-of-the-art approaches for data-mining-based biomedical and health-related applications, it will also be very useful for new researchers and practitioners entering the field who want to quickly learn which methods perform best. With this book they will be able to compare different approaches and carry forward their research in the areas that most directly impact the betterment of human life and health. No other book on the market provides such a good collection of state-of-the-art methods for mining biomedical text, images and visual features towards information retrieval.

Organization of the Book

The 13 chapters of this book present scientific concepts, frameworks and ideas on biomedical data analytics and information retrieval from different biomedical domains. The Editorial Advisory Board and expert reviewers have ensured the high caliber of the chapters through careful refereeing of the submitted papers. For the purpose of coherence, we have organized the chapters by similarity of topic, ranging from the internet of things for biomedical engineering and health informatics, through computational intelligence for medical image processing, to biomedical natural language processing.

In Chapter 1, “Mortality Prediction of ICU Patients Using Machine Learning Techniques,” Babita Majhi, Aarti Kashyap and Ritanjali Majhi present mortality prediction using machine learning techniques. Since the intensive care unit (ICU) admits very ill patients, facilitating their care requires serious attention and treatment using ventilators and other sophisticated medical equipment. This equipment is very costly; hence, its optimized use is necessary. ICUs also require a higher number of staff relative to the number of patients admitted, for regular monitoring. In brief, ICUs involve a larger budget compared to other sections of any hospital. Therefore, to help doctors determine which patient is more at risk, mortality prediction is an important area of research. In data mining, mortality prediction is a binary classification problem, i.e., die or survive, which has attracted machine learning groups to apply their algorithms to it. In this chapter, six different machine learning methods, functional link artificial neural network (FLANN), support vector machine (SVM), discriminant analysis (DA), decision tree (DT), naïve Bayesian network and K-nearest neighbors (KNN), are used to develop mortality prediction models on data collected from the PhysioNet Challenge 2012, and their performance is analyzed.
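The binary die-or-survive framing above can be sketched with a from-scratch classifier. This is a minimal illustration only: a simple k-nearest-neighbors model (one of the six methods named) applied to synthetic two-feature data, since the PhysioNet Challenge 2012 data and the chapter's actual models are not reproduced here.

```python
import numpy as np

# A from-scratch KNN classifier for a binary (die / survive) outcome.
def knn_predict(X_train, y_train, X_test, k=5):
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(d)[:k]]      # labels of the k nearest
        preds.append(int(nearest.mean() > 0.5))   # majority vote
    return np.array(preds)

rng = np.random.default_rng(42)
# Two synthetic "vital sign" features; class 1 (non-survivor) is shifted up.
X = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
               rng.normal(loc=2.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

idx = rng.permutation(200)
train, test = idx[:150], idx[150:]
pred = knn_predict(X[train], y[train], X[test])
acc = (pred == y[test]).mean()
print(f"test accuracy: {acc:.2f}")
```

On this well-separated toy data the classifier approaches the Bayes accuracy; on real ICU time-series data, feature extraction and class imbalance dominate the difficulty.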

In Chapter 2, “Artificial Intelligence in Bioinformatics,” V. Samuel Raj, Anjali Priyadarshini, Manoj Kumar Yadav, Ramendra Pati Pandey, Archana Gupta and Arpana Vibhuti emphasize the various smart tools available in the field of biomedical and health informatics. They also analyze recently introduced state-of-the-art bioinformatics tools that use complex AI algorithms.

In Chapter 3, “Predictive Analysis in Healthcare Using Feature Selection,” Aneri Acharya, Jitali Patel and Jigna Patel describe various methods to enhance the performance of machine learning models used in predictive analysis. The chronic diseases of diabetes and hepatitis are explored in this chapter with an experiment carried out in four tasks.
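As a rough illustration of the filter-style feature selection the chapter compares with wrapper methods, the sketch below ranks features by absolute correlation with the class label and keeps the top k. The data is synthetic; the chapter's PIMA diabetes and hepatitis datasets and its exact methods are not reproduced here.

```python
import numpy as np

# Filter feature selection: score each feature independently of any
# classifier, here by absolute Pearson correlation with the label.
def filter_select(X, y, k):
    scores = np.abs([np.corrcoef(X[:, f], y)[0, 1] for f in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]          # indices of the top-k features

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
# The label depends only on features 2 and 5 (plus noise).
y = (2 * X[:, 2] - X[:, 5] + rng.normal(scale=0.5, size=300) > 0).astype(float)

top = filter_select(X, y, k=2)
print("top features:", sorted(top.tolist()))
```

A wrapper method would instead score candidate feature subsets by training a classifier on each, which is slower but accounts for feature interactions, the trade-off the chapter's filter-versus-wrapper comparison examines.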

In Chapter 4, “Healthcare 4.0: An Insight of Architecture, Security Requirements, Pillars and Applications,” Deepanshu Bajaj, Bharat Bhushan and Divya Yadav present the idea of Industry 4.0, which is rapidly evolving and essential for the medical sector; its components, including the internet of things (IoT), big data (BD) and blockchain (BC), are in combination modernizing the overall framework of e-health. They analyze the implementation of Industry 4.0 (I4.0) technology in the medical sector, which has revolutionized the best available approaches and improved the entire framework.

In Chapter 5, “Improved Social Media Data Mining for Analyzing Medical Trends,” Minakshi Sharma and Sunil Sharma discuss social media health records. Nowadays, social media has become a prominent means of sharing and viewing news among the general population. It has become an inseparable part of our lives, with people spending much of their time on social media instead of on other activities. On media such as Twitter, Facebook or blogs, people share their health records, medication history and personal views. For social media resources to be useful, noise must be filtered out and only the important content captured, excluding irrelevant data. However, even after filtering, the content may contain irrelevant information, so it should be prioritized based on its estimated importance. Importance can be estimated with the help of three factors: media focus (MF), user attention (UA) and user interaction (UI). Media focus is the temporal popularity of a topic in the news; user attention is the temporal popularity of a topic on Twitter; and user interaction is the interaction between social media users on a topic, which indicates the strength of the topic in social media. Hence, these three factors form the basis for ranking news topics and thus improve the quality and variety of ranked news.
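The three-factor ranking described above can be sketched as a weighted combination of normalized scores. The weights, the combination rule and the topic values below are illustrative assumptions, not figures from the chapter.

```python
# Hypothetical topic-importance score combining media focus (MF),
# user attention (UA) and user interaction (UI), each normalized to [0, 1].
def topic_importance(mf, ua, ui, w_mf=0.4, w_ua=0.3, w_ui=0.3):
    """Weighted sum of the three factors; weights are assumed, not given."""
    return w_mf * mf + w_ua * ua + w_ui * ui

topics = {
    "flu outbreak":   topic_importance(0.9, 0.7, 0.6),
    "new diet trend": topic_importance(0.3, 0.8, 0.4),
    "drug recall":    topic_importance(0.6, 0.5, 0.9),
}
# Rank topics by estimated importance, highest first.
for name, score in sorted(topics.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```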

In Chapter 6, “Bioinformatics: An Important Tool in Oncology,” Gaganpreet Kaur, Saurabh Gupta, Gagandeep Kaur, Manju Verma and Pawandeep Kaur provide a comprehensive account of the origins, development and future perspectives of bioinformatics in the field of oncology.

In Chapter 7, “Biomedical Big Data Analytics Using IoT in Health Informatics,” Pawan Singh Gangwar and Yasha Hasija present a review of healthcare big data analytics and the biomedical IoT. Wearable devices play a major role in applications such as continuous daily health monitoring, weather forecasting and road traffic management. Such mobile apps and devices are now used increasingly and are interconnected with telehealth and telemedicine through the healthcare IoT. Enormous quantities of data are continually generated by such devices and stored on cloud platforms. These large amounts of biomedical data are periodically gathered by intelligent sensors and transmitted for remote medical diagnostics.

In Chapter 8, “Statistical Image Analysis of Drying Bovine Serum Albumin Droplets in Phosphate Buffered Saline,” Anusuya Pal, Amalesh Gope and Germano S. Iannacchione discuss how statistical image data are monitored and analyzed. It is revealed that image processing techniques can be used to understand and quantify the textural features that emerge during the drying process. The image processing methodology adopted in this chapter is useful in quantifying the textural changes of the patterns at different saline concentrations that dictate the successive stages of the drying process.
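As a hedged sketch of this kind of texture quantification, the code below builds a gray-level co-occurrence matrix (GLCM) for horizontally adjacent pixels and derives Haralick-style features (angular second moment, entropy, inverse difference moment) of the type compared in the chapter's Mann-Whitney U tests. The toy image and the single horizontal offset are assumptions for illustration, not the chapter's actual setup.

```python
import numpy as np

def glcm(image, levels):
    """Normalized co-occurrence counts of horizontally adjacent gray levels."""
    m = np.zeros((levels, levels))
    for i, j in zip(image[:, :-1].ravel(), image[:, 1:].ravel()):
        m[i, j] += 1
    return m / m.sum()

def glcm_features(p):
    eps = 1e-12                               # avoid log(0)
    asm = (p ** 2).sum()                      # angular second moment (ASM)
    ent = -(p * np.log2(p + eps)).sum()       # entropy (ENT)
    i, j = np.indices(p.shape)
    idm = (p / (1 + (i - j) ** 2)).sum()      # inverse difference moment (IDM)
    return asm, ent, idm

img = np.array([[0, 0, 1, 1],                 # toy 4x4 image, 4 gray levels
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
asm, ent, idm = glcm_features(glcm(img, levels=4))
print(f"ASM={asm:.3f}  ENT={ent:.3f}  IDM={idm:.3f}")
```

Tracking such features frame by frame over a drying droplet is one way the textural evolution the chapter describes can be quantified.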

In Chapter 9, “Introduction to Deep Learning in Health Informatics,” Monika Jyotiyana and Nishtha Kesswani discuss deep learning applications in biomedical data. Because of the vital role played by biomedical data, this is an emergent field in the health sector. These days, health industries focus on providing correct and timely treatment for the patient's benefit while avoiding adverse outcomes. The huge amount of data brings enormous opportunities as well as challenges. Deep learning and AI techniques provide a sustainable environment and an enhancement over machine learning and other state-of-the-art approaches.

In Chapter 10, “Data Mining Techniques and Algorithms in Psychiatric Health: A Systematic Review,” Shikha Gupta, Nitish Mehndiratta, Swarnim Sinha, Sangana Chaturvedi and Mehak Singla review the latest literature on data mining interventions in mental health, covering many techniques and algorithms applied to the most prevalent disorders, such as Alzheimer’s, dementia, depression, schizophrenia and bipolar disorder. The academic databases used for this literature review include Google Scholar, IEEE Xplore and ResearchGate, which provide a number of e-journals for study and research-based materials.

In Chapter 11, “Deep Learning Applications in Medical Image Analysis,” Ananya Singha, Rini Smita Thakur and Tushar Patel present detailed information about deep learning and its recent advancements in aiding medical image analysis. The chapter also discusses the variations that have evolved across different deep learning techniques in response to challenges in specific fields, and emphasizes one extensively used tool, the convolutional neural network (CNN), in medical image analysis.

In Chapter 12, “Role of Medical Image Analysis in Oncology,” Gaganpreet Kaur, Hardik Garg, Kumari Heena, Lakhvir Singh, Navroz Kaur, Shubham Kumar and Shadab Alam give deep insight into traditional cancer studies and the modern medical image analysis practices used for them. Cancer is a disease caused by the uncontrolled division of abnormal cells in any part of the body. It is one of the most dreadful diseases affecting the whole world; moreover, the number of people suffering from this fatal disease is increasing day by day.

In Chapter 13, “A Comparative Analysis of Classifiers Using Particle Swarm Optimization-Based Feature Selection,” Chandra Sekhar Biswal, Subhendu Kumar Pani and Sujata Dash analyze the performance of classifiers using particle swarm optimization-based feature selection. Medical science researchers can collect several patients’ data and build an effective model by feature selection methods for better prediction of disease cure rate. In other words, the data acts just as an input into some kind of competitive decision-making mechanism that might place the company ahead of its rivals.

Concluding Remarks

The chapters of this book were written by eminent professors, researchers and industry practitioners from different countries. The chapters were initially peer reviewed by the editorial board members, reviewers, and industry professionals, who themselves span many countries. Each chapter is arranged to cover the basic introductory topics, advancements and future research directions, enabling budding researchers and engineers to pursue their work in this area.

Biomedical data mining for information retrieval is so diversified that it cannot be covered in a single book. However, with the encouraging research contributed by the researchers in this book, we (contributors), editorial board members, and reviewers tried to sum up the latest research domains, developments in the data analytics field, and applicable areas. First and foremost, we express our heartfelt appreciation to all the authors. We thank them all for considering and trusting this edited book as the platform for publishing their valuable work. We also thank all the authors for their kind co-operation extended during the various stages of processing of the manuscript. This edited book will serve as a motivating factor for those researchers who have spent years working as crime analysts, data analysts, statisticians, and budding researchers.

Dr. Sujata Dash, Department of Computer Science and Application, North Orissa University, Baripada, Mayurbhanj, India

Dr. Subhendu Kumar Pani, Principal, Krupajal Computer Academy, BPUT, Odisha, India

Dr. S. Balamurugan, Director of Research and Development, Intelligent Research Consultancy Service (iRCS), Coimbatore, Tamil Nadu, India

Dr. Ajith Abraham, Director, MIR Labs, USA

May 2021

1 Mortality Prediction of ICU Patients Using Machine Learning Techniques

Babita Majhi1*, Aarti Kashyap1 and Ritanjali Majhi2

1Dept. of CSIT, Guru Ghasidas Vishwavidyalaya, Central University, Bilaspur, India

2School of Management, National Institute of Technology Karnataka, Surathkal, India

Abstract

The intensive care unit (ICU) admits critically ill patients in order to provide them intensive attention and treatment using ventilators and other sophisticated medical equipment. This equipment is very costly; hence its optimized use is necessary. ICUs also require a large number of staff relative to the number of patients admitted, for regular monitoring of the patients. In brief, ICUs consume a large budget in comparison to other sections of a hospital. Therefore, to help doctors find out which patients are most at risk, mortality prediction is an important area of research. In data mining, mortality prediction is a binary classification problem, i.e. die or survive, and it has accordingly attracted the machine learning community. In this chapter six different machine learning methods, namely the Functional Link Artificial Neural Network (FLANN), Support Vector Machine (SVM), Discriminant Analysis (DA), Decision Tree (DT), Naïve Bayesian Network and K-Nearest Neighbors (KNN), are used to develop mortality prediction models from PhysioNet Challenge 2012 data, and their performance is analyzed. The PhysioNet Challenge 2012 provides three separate datasets, each with 4,000 records; this chapter uses dataset A, containing the records of 4,000 different patients. The simulation study reveals that the decision tree based model outperforms the other five models with an accuracy of 97.95% during testing, followed by the FA-FLANN model in second rank with an accuracy of 87.60%.

Keywords: Mortality prediction, ICU patients, PhysioNet 2012 data, machine learning techniques

1.1 Introduction

Healthcare is the support or improvement of wellbeing through the prevention, diagnosis, treatment, recovery or cure of sickness, disease, injury and other physical and mental impairments in individuals [1]. Hospitals are subject to various pressures, including limited funds and healthcare resources. Mortality prediction for ICU patients is critical, as the quicker and more precise the decisions taken by intensivists, the greater the benefit for both patients and healthcare resources. An ICU is for patients with the most serious illnesses or injuries. Most of these patients need support from equipment such as mechanical ventilators to maintain normal body functions, and must be constantly and closely monitored. For many years, the number of ICUs has experienced a worldwide increase [2]. During the ICU stay, diverse physiological parameters are measured and examined every day; these parameters are used in scoring systems to measure the severity of the patients’ condition. ICUs are responsible for an increasing share of the healthcare budget, and consequently are a significant target in the effort to constrain healthcare costs [3]. There is therefore an increasing need, given resource availability restrictions, to ensure that additional intensive care resources are allocated to those who are most likely to benefit from them. Critical choices include withholding life-support treatments and issuing do-not-resuscitate orders when intensive care is viewed as futile. In this setting, mortality assessment is an essential task, used not only to predict the final clinical outcome but also to evaluate ICU effectiveness and to allocate resources.

Over recent decades, several severity scoring systems and machine learning mortality prediction models have been developed [4]. Traditional scoring techniques such as the Acute Physiology and Chronic Health Evaluation (APACHE) [4], Simplified Acute Physiology Score (SAPS) [4], Sequential Organ Failure Assessment (SOFA) [4] and Mortality Probability Model (MPM) [4], as well as data mining techniques like the Artificial Neural Network (ANN) [5], Support Vector Machine (SVM) [5], Decision Tree (DT) [5] and Logistic Regression (LR) [5], have been used in previous research. Mortality prediction is still an open challenge in the intensive care unit.

The objective of this chapter is to develop models to predict whether a patient in an ICU will survive in hospital or not, using Discriminant Analysis (DA), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayesian, Support Vector Machine (SVM) and the Functional Link Artificial Neural Network (FLANN), a low complexity neural network, and to compare them. The dataset has been collected from the PhysioNet Challenge 2012 [6], which consists of 4,000 records of patients admitted to the ICU. There are 41 variables recorded during the first 48 h after admission of the patients to the ICU: 5 variables are general descriptors (age, gender, height, ICU type and initial weight) and 36 are time series variables, of which 15 (Temp, HR, Urine, pH, RespRate, GCS, FiO2, PaCO2, MAP, SysABP, DiasABP, NIMAP, NIDiasABP, MechVent, NISysABP) are taken as input. Five outcome descriptors (SAPS-I score, SOFA score, length of stay in days (LOS), length of survival, and in-hospital death coded 0 for survival and 1 for death in hospital) are available, the last of which is used as the target for predicting the survival of patients.

The rest of the chapter is organized as follows: Section 1.2 describes previous studies of mortality prediction. Materials and methods are presented in Section 1.3, where data collection, data pre-processing and model description are detailed. Section 1.4 presents the obtained results, Section 1.5 briefly discusses the work with the conclusion, and finally Section 1.6 gives the future work.

1.2 Review of Literature

Many researchers have applied different models to the PhysioNet Challenge 2012 dataset and obtained different accuracy results.

Silva et al. [7] have developed a method for the prediction of in-hospital death (0 for survivor, 1 for died in hospital). They collected the data from the PhysioNet website and took part in the challenge. The dataset consists of three sets, A, B and C, each with 4,000 records. The challenge comprises two events: event I, measuring the performance of a binary classifier, and event II, measuring the performance of a risk estimator. For event I the scoring criterion is based on sensitivity and positive predictive value, and for event II the Hosmer–Lemeshow statistic [8] is used. A baseline algorithm (SAPS-I) obtained scores of 0.3125 and 68.58 for events I and II respectively, while the final scores obtained for events I and II are 0.5353 and 17.58. In Ref. [9] Johnson et al. have described a novel Bayesian ensemble algorithm for mortality prediction. Artifacts and erroneous recordings are removed during data pre-processing. The model is trained using the 4,000 records of training set A and also with the two datasets B and C. The jack-knifing method is used to estimate the performance of the model. The model obtained values of 0.5310 and 0.5353 as score 1 on the hidden datasets, and the Hosmer–Lemeshow statistic gave 26.44 and 29.86 as score 2. The model was re-developed and obtained 0.5374 and 18.20 for scores 1 and 2 on dataset C. The proposed model performs better overall than the traditional SAPS model and has some advantages, such as missing-data handling. An improved version of the model, estimating in-hospital mortality in the ICU using 37 time series variables, is presented in Ref. [10]. The performance of the various models is estimated using 10-fold cross validation. In clinical data it is common to have missing values; these are imputed using the mean value for the patient’s age and gender. A logistic regression model is trained on the dataset.
The performance of the model is evaluated by two events: event 1 for accuracy, using the lowest sensitivity and positive predictive value, and event 2 for calibration, using the Hosmer–Lemeshow H statistic. Their model scored 0.516 and 14.4 for events 1 and 2 on test set B, and 0.482 and 51.7 on test set C, performing better than the existing SAPS model. Another model, in Ref. [11], is an algorithm to predict the in-hospital death of ICU patients for event 1 and probability estimation for event 2. Here the missing values are imputed by zero and the data is normalized. Six support vector machine (SVM) classifiers are used for training; for each SVM, the positive examples and one sixth of the negative examples are taken in the training set. The obtained scores for events 1 and 2 are 0.5345 and 17.88 respectively. An artificial neural network model was developed for the prediction of in-hospital death of ICU patients from the first 48 h of observation after admission [12]. Missing values are handled using an artificial value based on assumption. From all feature sets, 26 features are selected for further processing. For classification, a two-layered neural network with 15 neurons in the hidden layer is used. The model uses 100 voting classifiers, and its output is the average of the 100 outputs. The model is trained and tested using 5-fold cross validation, and a fuzzy threshold is used to determine the output of the neural network. The model scored 0.5088 for event 1 and 82.211 for event 2 on the test dataset. Ref. [13] presents an approach that identifies time series motifs to predict in-hospital mortality of ICU patients, segmenting the variables into low, medium and high measurements. The method outperformed the existing scoring systems SAPS-II, APACHE-II and SOFA, obtaining a score of 0.46 for event 1 and 56.45 for event 2.
An improved mortality prediction model using logistic regression and a Hidden Markov model was developed for in-hospital death in Ref. [14]. The model is trained using the 4,000 patient records of set A and validated on the other sets of 4,000 unseen records each. Two different events are used: event 1 for minimum sensitivity and positive predictive value, and event 2 for the Hosmer–Lemeshow H statistic. The model scored 0.50, 0.50 for event 1 and 15.18, 78.9 for event 2, compared to SAPS-I, whose event 1 scores are 0.3170, 0.312 and whose event 2 scores are 66.03 and 68.58 respectively. An effective framework for predicting in-hospital mortality during the ICU stay has been suggested in Ref. [15]. Feature extraction is done by data interpolation and histogram analysis. To reduce the complexity of feature extraction, the feature vector is reduced by evaluating the measurement value of each variable. Finally, a cascaded AdaBoost learning model is applied as the mortality classifier, obtaining a 0.806 score for event 1 and 24.00 for event 2 on dataset A. On dataset B the model obtained 0.379 and 5331.15 for events 1 and 2. A decision support application for mortality risk prediction has been reported in Ref. [16]. For the clinical rules the authors used fuzzy rule based systems, with a genetic algorithm optimizer generating the coefficients of the final solutions. The FIS model achieves a 0.39 score for event 1 and 94 for event 2. To predict mortality in an ICU, a new method is proposed in Ref. [17]. The method, Simple Correspondence Analysis (SCA), is based on both clinical and laboratory data together with the two previous models APACHE-II and SAPS-II. It uses the PhysioNet Challenge 2012 data, a total of 12,000 records across sets A, B and C with 37 recorded time series variables. SCA is applied to select variables, which it combines using the traditional APACHE and SAPS methods.
This method predicts whether the patient will survive or not. Finally, the model obtained a 43.50% score 1 for set A, 42.25% for set B and 42.73% for set C. The Naive Bayesian classifier is used in Ref. [18] to predict mortality in an ICU, aiming at a high S1 and a small S2, where S1 is defined by sensitivity and positive predictive value and S2 by the Hosmer–Lemeshow H statistic. Missing values are replaced by NaN (Not-a-Number) if a variable is not measured. The model achieves 0.475 for S1 (the eighth best solution) and 12.820 for S2 (the first best solution) on set B. On set C, the model achieved a 0.4928 score for event 1 (fourth best solution) and 0.247 for event 2 (third best solution). Di Marco et al. [19] have proposed a new algorithm for mortality prediction with better accuracy for data collected from the first 48 h of admission to the ICU. A binary classifier model is applied to obtain the result for event 1. Set A is selected, which contains 41 variables for 4,000 patients. For feature selection, forward sequential selection with a logistic cost function is used. For classification a logistic regression model is used, which obtained a 54.9% score on set A and 44.0% on test set B. To predict the mortality rate, Ref. [20] developed a model based on the Support Vector Machine, a machine learning algorithm which tries to minimize error and find the best separating hyperplane of maximum margin. The two classes represent 0 for survivor and 1 for died in hospital. 3,000 records are read for training and 1,000 for testing. The authors observed over-fitting of the SVM on set A, obtaining a 0.8158 score for event 1 and 0.3045 for event 2. In phase 2 they set out to improve the training strategies of the SVM and reduced its over-fitting. The final score obtained for event 1 is 0.530; for set B it is 0.350 and for set C 0.333. An algorithm based on an artificial neural network is employed to predict patients’ in-hospital mortality in Ref. [21].
Features are extracted from the PhysioNet data, and a method originally used to detect solar ‘nanoflares’ is applied, owing to the similarity between solar and physiological time series data. Data pre-processing is done to remove outliers, and missing values are replaced by the mean value for each patient. The trained model yields a 22.83 score for event 2 on set B and 38.23 on set C. A logistic regression model is suggested in Ref. [22] for the same purpose. It follows three phases. Phase 1 selects derived variables on set A, calculating each variable’s first value, average, minimum value, maximum value, total time, first difference and last value. Phase 2 applies a logistic regression model to predict patients’ in-hospital death (0 for survivor, 1 for died) on set A. The third phase applies the logistic regression model to obtain the event 1 and event 2 scores; the results obtained are 0.4116 for score 1 and 8.843 for score 2. The paper [23] also reports a logistic regression model for the prediction of mortality. The experiment uses 4,000 ICU patients for training from set A and 4,000 patients for testing from set B. During the filtering process, 30 variables are identified for building the model. The results obtained are a score of 0.451 for event 1 and 45.010 for event 2. A novel cluster analysis technique is used in Ref. [24] to test the similarities between time series data for mortality prediction. For data pre-processing it uses a segmentation based approach to divide variables into several segments, with the maximal and minimal values used to maintain their statistical features. Weighted Euclidean distance based clustering and rule based classification are used. The average result obtained for death prediction is 22.77 to 33.08% and for survival prediction 75 to 86%.

In Ref. [25], the main goal is to improve the mortality prediction of ICU patients using the PhysioNet Challenge 2012 dataset. Three main objectives are accomplished: (i) reduction of dimensions, (ii) reduction of uncontrolled variance and (iii) less dependency on the training set. Feature reduction techniques such as Principal Component Analysis, Spectral Clustering, Factor Analysis and Tukey’s HSD test are used. Classification is done using an SVM, which achieved a better accuracy of 0.73 than the previous work. The authors in Ref. [26] extracted 61,533 records from MIMIC-III v1.4, excluding patients younger than 16, patients who stayed less than 4 h, and patients whose data is not present in the flow sheet. Finally, a cohort of 50,488 ICU stays is used for the experiments. Features are extracted using a window of fixed length. The machine learning models used are Logistic Regression (LR), LR with an L1 regularization penalty using the Least Absolute Shrinkage and Selection Operator (LASSO), LR with an L2 regularization penalty, and Gradient Boosting Decision Trees. Severity of illness is calculated using different scores such as APS III, SOFA, SAPS, LODS, SAPS II and OASIS. Two types of experiments are conducted, a benchmarking experiment and a real-time experiment. Among the compared models, the Gradient Boosting algorithm obtained the highest AUROC of 0.920. Prediction of hospital mortality through time series analysis of intensive care unit patients at an early stage after admission, using different data mining techniques, is carried out in Ref. [27]. Different traditional scoring systems such as APACHE, SAPS and SOFA are used to obtain scores. 4,000 ICU patients are selected from the MIMIC database and 37 time series variables are selected from the first 48 h of admission. The Synthetic Minority Oversampling Technique (SMOTE) is used to modify the datasets (original and smote); missing data is handled by replacing it with the mean (rep1), after which SMOTE is applied (rep1 and smote).
After replacing the missing data, the EM-Imputation (rep2) algorithm is applied. Finally, results are obtained using different classifiers: Random Forest (RF), Partial Decision Tree (PART) and Bayesian Network (BN). Among these three classifiers, Random Forest obtained the best results, with an AUROC of 0.83 ± 0.03 at 48 h on rep1, 0.82 ± 0.03 on original, rep1 and smote at 40 h, and 0.82 ± 0.03 on rep2 and smote at 48 h.

Sepsis is one of the causes of a high mortality rate and should be treated quickly, because sepsis [28] increases the risk of death even after discharge from hospital. The objective of that paper is to develop a model for one-year mortality prediction. 5,650 admitted patients with sepsis were selected from the MIMIC-III database and divided into 70% for training and 30% for testing. The Stochastic Gradient Boosting method is used to develop the one-year mortality prediction model. Variables are selected using the Least Absolute Shrinkage and Selection Operator (LASSO) and the AUROC is calculated. An AUROC of 0.8039 (95% confidence interval: [0.8033–0.8045]) is obtained on the testing set. Finally, it is observed that the Stochastic Gradient Boosting ensemble algorithm is more accurate for one-year mortality prediction than traditional scoring systems such as SAPS, OASIS, MPM or SOFA.

Deep learning has been successfully applied to various large and complex datasets. It is one of the newer techniques that has outperformed traditional techniques. A multi-scale deep convolutional neural network (ConvNet) model for mortality prediction is proposed in Ref. [29]. The dataset is taken from the MIMIC-III database, and 22 different variables are extracted from the first 48 h of measurements for each patient. A ConvNet is a multilayer neural network in which the discrete convolution operation is applied. The convolutional neural network models were developed using the Python packages Keras and TensorFlow as a backend. The proposed model gives a better ROC AUC of 0.8735 ± 0.0025, which matches the state of the art among deep learning models.

1.3 Materials and Methods

1.3.1 Dataset

The dataset is collected from the PhysioNet Challenge 2012, which consists of three sets A, B and C [6]. A total of 12,000 patient records are available, each set consisting of 4,000 records, of which only the 4,000 records of set A are used in this chapter for simulation. There are 41 variables recorded in the dataset; five of these (age, gender, height, ICU type and initial weight) are general descriptors, and 36 are time series variables, as described in Table 1.1.

From the above 36 variables, only 15 variables are selected for mortality prediction. These variables are represented below in Table 1.2.

From these 15 variables, the first value, last value, highest value, lowest value and median value are calculated for nine variables and taken as features; only the first and last values are taken for four variables. For dataset A, five outcome-related descriptors (SAPS score, SOFA score, length of stay, length of survival and in-hospital death) are available, from which in-hospital death (0 for survivor and 1 for died in hospital) is taken as the target value.

1.3.2 Data Pre-Processing

Data pre-processing is one of the techniques used to filter and remove noisy data. 41 variables are given in the dataset, of which 15 are selected; some of these variables were not carefully collected and have missing values. In this chapter, missing data are replaced by zeros.
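The feature construction of Section 1.3.1, combined with the zero replacement above, can be sketched as follows. This is a minimal pandas sketch, assuming each patient record is a long-format table with `variable` and `value` columns (one plausible layout after parsing the PhysioNet set-A files); the split of the 15 variables into nine "five-statistic" and four "first/last" groups is illustrative, since the chapter does not enumerate it.

```python
import pandas as pd

# Illustrative split of the selected variables (assumed, not the chapter's list).
FIVE_STAT_VARS = ["HR", "Temp", "Urine", "pH", "GCS",
                  "MAP", "SysABP", "DiasABP", "NISysABP"]
FIRST_LAST_VARS = ["FiO2", "PaCO2", "RespRate", "NIMAP"]

def extract_features(record: pd.DataFrame) -> dict:
    """Build per-patient features; missing variables are filled with zeros,
    matching the zero-imputation used in this chapter."""
    feats = {}
    for var in FIVE_STAT_VARS:
        vals = record.loc[record["variable"] == var, "value"]
        feats[f"{var}_first"] = vals.iloc[0] if len(vals) else 0.0
        feats[f"{var}_last"] = vals.iloc[-1] if len(vals) else 0.0
        feats[f"{var}_max"] = vals.max() if len(vals) else 0.0
        feats[f"{var}_min"] = vals.min() if len(vals) else 0.0
        feats[f"{var}_median"] = vals.median() if len(vals) else 0.0
    for var in FIRST_LAST_VARS:
        vals = record.loc[record["variable"] == var, "value"]
        feats[f"{var}_first"] = vals.iloc[0] if len(vals) else 0.0
        feats[f"{var}_last"] = vals.iloc[-1] if len(vals) else 0.0
    return feats
```

Together with the five general descriptors, a layout like this yields the 58-dimensional feature vector that factor analysis later reduces.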

1.3.3 Normalization

All the variables in the dataset have different ranges and scales, so their raw values cannot be used directly for classification; classifiers work better when all variables lie in comparable ranges. A standard approach, the z-score normalization method, is used to normalize the variables.
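A minimal numpy sketch of the z-score step; the guard for zero-variance columns is an added assumption, so that a feature that became all zeros during imputation does not cause a divide-by-zero.

```python
import numpy as np

def zscore_normalize(X: np.ndarray) -> np.ndarray:
    """Z-score each column: subtract the column mean, divide by the
    column standard deviation (constant columns are left at zero)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / np.where(std == 0, 1.0, std)
```

The same transformation is available as `StandardScaler` in scikit-learn.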

Table 1.1 Time series variables with description and physical units recorded in the ICU [6].

S. no. | Variable | Description | Physical units
1. | Albumin | Albumin | g/dL
2. | ALP | Alkaline phosphatase | IU/L
3. | ALT | Alanine transaminase | IU/L
4. | AST | Aspartate transaminase | IU/L
5. | Bilirubin | Bilirubin | mg/dL
6. | BUN | Blood urea nitrogen | mg/dL
7. | Cholesterol | Cholesterol | mg/dL
8. | Creatinine | Creatinine | mg/dL
9. | DiasABP | Invasive diastolic arterial blood pressure | mmHg
10. | FiO2 | Fractional inspired oxygen | [0–1]
11. | GCS | Glasgow Coma Score | [3–15]
12. | Glucose | Serum glucose | mg/dL
13. | HCO3 | Serum bicarbonate | mmol/L
14. | HCT | Hematocrit | %
15. | HR | Heart rate | bpm
16. | K | Serum potassium | mEq/L
17. | Lactate | Lactate | mmol/L
18. | Mg | Serum magnesium | mmol/L
19. | MAP | Invasive mean arterial blood pressure | mmHg
20. | MechVent | Mechanical ventilation respiration | 0/1 (true/false)
21. | Na | Serum sodium | mEq/L
22. | NIDiasABP | Non-invasive diastolic arterial blood pressure | mmHg
23. | NIMAP | Non-invasive mean arterial blood pressure | mmHg
24. | NISysABP | Non-invasive systolic arterial blood pressure | mmHg
25. | PaCO2 | Partial pressure of arterial carbon dioxide | mmHg
26. | PaO2 | Partial pressure of arterial oxygen | mmHg
27. | pH | Arterial pH | [0–14]
28. | Platelets | Platelets | cells/nL
29. | RespRate | Respiration rate | bpm
30. | SaO2 | O2 saturation in hemoglobin | %
31. | SysABP | Invasive systolic arterial blood pressure | mmHg
32. | Temp | Temperature | °C
33. | TropI | Troponin-I | µg/L
34. | TropT | Troponin-T | µg/L
35. | Urine | Urine output | mL
36. | WBC | White blood cell count | cells/nL

1.3.4 Mortality Prediction

After data pre-processing, normalization, feature extraction and feature reduction, different models are employed to predict the patient’s in-hospital mortality, and their accuracy is calculated. The models predict whether a patient will survive or die; as mortality prediction is a binary classification problem, this is determined using classification techniques. The process is done step by step as shown in Figure 1.1.

Table 1.2 Time series variables with physical units [30].

S. no. | Variable | Physical units
1. | Temperature | Celsius
2. | Heart rate | bpm
3. | Urine output | mL
4. | pH | [0–14]
5. | Respiration rate | bpm
6. | GCS (Glasgow Coma Score) | [3–15]
7. | FiO2 (fractional inspired oxygen) | [0–1]
8. | PaCO2 (partial pressure of arterial carbon dioxide) | mmHg
9. | MAP (invasive mean arterial blood pressure) | mmHg
10. | SysABP (invasive systolic arterial blood pressure) | mmHg
11. | DiasABP (invasive diastolic arterial blood pressure) | mmHg
12. | NIMAP (non-invasive mean arterial blood pressure) | mmHg
13. | NIDiasABP (non-invasive diastolic arterial blood pressure) | mmHg
14. | MechVent (mechanical ventilation respiration) | [yes/no]
15. | NISysABP (non-invasive systolic arterial blood pressure) | mmHg

1.3.5 Model Description and Development

Different models are developed in this chapter to estimate mortality prediction performance, and a comparison between them is also made. FLANN, Discriminant Analysis, Decision Tree, KNN, Naive Bayesian and Support Vector Machine models are applied to develop the different classifiers. Out of the 4,000 records of dataset A, 3,000 records are taken as the training set and the remaining 1,000 records are used for validation or testing of the models.

First of all, Factor Analysis (FA) is applied to the selected variables to reduce the features. Factor analysis is one of the feature reduction techniques, used to reduce high-dimensional features to a lower dimension [31]. The 58 features of the dataset are reduced to 49 using FA. The steps of factor analysis are:

Figure 1.1 Step by step process for mortality prediction.

First normalize the data matrix (Y) using the z-score method.

Calculate the auto correlation matrix (R):

R = (1/N) YᵀY (1.1)

Calculate the eigenvectors (U) and eigenvalues (λ):

R U = U λ (1.2)

Rearrange the eigenvectors and eigenvalues in descending order of eigenvalue.

Calculate the factor loading matrix (A) by using

A = U λ^(1/2) (1.3)

Calculate the score coefficient matrix (B):

B = R⁻¹ A (1.4)

Calculate the factor scores (F):

F = Y B (1.5)
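The steps above can be sketched in numpy as follows; this is a principal-factor style extraction, assuming the feature matrix has already been z-score normalized, and keeping the top-k factors (k = 49 in this chapter).

```python
import numpy as np

def factor_scores(Y: np.ndarray, k: int) -> np.ndarray:
    """Factor analysis of a z-scored data matrix Y (rows = records,
    columns = features): correlation matrix -> eigen-decomposition ->
    loadings -> score coefficients -> factor scores (top-k factors)."""
    n = Y.shape[0]
    R = (Y.T @ Y) / n                                # auto correlation matrix
    eigvals, U = np.linalg.eigh(R)                   # R U = U * lambda
    order = np.argsort(eigvals)[::-1]                # descending eigenvalues
    eigvals, U = eigvals[order], U[:, order]
    A = U[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))  # loading matrix
    B = np.linalg.pinv(R) @ A                        # score coefficients
    return Y @ B                                     # factor scores
```

Each patient's 58-dimensional feature vector thus becomes a k-dimensional factor-score vector used as the classifier input.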

After reducing the features, the FLANN model [32] is used to predict a patient’s survival or in-hospital death, and finally the overall performance is evaluated. The FLANN based mortality prediction model is shown in Figure 1.2. To design the FLANN model, the 4,000 patient records of dataset A are used: 3,000 for training and the remaining 1,000 for testing. During the training process each record, with 49 features, is taken as input. Each feature is expanded trigonometrically into five terms, mapping the data to a nonlinear format. The outputs of the functional expansion are multiplied by the corresponding weight values and summed together to generate the actual output. The actual output is then compared with the desired output, either 0.1 (for class 0) or 0.9 (for class 1). Any difference between the actual and desired output generates an error signal, on the basis of which the weights and biases are updated using the Least Mean Square (LMS) algorithm [33]. The process is repeated until all training patterns are used, and the experiment is continued for 3,000 iterations with a learning parameter of 0.1. The mean square error (MSE) value for each iteration is stored and plotted to show the convergence characteristics, as given in Figure 1.3. Once training is over and the model is ready for prediction, the 1,000 records kept aside for testing are given to the model with the weights and biases fixed at the values obtained at the end of the training process. For each input pattern the output class label is calculated and compared with the target class label.
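A compact sketch of the FLANN training loop described above. The specific five-term trigonometric basis (x, sin πx, cos πx, sin 2πx, cos 2πx) is a common FLANN choice assumed here, since the chapter does not list its exact expansion; the number of epochs is a parameter (the chapter runs 3,000 iterations with learning rate 0.1).

```python
import numpy as np

def trig_expand(x: np.ndarray) -> np.ndarray:
    """Expand each feature into five terms (assumed basis):
    x, sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x)."""
    return np.concatenate([x,
                           np.sin(np.pi * x), np.cos(np.pi * x),
                           np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)])

def train_flann(X, y, lr=0.1, epochs=100, seed=0):
    """Single-layer FLANN trained with the LMS rule: for each pattern,
    error = desired - (w . phi(x) + b), then a gradient step on w and b.
    Desired outputs are coded 0.1 (class 0) and 0.9 (class 1)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1] * 5)
    b = 0.0
    desired = np.where(y == 1, 0.9, 0.1)
    for _ in range(epochs):
        for xi, di in zip(X, desired):
            phi = trig_expand(xi)
            err = di - (w @ phi + b)        # error signal
            w += lr * err * phi             # LMS weight update
            b += lr * err                   # LMS bias update
    return w, b

def predict_flann(X, w, b):
    out = np.array([w @ trig_expand(xi) + b for xi in X])
    return (out >= 0.5).astype(int)         # threshold midway between 0.1 and 0.9
```

The expansion replaces hidden layers: the model stays linear in its weights, which is what makes the simple LMS update applicable.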

Figure 1.2 The FLANN based mortality prediction model.

Similarly, the other models, Discriminant Analysis (DA), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayesian and Support Vector Machine (SVM), are also applied to predict in-hospital mortality, each obtaining results using its own principles, as briefed below.
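The comparison of these classifiers on the same 3,000/1,000 split can be sketched with scikit-learn. The data here is synthetic (49 features and a roughly 14% positive class, standing in for the reduced PhysioNet features), and the hyperparameters are illustrative defaults, not the chapter's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 4,000 records of dataset A with 49 reduced features.
X, y = make_classification(n_samples=4000, n_features=49, n_informative=10,
                           weights=[0.86, 0.14], random_state=0)
X_train, y_train = X[:3000], y[:3000]     # 3,000 records for training
X_test, y_test = X[3000:], y[3000:]       # 1,000 records for testing

models = {
    "DA": LinearDiscriminantAnalysis(),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.4f}")
```

On the real dataset, each model's test accuracy over the held-out 1,000 records is what Section 1.4 compares.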

Discriminant analysis [34] is a statistical tool used to classify individuals into a number of groups. To separate two groups, Discriminant Function Analysis (DFA) is used, and to separate more than two groups, Canonical Variates Analysis (CVA) is used. There are two potential goals in a discriminant investigation: finding a predictive equation for classifying new individuals, or interpreting the predictive equation to better understand the relationships that may exist among the variables.