Wellness Management Powered by AI Technologies E-Book

187,99 €

Publisher: John Wiley & Sons
Category: Science and New Technologies
Series: Machine Learning in Biomedical Science and Healthcare Informatics
Language: English

Description

This book is an essential resource on the impact of AI in medical systems, helping readers stay ahead in the modern era with cutting-edge solutions, knowledge, and real-world case studies.

Wellness Management Powered by AI Technologies explores the intricate ways machine learning and the Internet of Things (IoT) have been woven into the fabric of healthcare solutions. From smart wearable devices tracking vital signs in real time to ML-driven diagnostic tools providing accurate predictions, readers will gain insights into how these technologies continually reshape healthcare.

The book begins by examining the fundamental principles of machine learning and IoT, providing readers with a solid understanding of the underlying concepts. Through clear and concise explanations, readers will grasp the complexities of the algorithms that power predictive analytics, disease detection, and personalized treatment recommendations. In parallel, they will uncover the role of IoT devices in collecting data that fuels these intelligent systems, bridging the gap between patients and practitioners.

In the following chapters, readers will delve into real-world case studies and success stories that illustrate the tangible benefits of this dynamic duo. This book is not merely a technical exposition; it serves as a roadmap for healthcare professionals and anyone invested in the future of healthcare.

Readers will find the book:

Explores how AI is transforming diagnostics, treatments, and healthcare delivery, offering cutting-edge solutions for modern healthcare challenges;
Provides practical knowledge on implementing AI in healthcare settings, enhancing efficiency and patient outcomes;
Offers authoritative insights into current AI trends and future developments in healthcare;
Features real-world case studies and examples showcasing successful AI integrations in various medical fields.

Audience
This book is a valuable resource for researchers, industry professionals, engineers, and policymakers from diverse fields such as computer science, artificial intelligence, electronics and electrical engineering, and healthcare management.

Details

Pages: 601

Year of publication: 2024

Cover

Table of Contents

Series Page

Title Page

Preface

1 Exploring Functional Modules Using Co-Clustering of Protein Interaction Networks

1.1 Introduction

1.2 Related Works

1.3 Basic Terminologies

1.4 Existing Methods

1.5 About Dataset

1.6 Experimental Environment

1.7 Validation Measures

1.8 Biological Significances

1.9 Proposed Co-Clustering Approach: MR-CoC

1.10 Functional Module Mining Using MR-CoC

1.11 Conclusion

Appendix

References

2 Natural Language Processing in Healthcare: Enhancing Wellbeing through a COVID-19 Case Study

2.1 Introduction

2.2 NLP Approaches

2.3 NLP Pipeline for Smart Healthcare

2.4 Applications of NLP in Healthcare

2.5 COVID Detection Using NLP

2.6 Results and Discussion

2.7 Conclusion

References

3 Artificial Intelligence Assisted Internet of Medical Things (AIoMTs) in Sustainable Healthcare Ecosystem

3.1 Introduction

3.2 Medical Wearable Electronics

3.3 Electronic Signals in Sensors

3.4 Electronic Devices Challenges in the AIoMT

3.5 AIoMT Benefits

3.6 AIoMTs Challenges

3.7 AIoMT Limitations

3.8 Future Research Direction

3.9 Conclusions and Future Scope

References

4 An Online Platform for Timely Access to Medical Care with the Help of Real-Time Data Analysis

4.1 Introduction

4.2 What Happened

4.3 Literature Review

4.4 Methodology

4.5 Hardware Component

4.6 Conclusion

4.7 Future Work

References

5 A Comprehensive Review of Cardiac Image Analysis for Precise Heart Disease Diagnosis Using Deep Learning Techniques

5.1 Introduction and Major Contribution

5.2 Literature Review

5.3 Machine Learning Methods

5.4 Proposed System

5.5 Mathematical Model

5.6 Data Preparation

5.7 Model Training and Evaluation

5.8 Results and Discussion

5.9 Conclusion and Future Work

References

6 A Hybrid Machine Learning Model for an Efficient Detection of Liver Inflammation

Abbreviations

6.1 Introduction

6.2 Machine Learning for Liver Disease Prediction

6.3 Related Works

6.4 Experimental Analysis

6.5 Result Evaluation

6.6 Conclusion

6.7 Enhancement of PCA Over Other Dimensionality Reductions

References

7 Advancements in Parkinson’s Disease Diagnosis through Automated Speech Analysis

7.1 Introduction

7.2 Speech Characteristics in Parkinson’s Disease

7.3 Technological Advances in Speech Analysis

7.4 Integration of Multimodal Data

7.5 Related Works

7.6 Building a Machine Learning (ML) Model

7.7 Experimental Analysis and Performance Measures

7.8 Future Directions

7.9 Challenges and Limitations

7.10 Conclusion and Implications

References

8 Public Opinion Segmentation on COVID-19 Vaccination and Its Impact on Wellbeing

8.1 Introduction

8.2 Background and Related Work

8.3 Machine Learning Techniques

8.4 Ensemble Machine Learning Algorithms

8.5 Methodology

8.6 Results and Discussion

8.7 Impact on Wellbeing

8.8 Conclusion

References

9 Revolutionizing Healthcare with IoT in Cardiology

9.1 Introduction

9.2 Background

9.3 Motivation

9.4 Primary Diseases Globally

9.5 IoT Revolutionizes Healthcare

9.6 IoT Patient Monitoring Devices and Early Detection of Heart-Related Problems

9.7 An IoT-Based Heart Disease Monitoring System

9.8 Conclusions

References

10 Human Biological Analysis Through Fitness Watch Using Deep Learning Algorithm

10.1 Introduction

10.2 Literature Survey

10.3 Methodology

10.4 Results and Discussion

10.5 Limitation of the Work

10.6 Validation and Comparative Analysis

10.7 Conclusion

References

11 Decoding Kidney Health: Effectiveness of Machine Learning Techniques in Diagnosis of Chronic Kidney Disease

11.1 Introduction

11.2 Methods

11.3 Methodology

11.4 Results and Discussion

11.5 Conclusion

References

12 Integrating Metaheuristics and Machine Learning for Wellbeing Management: Case of COVID-19

12.1 Introduction

12.2 Related Work

12.3 Background Knowledge

12.4 Methodology

12.5 Results and Discussions

12.6 Conclusion

References

13 Fusing Sentiment Analysis with Hybrid Collaborative Algorithms for Enhanced Recommender Systems

13.1 Introduction

13.2 Literature Survey

13.3 Comparative Result Study

13.4 Conclusion and Future Scope

References

14 The Future of Well-Being: AI-Powered Health Management with Privacy at its Core

14.1 Introduction

14.2 Related Works

14.3 Proposed Work

14.4 Performance Evaluation

14.5 Conclusion and Future Work

References

15 Artificial Pancreas: Enhancing Glucose Control and Overall Well-Being

15.1 Introduction

15.2 Closed-Loop Diabetes Control System

15.3 Testing and Regulatory Approvals

15.4 Safety Requirements in the Design of Artificial Pancreas

References

Index

End User License Agreement

List of Tables

Chapter 1

Table 1.1 Literature study for functional module mining.

Table 1.2 A review on existing approaches for binary co-clustering.

Table 1.3 SCoCnsym: Toy example.

Table 1.4 Synthetic dataset description: SCoCnsym

Table 1.5 Comparative analysis of SCoCnsym based on match score measure.

Table 1.6 Comparative analysis of SCoCnsym based on computational time.

Table 1.7 Types of seeding tested on synthetic Dataset_rand_I in SCoCrand appr...

Table 1.8 Comparative analysis of SCoCrand based on match score measure of co-...

Table 1.9 Comparative analysis of SCoCrand based on computational time (in sec...

Table 1.10 Key-value pairs of MR-CoC.

Table 1.11 Synthetic datasets for the MR-CoC approach.

Table 1.12 Comparative analysis of MR-CoC based on match score measure.

Table 1.13 Comparative analysis of MR-CoC based on computational time.

Table 1.14 Proposed methods vs. focused issues.

Table 1.15 MR-CoC: biological functionalities of few UM protein modules.

Table 1.16 Drug target-based biological functionalities of UM functional modul...

Table 1.17 List of abbreviations.

Chapter 2

Table 2.1 Features in the collected dataset.

Table 2.2 Classification report of machine learning algorithms using the propo...

Table 2.3 Classification report of ensemble machine learning algorithms using ...

Chapter 4

Table 4.1 Survey on previously implemented techniques.

Chapter 5

Table 5.1 Accuracy performance.

Table 5.2 Error performance employing different algorithms.

Chapter 6

Table 6.1 Different possible datasets of liver patients.

Table 6.2 Correlation between PCA and variance.

Table 6.3 Evaluation results of liver dataset.

Chapter 7

Table 7.1 Recent works in Parkinson’s disease (PD) detection.

Table 7.2 Experimental performance analysis of the classifiers.

Table 7.3 Hyperparameters tuning for different kernels.

Table 7.4 Comparative study of the recent studies of PD detection models.

Chapter 8

Table 8.1 Hyperparameter tuning results.

Table 8.2 Classification report of various machine and ensemble learning algor...

Chapter 9

Table 9.1 Primary diseases globally.

Table 9.2 Average heart rate [72].

Chapter 10

Table 10.1 Gender-based health analytics.

Table 10.2 Age-based health analytics.

Chapter 11

Table 11.1 Confusion matrix.

Table 11.2 Accuracies of kidney disease diagnosis models.

Table 11.3 Confusion matrix of kidney disease diagnosis models.

Table 11.4 Precision recall and F1 score of diagnosis models.

Chapter 12

Table 12.1 LSTM parameters’ configuration type.

Table 12.2 LSTM parameters’ values.

Table 12.3 Genetic algorithm parameters.

Table 12.4 Feature selection results for the total number of deaths in UAE.

Table 12.5 Genetic algorithm results for the total number of deaths in UAE.

Table 12.6 Cross-validation results for the total number of deaths in UAE.

Table 12.7 Benchmark model results for the total number of deaths in UAE.

Table 12.8 The GA-optimized LSTM results for the total number of deaths in Oma...

Table 12.9 Results of the GA-optimized LSTM.

Table 12.10 The GA-optimized LSTM results for the total number of cases in UAE...

Table 12.11 The GA-optimized LSTM results for the total number of cases in Bah...

Table 12.12 Different variants of LSTM results for the total number of cases i...

Chapter 13

Table 13.1 Summary of the reviewed literature.

Table 13.2 Comparison of the accuracy of various procedures.

Chapter 14

Table 14.1 Key contributions and limitations of existing privacy schemes.

Table 14.2 Performance metrics.

Table 14.3 Performance evaluation metrics comparison across systems.

List of Illustrations

Chapter 1

Figure 1.1 Workflow of the current research.

Figure 1.2 Central dogma of proteins [25].

Figure 1.3 Sample protein interaction network.

Figure 1.4 Sample protein modules.

Figure 1.5 Co-clustering: A toy example.

Figure 1.6 Sample illustration of BiMax algorithm.

Figure 1.7 Sample PIN dataset.

Figure 1.8 STRING database download page.

Figure 1.9 CORUM database statistics [29].

Figure 1.10 Protein complex statistics.

Figure 1.11 Workflow of MapReduce in MATLAB [30].

Figure 1.12 Sample input matrix and its heatmap.

Figure 1.13 Synthetic Dataset_nsym_I. (a) Noiseless; (b) noisy.

Figure 1.14 Synthetic Dataset_nsym_II. (a) Noiseless; (b) noisy.

Figure 1.15 Heatmap of synthetic Dataset_nsym_III. (a) Noiseless; (b) noisy.

Figure 1.16 Heatmap of synthetic Dataset_nsym_IV. (a) Noiseless; (b) noisy.

Figure 1.17 Synthetic Dataset_rand_I—implanted co-clusters at random portions ...

Figure 1.18 SCoCrand: row-wise seeds of Dataset_rand_I.

Figure 1.19 SCoCrand: column-wise seeds of Dataset_rand_I.

Figure 1.20 SCoCrand: random seeds of Dataset_rand_I.

Figure 1.21 Workflow of the MR-CoC approach.

Figure 1.22 Protein complex coverage with minimum protein module size 3.

Figure 1.23 Protein complex inclusion rate with module size 4 and above.

Figure 1.24 Protein complex inclusion with module size 5 and above.

Chapter 2

Figure 2.1 Pipeline for smart healthcare.

Figure 2.2 Proposed methodology.

Chapter 3

Figure 3.1 Essential emerging technologies and selected applications.

Figure 3.2 Enabling medical technologies applications.

Figure 3.3 AIoMTs and application.

Figure 3.4 Sampled IoMT application in medical attention.

Figure 3.5 AIoMT management processes.

Figure 3.6 Medical healthcare innovation.

Chapter 4

Figure 4.1 Ministry of road transport and highways.

Figure 4.2 Deaths amenable to healthcare.

Figure 4.3 Decision tree working.

Figure 4.4 K-means algorithm.

Figure 4.5 Conceptual framework.

Figure 4.6 Model visualization results.

Figure 4.7 Finding the shortest path.

Figure 4.8 RFID technology.

Chapter 5

Figure 5.1 Convolutional neural network (CNN) architecture.

Figure 5.2 Precision vs. sensitivity vs. specificity.

Figure 5.3 Accuracy of various algorithms.

Figure 5.4 MAE chart.

Figure 5.5 Kappa statistics chart.

Figure 5.6 Confusion matrix for ECG heartbeat.

Figure 5.7 Percentage of correctly classified by category.

Figure 5.8 Distribution of heartbeats classified correctly and incorrectly.

Figure 5.9 The training and validation accuracy and loss of a convolutional ne...

Chapter 6

Figure 6.1 Cumulative proteins vs albumin.

Figure 6.2 Integration of PCA with KNN for liver inflammation.

Figure 6.3 Random forest algorithm.

Figure 6.4 Confusion matrix before and after applying PCA.

Chapter 7

Figure 7.1 Human brain [27].

Figure 7.2 PD symptoms.

Figure 7.3 Sample screenshot of the dataset.

Figure 7.4 Model building process.

Figure 7.5 Correlation graph.

Figure 7.6 Accumulated explained variance.

Figure 7.7 Classification classes.

Figure 7.8 Comparison graph for the different classifiers under study.

Figure 7.9 Comparison for the different kernels with respect to the Parzen Win...

Chapter 8

Figure 8.1 Word cloud of positive tweets.

Figure 8.2 Word cloud of negative tweets.

Figure 8.3 Distribution of COVID-19 vaccination-based sentiments.

Chapter 9

Figure 9.1 The concept of IoT in healthcare [20].

Figure 9.2 Heartbeat sensor [52].

NOTE: A group consisting of K. Butchi Raju,...

Figure 9.3 Smart heart disease prediction system incorporating IoT and fog com...

Figure 9.4 ECG sensor [53].

Figure 9.5 Blood pressure [54].

Figure 9.6 Heart rate monitor [55].

Figure 9.7 Pulse oximeter [56].

Figure 9.8 Temperature sensor [57].

Figure 9.9 Respiratory rate monitor [58].

Figure 9.10 Activity and movement sensors [59].

Figure 9.11 Sleep monitoring sensor [60].

Figure 9.12 Stress and anxiety monitor [61].

Figure 9.13 System flow chart.

Figure 9.14 Heart rate sensor connection with Arduino Uno [70].

Figure 9.15 Heart monitoring system using Blynk app [72].

Figure 9.16 (a): Arduino board [75], (b): original pictures of Arduino board c...

Figure 9.17 HC-05 Bluetooth [81].

Figure 9.18 Jumper wires [82].

Figure 9.19 Breadboard [83].

Figure 9.20 USB cable connecting PC with Arduino [84].

Figure 9.21 Result on the serial monitor.

Figure 9.22 Indicating low BP.

Figure 9.23 Normal BP.

Figure 9.24 High BP.

Figure 9.25 LCD display [85].

Chapter 10

Figure 10.1 Fitness interface of (a) teenage group, (b) youth group, and (c) m...

Figure 10.2 Dataset collected from the user using Google Forms.

Figure 10.3 Output graph of training and validation.

Figure 10.4 Gender graph.

Figure 10.5 Age group graph.

Figure 10.6 Count of predicted and actual BMI.

Figure 10.7 Actual BMI vs. predicted BMI.

Chapter 11

Figure 11.1 Proposed flow chart for kidney disease diagnosis.

Figure 11.2 Precision–recall and ROC curves depicting the diagnostic performan...

Figure 11.3 A bar chart visually represents the comparison of various machine ...

Chapter 12

Figure 12.1 LSTM structure.

Figure 12.2 Genetic algorithm lifecycle.

Figure 12.3 Stages of the framework for building element of decision forecasti...

Figure 12.4 Gene settings in GA.

Figure 12.5 CV RMSE for the different variants of LSTM for total number of dea...

Figure 12.6 GA-optimized LSTM predictions for day 14 for the total number of d...

Figure 12.7 GA-optimized LSTM predictions for day 14 for the total number of d...

Figure 12.8 Benchmark model results compared with actual values.

Figure 12.9 GA-optimized LSTM predictions for day 14 for the total number of d...

Figure 12.10 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.11 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.12 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.13 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.14 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.15 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.16 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.17 GA-optimized LSTM predictions for day 14 for the total number of ...

Figure 12.18 GA-optimized LSTM predictions for day 14 for the total number of ...

Chapter 13

Figure 13.1 Sentiment analysis.

Figure 13.2 Analysis of sentiment based on collaborative filtering.

Figure 13.3 Collaborative filtering.

Figure 13.4 HCF-based recommender system.

Figure 13.5 Comparison of various sentiment analysis techniques.

Chapter 14

Figure 14.1 Structure of the privacy revolution of federated learning.

Figure 14.2 Architecture of proposed AI-powered health management with privacy...

Figure 14.3 Adaptive AI framework.

Figure 14.4 Differential privacy framework.

Figure 14.5 Verifiable credentials and blockchain integration.

Figure 14.6 Federated learning auditing mechanism.

Figure 14.7 Model accuracy.

Figure 14.8 Performance evaluation across existing systems.

Chapter 15

Figure 15.1 Closed-loop diabetes control system.

Figure 15.2 SMBG glucose monitor.

Scrivener Publishing, 100 Cummings Center, Suite 541J, Beverly, MA 01915-6106

Machine Learning in Biomedical Science and Healthcare Informatics

Series Editors: Vishal Jain and Jyotir Moy Chatterjee

In this series, an attempt has been made to capture the scope of various applications of machine learning in the biomedical engineering and healthcare fields, with a special emphasis on the most representative machine learning techniques, namely deep learning-based approaches. Machine learning tasks are typically classified into two broad categories depending on whether there is a learning ‘label’ or ‘feedback’ available to a learning system: supervised learning and unsupervised learning. This series also introduces various types of machine learning tasks in the biomedical engineering field from classification (supervised learning) to clustering (unsupervised learning). The objective of the series is to compile all aspects of biomedical science and healthcare informatics, from fundamental principles to current advanced concepts.

Publishers at Scrivener: Martin Scrivener and Phillip Carmical

Wellness Management Powered by AI Technologies

Edited by

Bharat Bhushan

School of Engineering and Technology, Sharda University, Greater Noida, India

Akib Khanday

Dept. of Computer Science and Software Engineering, United Arab Emirates University, UAE

Department of Computer Science, Samarkand International University of Technology, Samarkand, Uzbekistan

Khursheed Aurangzeb

College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia

Sudhir Kumar Sharma

KIET Group of Institutions, Delhi-NCR, Ghaziabad, India

and

Parma Nand

School of Engineering and Technology, Sharda University, Greater Noida, India

This edition first published 2025 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA

© 2025 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 978-1-394-28699-7

Front cover images courtesy of Adobe Firefly
Cover design by Russell Richardson

Preface

Machine Learning (ML) and the Internet of Things (IoT) offer powerful applications and solutions when integrated. ML and IoT have become essential components of the healthcare industry, with hospitals, clinics, and healthcare providers adopting ML-powered diagnostic tools, wearable devices, and IoT-enabled patient monitoring systems. This adoption aims to improve patient care, reduce healthcare costs, and enhance service quality. ML and IoT are key growth areas within the technology sector, with tech companies, startups, and research institutions actively developing innovative solutions that leverage ML algorithms and IoT connectivity to address healthcare challenges.

This book explores how technology has become an essential component of modern healthcare solutions, from wearable smart devices that track vital signs in real time to machine learning-driven diagnostic tools that generate precise forecasts. IoT devices produce vast amounts of data from sensors and connected systems. By applying machine learning algorithms to this data, patterns can be identified that predict when a machine is likely to fail, enabling predictive maintenance, reducing downtime, and optimizing maintenance schedules.

IoT devices monitor patients' health metrics, while ML models analyze this data to detect abnormalities and alert healthcare professionals. Together, ML and IoT have optimized healthcare processes and contributed to proactive wellness management. They demonstrate how smart wearable devices, with real-time vital sign tracking, and ML-driven diagnostic tools, capable of highly accurate predictions, are reshaping the landscape of healthcare solutions. The convergence of ML and IoT has also led to innovative solutions in remote patient monitoring, early disease detection, mental health support, and personalized wellness plans.

We are grateful to the contributing authors for their dedication and expertise, and we extend our thanks to the reviewers who have provided invaluable feedback throughout the preparation of this volume. Finally, we thank Martin Scrivener and Scrivener Publishing for their support and publication.

The Editors
October 2024

1 Exploring Functional Modules Using Co-Clustering of Protein Interaction Networks

R. Gowri1* and R. Rathipriya2†

1Department of Computer Science, AVS College of Arts and Science (Autonomous), Salem, Tamil Nadu, India

2Department of Computer Science, Periyar University, Salem, Tamil Nadu, India

Abstract

This chapter introduces a new score-based co-clustering (SCoC) method for functional module mining (FMM) in protein interaction networks (PINs). The method addresses the drawbacks of previous approaches, including computational overhead, time consumption, and a disregard for quality and overlapping modules. The chapter proposes two revised versions of the SCoC method: MR-CoC and SCoCrand. Synthetic datasets are used to evaluate the performance of the proposed methods. These datasets are constructed to impose specific criteria, such as distributed co-clusters, matrix type, noise, and data size. The chapter also discusses how the proposed methods are implemented. In addition, MR-CoC is applied to functional module mining in the human protein interaction network. To assess the efficiency of MR-CoC, its results are compared with existing protein complexes, and the biological implications of the findings are examined.

Keywords: MapReduce, protein modules, functional modules, functional coherence, key-value pairs, molecular function

1.1 Introduction

The main problem addressed in this research work is functional module mining in a protein interaction network (PIN). Currently, functional modules are identified through laboratory experiments. This process involves selecting initial candidates and then adding participants to, or eliminating them from, the module based on lab experiments and analysis. Biologists choose the initial candidates for a functional module manually, using specific tools that suit their requirements. Finding such candidates in a complex PIN is a tedious task for biologists [1]. This research aims to propose a computational solution to this problem, as shown in Figure 1.1.

Currently, the availability of biological networks is increasing due to technological developments in bioinformatics [2]. These networks are highly useful in the medical field for studying and analyzing the behaviors and functionality of various pathogens within the host organisms [3]. They are used for early detection of disease, disease diagnosis, prognosis, drug discovery, drug target identification, and so on.

They are also used to study the functionalities of various organisms, especially in their PINs, which are used to communicate signals or information within the biological system [4]. The proteins are connected with each other to form a PIN. The protein complexes or functional modules are the groups of proteins densely connected to perform a specific biological process.
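The idea of a densely connected group of proteins can be illustrated with a minimal sketch. The protein names, the toy interaction list, and the helper functions below are hypothetical, and the plain edge-density measure is a generic graph statistic assumed here for illustration; it is not the functional density measure or the scoring used by the chapter's methods.

```python
from itertools import combinations

# Hypothetical toy PIN: each pair is an undirected interaction between two proteins.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]


def build_network(edges):
    """Return a dict mapping each protein to the set of its interaction partners."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def module_density(network, module):
    """Fraction of protein pairs inside `module` that interact directly."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    connected = sum(1 for a, b in pairs if b in network.get(a, set()))
    return connected / len(pairs)


pin = build_network(interactions)
dense = module_density(pin, {"P1", "P2", "P3"})   # every pair interacts
sparse = module_density(pin, {"P1", "P4", "P5"})  # only P4-P5 interact
```

In this toy network the first candidate group is fully connected, while the second contains a single interaction among its three possible pairs, so a density threshold would separate them.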

Figure 1.1 Workflow of the current research.

The major significances [5, 6] of functional module mining are as follows:

The central nodes of biological networks are vulnerable to targeted attacks by dangerous diseases.

Key targets can be predicted for tackling drug resistance in glioma (a malignant tumor).

Network modularity is correlated with the survivability of cancer (diseased) patients.

Pathogen infection (say, cancer) tends to be enriched in particular network modules.

The hallmark network modules activated predominantly in each tumor are found by identifying significant network modules.

Neurodegenerative diseases (e.g., Parkinson’s disease) arise from many dysregulated physiological processes, which are identified from abnormalities in molecular networks.

Identifying therapeutic targets (drug target-based functional modules) in complex disorders is a major challenge in developing effective therapies for complex diseases.

In the literature, various approaches exist for identifying functional modules (sub-networks). Some use graph theoretical concepts to identify these sub-networks but suffer from issues such as limited scalability, neglect of overlapping sub-networks, time consumption, and computational overhead [41–44]. The related works also show that such approaches do not consider any functional features when identifying functional modules.

The objective of this chapter is to overcome these issues and to extract functional modules based on their functional density measures using data-mining techniques. The score-based co-clustering with MapReduce (MR-CoC) approach is evaluated on the PIN of Homo sapiens, and the results are compared with existing protein complexes. The approach mines the existing protein complexes efficiently. The biological significance of the unknown modules is then analyzed.

Biologists can use this approach for finding novel functional modules from any PIN. It reduces the complexity of manual extraction of functional modules and can be used for predicting the new drug targets for various diseases. These resultant modules can be further tested in laboratories for new functional module predictions.

This chapter is further organized as follows: Section 1.2 discusses the related research works of functional module mining and binary co-clustering approaches. Sections 1.3 to 1.8 discuss the terminologies, existing methods, datasets, experimental environment, validation measures, and biological significances. Section 1.9 presents the proposed method MR-CoC and its enhancements based on the comparative analysis. Section 1.10 elaborates the functional module mining using MR-CoC based on a comparative analysis of the results under different experimental setups, and analysis of experimental results for their biological significance. Section 1.11 summarizes the entire work carried out in this chapter.

1.2 Related Works

The functional module mining approaches in the literature show that most are performed using graph theoretical algorithms and are based on the topological properties. The various related research articles in the literature are listed in Table 1.1. They focus on MCODE, edge sampling, clustering, and optimization techniques for functional module mining. Table 1.1 highlights the approaches, measures, and various related issues (scalability, time consumption, computational overhead, functional measures, etc.). The tick mark and cross mark in Table 1.1 represent the presence and absence of the specified issue.

From the study, MCODE is the pioneering approach for protein complex detection. It detects cliques in the PIN based on network density. The complexity of MCODE is O(n^3).

The PIN is represented by its adjacency matrix, which is a binary symmetric matrix. Thus, a binary co-clustering approach is proposed in this chapter, and the existing co-clustering approaches for binary data are reviewed in this section. Cheng and Church [10], Plaid [11], OPSM [12], etc. are designed for numerical data matrices; they use distance measures that do not suit binary datasets. Specific approaches such as xMotif [13], BiMax [14], BicBin [15], BicSim [16], BMF [17], BiBit [18], BBK [19], BitTable [20], BiBinCons & BiBinAlter [21], and ParBiBit [22] were developed over time for co-clustering binary data. Detailed reviews of these approaches are presented in Table 1.2. Each approach is reviewed against criteria such as method, binarization, overlapped co-clusters, scalability issue, computational overhead, parameter tuning, time consumption, parallelization, noise sensitivity, and remarks about the approach.
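The adjacency-matrix representation of a PIN can be illustrated with a minimal Python sketch. The protein names and the helper `build_adjacency` are hypothetical and serve only as an illustration; the chapter's own experiments are run in MATLAB.

```python
def build_adjacency(interactions):
    """Return (proteins, matrix) for a list of undirected protein pairs."""
    # Collect every protein that appears in any interaction, in a stable order.
    proteins = sorted({p for pair in interactions for p in pair})
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    matrix = [[0] * n for _ in range(n)]
    for a, b in interactions:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # interactions are undirected, so mirror the entry
    return proteins, matrix


# Hypothetical toy network with four proteins.
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
proteins, adjacency = build_adjacency(interactions)
```

Because every interaction is written to both `matrix[i][j]` and `matrix[j][i]`, the result is binary and symmetric, which is the property the binary co-clustering approaches rely on.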

Table 1.1 Literature study for functional module mining.

Title

Algorithm

Measure

Scalability issue

Ignore overlapped modules

Time consumption

Computational overhead

Functional features usage

An automated method for finding molecular complexes in large protein interaction networks

[7]

MCODE

Network density

✓

Detection of functional modules from protein interaction networks

[4]

Clustering

Classification score

✓

Identifying functional modules in protein–protein interaction networks: an integrated exact approach

[8]

Mathematical optimization

Modular scoring function

✓

Protein interaction networks—more than mere modules

[1]

Block method based on GO terms

Error minimization

✓

Weighted consensus clustering for identifying functional modules in protein–protein interaction networks

[9]

Combines 4 clustering algorithms

Cluster coefficient

✓

BicNET: Flexible module discovery in large-scale biological networks using biclustering

[2]

Biclustering

Biclusters

✓

From Table 1.2, xMotif and ParBiBit are greedy algorithms; BiMax and BBK are divide-and-conquer approaches; BiSim and BiBinCons are iterative approaches; BicBin is a model-based approach; BiBit is an exhaustive enumeration approach; and BitTable is an a priori-type approach. All the approaches discussed here binarize numerical data before processing it. The xMotif, BiMax, and BBK algorithms ignore overlapped co-clusters in the dataset. All these approaches suffer from computational overhead and scalability issues. Parameter tuning is necessary for better performance in most of the approaches in Table 1.2, with the exception of xMotif. BicBin is a parallelized approach; thus, it consumes less time than the other existing approaches. Noise in the dataset reduces the efficiency of co-clustering, causing quality co-clusters to be missed. BicBin, BMF, and BiBinCons are the noise-insensitive approaches among the related works.

From the study, it is clear that the BiMax and xMotif approaches are used for comparative analysis in most of these existing methods. There is no benchmark algorithm for co-clustering.

Most of these existing approaches fail to extract all the embedded co-clusters from the given data. Their limitations are missed overlapped and quality co-clusters, scalability issues, computational overhead, and long running times.

Table 1.2 A review on existing approaches for binary co-clustering.

S. no.

Existing approach

Method

Binarization

Overlapped co-clusters

Scalability Issue

Computational overhead

Parameter tuning

Time consumption

Parallelization

Noise sensitivity

Remarks

xMotif (2003)

Greedy

✓

As it is a greedy approach, it may lose good co-clusters and may take wrong decisions; Leukemia gene expression analysis

BiMax (2006)

Divide and conquer

✓

Benchmark approach; Best for limited size dataset; Extracts only pure constant maximal Biclusters; Gene Expression Data Analysis

BicBin (2008)

Model based

✓

Uses cost function to evaluate the sub-matrices

BiSim (2009)

Iterative

✓

Used for Gene Expression Analysis

BMF (2010)

Matrix multiplication

✓

Quality of co-clusters depends on the selection of the discretization method; Fixing the number of factors ‘k’ is a big hurdle; Performs dimensionality reduction of data

BiBit (2011)

Exhaustive enumeration

✓

Uses Boolean Algebraic operations; Searches co-clusters in all possible combinations; Suits limited data sizes; Embryonic Tumor Gene expression analysis

BBK (2012)

Divide and conquer

✓

Uses Bron-Kerbosch backtracking approach to improve the BiMax approach; Still faces the listed issues

Bit-Table (2014)

A priori

✓

Too many candidate itemsets; Scans data once for each itemset; High memory consumption; Uses various Bit Table Operations

BiBinCons BiBinAlter (2015)

Iterative

✓

It performs an exhaustive search for bicluster in each row and column combination; Microarray Data Analysis

10.

ParBiBit (2018)

Greedy

✓

Utilizes the modern distributed memory systems efficiently; As it is a greedy method, it is likely to take wrong decisions

1.3 Basic Terminologies

The basic terminologies used in this research are categorized under biological terminologies and technical terminologies.

1.3.1 Scientific Terms Used

In this research, the scientific terminologies stand for a variety of topics that are biologically related. This research section discusses numerous terminologies to enhance understanding of the bioinformatics concept. They are as follows:

Protein: Proteins are linear chains of amino acids, known as amino acid sequences, built from the 22 proteinogenic amino acids (20 in the standard genetic code). The data consist of one-dimensional arrays represented as lengthy strings of letters [ ], as shown in Figure 1.2. Each sequence is encoded by a gene or a portion of a gene. Proteins are used to describe the operations of biological systems, and they oversee a range of physiological and biochemical processes within the cell. The central dogma illustrates the protein route, as depicted in Figure 1.2.

Molecular Networks: Molecular networks refer to the interconnected relationships among biological components such as genes, RNA, and proteins. They represent the relationship, connectedness, and communication among these products. They are created through chemical reactions within the cell [ ].

Protein Interaction Networks (PINs): PINs are molecular networks built from proteins. They capture the connections among diverse proteins in various cellular compartments and are used to transmit signals and commands to different parts of the system. Figure 1.3 shows a sample PIN [ ].

Protein Modules: Protein modules are sets of proteins that carry out specific functions within the cell. A module is a part of a PIN whose proteins are more strongly connected with each other than with the rest of the network [24]; these portions of the PIN are highly cohesive. Figure 1.4 highlights the protein modules in the sample PIN [24].

Protein Complex vs. Functional Modules: Both are protein modules. Protein complexes are static: they remain in a fixed position. Functional modules are dynamic: they can form at any location and time. Functional modules emerge as a result of pathogen infections and protein abnormalities. They are used to predict the presence of pathogen infections, diseases, and anomalies, and are employed in drug and drug-target discovery, therapy recommendation, and treatment development for various complex disorders [ ].

Drug Targets: They are proteins that facilitate drug delivery to infected areas, enhance therapeutic effectiveness, and interact with the drug to improve treatment [26].

Figure 1.2 Central dogma of proteins [25].

Figure 1.3 Sample protein interaction network.

Figure 1.4 Sample protein modules.

1.4 Existing Methods

The existing methods used for comparative analysis at various stages of this research, namely the binary co-clustering approaches (BiMax and xMotif) and the optimization approaches (PSO, GA, and Firefly), are discussed in detail in this section.

Data are co-clustered when their rows and columns are grouped simultaneously, as shown in Figure 1.5. Co-clustering identifies and isolates local patterns confined to a region of the data matrix. A co-cluster is a distinct subset of rows and columns whose entries are more similar to each other than to the rest of the data.

Figure 1.5 Co-clustering: A toy example.

1.4.1 Binary Co-Clustering Approaches

This work compares the proposed binary co-clustering strategy with the Binary inclusion-Maximal (BiMax) [14] and xMotif [13] algorithms.

1.4.1.1 Binary Inclusion-Maximal Algorithm

Prelic et al. developed the Binary inclusion-Maximal (BiMax) co-clustering algorithm in 2006 for finding the maximal co-clusters in a binary data matrix. It is a pioneering algorithm and serves as the baseline for comparative analysis in most co-clustering studies. The pseudocode of BiMax is given in Algorithm 1.1. The approach has a complexity of O(n^2 m^2 α), where n, m, and α are the numbers of genes, conditions, and inclusion-maximal co-clusters, respectively.

Figure 1.6 Sample illustration of BiMax algorithm.

Algorithm 1.1: BiMax

Input: Input Data Matrix

Output: Resulting Sub-Matrices

Step 1 : Divide the columns into CU and CV (subsets),

Step 2 : Sort rows of E with the first row as a template

place all genes in GU, expressed to conditions only in CU

place all genes in GW, expressed to conditions both in CU and CV

place all genes in GV, expressed to conditions only in CV

Step 3 : Define the combination of genes GU, GW, GV and conditions CU and CV

Step 4 : Recursively decompose the U, V sub-matrices using steps 1 to 4.

Figure 1.6 depicts the sample illustration of the BiMax approach. It is a reference-based divide-and-conquer approach used for binary co-clustering. It takes its first row as its reference to further group the data.
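A single partition step of this divide-and-conquer scheme (Steps 1 to 3 of Algorithm 1.1) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the function name `bimax_partition` is hypothetical, the template is the first row, rows without any 1 are dropped, and the recursion of Step 4 is omitted.

```python
def bimax_partition(matrix, template_row=0):
    """Split columns into (CU, CV) and rows into (GU, GW, GV) for one BiMax step."""
    n_cols = len(matrix[0])
    # Step 1: columns where the template row has a 1 form CU, the rest form CV.
    c_u = [j for j in range(n_cols) if matrix[template_row][j] == 1]
    c_v = [j for j in range(n_cols) if matrix[template_row][j] == 0]
    g_u, g_w, g_v = [], [], []
    # Step 2: sort the rows by where their 1s fall.
    for i, row in enumerate(matrix):
        in_u = any(row[j] for j in c_u)
        in_v = any(row[j] for j in c_v)
        if in_u and in_v:
            g_w.append(i)   # 1s in both CU and CV
        elif in_u:
            g_u.append(i)   # 1s only in CU
        elif in_v:
            g_v.append(i)   # 1s only in CV
    return c_u, c_v, g_u, g_w, g_v


toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
c_u, c_v, g_u, g_w, g_v = bimax_partition(toy)
```

In the published BiMax algorithm, the two sub-matrices to recurse on are then formed from these sets (rows GU and GW with columns CU, and rows GW and GV with all columns); that recursion is left out here to keep the sketch short.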

1.4.1.2 xMotif Algorithm

The xMotif algorithm was devised to extract conserved gene motifs from gene expression data [13]. It has also been applied to various other co-clustering problems in the literature and suits both real-valued and binary data. Introduced by Murali and Kasif in 2003, it remains one of the most frequently used co-clustering approaches. An xMotif is itself a co-cluster. The pseudocode of xMotif is given in Algorithm 1.2; the algorithm finds the largest xMotif in the given data. In the algorithm, “ns” and “nd” are the number of initial seeds and the number of samples chosen from each seed, respectively; “sd” is the sample size; and “α” and “β” are the user-defined fractions of samples and genes conserved, respectively, in the samples chosen. It is less time-consuming than other co-clustering approaches. Its time complexity is O(nm(log(1/α) + log(1/β))).

Algorithm 1.2: FindxMotif

Input: Input data matrix

Output: motif

Step 1 : Repeat i = 1 to ns

Select a random sample c uniformly

Repeat j = 1 to nd

Select a subset D randomly of size sd

Include (g, s) in Gij for each row g, if g is in state s in c, and similarly for all samples in D

Assign to Cij all samples that satisfy the set of gene-states in Gij, as c does

discard (Cij, Gij), if fewer than an α fraction of the samples are present in Cij

Step 2 : return (C+, G+), with maximal |Gij|, 1 ≤ i ≤ ns, 1 ≤ j ≤ nd
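The seed-and-sample loop of Algorithm 1.2 can be sketched for binary data as below. This is a simplified illustration, not the authors' implementation: the function name `find_xmotif`, the default parameter values, and the toy matrix are assumptions, and a gene's "state" is taken to be simply its 0/1 value.

```python
import random


def find_xmotif(data, ns=10, nd=20, sd=2, alpha=0.3, seed=0):
    """Return (samples, genes) of the largest motif found; rows are genes."""
    rng = random.Random(seed)
    n_genes, n_samples = len(data), len(data[0])
    best = (set(), set())
    for _ in range(ns):
        c = rng.randrange(n_samples)          # seed sample, chosen uniformly
        others = [s for s in range(n_samples) if s != c]
        for _ in range(nd):
            d = rng.sample(others, sd)        # random subset of size sd
            # Genes whose state in c is shared by every sample in d.
            genes = {g for g in range(n_genes)
                     if all(data[g][s] == data[g][c] for s in d)}
            # Samples that agree with c on all of those genes.
            samples = {s for s in range(n_samples)
                       if all(data[g][s] == data[g][c] for g in genes)}
            if len(samples) < alpha * n_samples:
                continue                      # too few samples: discard
            if len(genes) > len(best[1]):
                best = (samples, genes)       # keep the motif with most genes
    return best


toy = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
motif_samples, motif_genes = find_xmotif(toy)
```

By construction, every gene in the returned motif has the same value across all returned samples, which is the conservation property an xMotif is meant to capture.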

1.5 About Dataset

This section describes the datasets used in this research work: the PIN dataset, the protein complex dataset, and the target dataset.

1.5.1 Protein Interaction Networks

A PIN is a set of interconnections, also called interactions, between proteins. Each interaction is a link connecting two proteins, so every record in the data lists two interactors. The interactions are undirected. Sample dataset records are given in Figure 1.7.

The sample dataset is the first ten records of Homo_Sapiens PIN. There are various columns corresponding to the interactors (A, B), official symbols of A and B, alias names of A and B, experimental system, PubMed identifier of the interaction, and organisms of A and B in the dataset. In this research, the interactor columns are used for experimentation; the other additional information can be used for analytical purposes. This study makes use of a PIN dataset that was obtained from the STRING database [27, 28].

Figure 1.7 Sample PIN dataset.

1.5.1.1 STRING Repository

The existing and tested protein interactions are present in this STRING repository [27, 28]. Physical and functional linkages are also a part of these interactions. There are ≅ 3.12 billion interactions involving ≅ 24.58 million proteins from 5090 different organisms. This database contains 19.4K Homo sapiens proteins and ≅ 8.5 million interactions.

The various data sources of this database are COG, BioGRID, MINT, KEGG, Gene Ontology, Ensembl, etc. [27]. The protein sequence data of these proteins were also taken from this database. Figure 1.8 shows the sample download page of the STRING database.

Figure 1.8 STRING database download page.

1.5.2 Protein Complex Dataset

Protein complexes are groups of proteins that work together to carry out particular cellular tasks in living organisms [24]. Different creatures have their own unique protein complexes. In order to compare the outcomes, this study makes use of this protein complex data. The CORUM database is used to retrieve this data.

1.5.2.1 CORUM Database

Mammalian protein complexes are exhaustively represented in this database [29]. It comprises mammalian protein complexes that have been confirmed experimentally: 64% human, 16% mouse, and 12% rat. It contains a total of 2,358 human protein complexes. The name, composition, function, and reference of each protein complex are also included.

Figure 1.9 shows the statistics of the protein complexes in each release of the CORUM database [29]. Protein complex sizes range from 1 to 64, but very few complexes have sizes from 9 to 64: most sizes in that range have no complexes at all, and the majority of complexes have sizes from 1 to 8. The protein complex sizes and their counts, as per the literature, are shown in Figure 1.10. Based on these sizes, the 2,358 protein complexes taken for this research are restricted to the size range 3 to 8. Sizes 1 and 2 are excluded because size 1 represents a single protein and size 2 describes a single interaction; including them would only increase the running time, and individual proteins and their interactions in the PIN can be analyzed separately. The upper limit is fixed on a trial-and-error basis: larger sizes require more and longer seeds to be generated, and size 8 is chosen based on the size of most of the existing protein complexes. The limit can be increased in the future to extend this research.

Figure 1.9 CORUM database statistics [29].

Figure 1.10 Protein complex statistics.

1.6 Experimental Environment

All implementations and experiments in this research are carried out in the MATLAB 2016a environment. The co-clustering approach proposed for functional module mining uses the MapReduce (MR) framework; the basic configuration of MATLAB's MR framework [30] is used. This default MapReduce framework runs on a virtual distributed setup, and a distributed Hadoop cluster can also be configured in MATLAB. The MapReduce framework is discussed in detail below.

1.6.1 MapReduce Framework

The MapReduce framework is designed for the parallel processing of voluminous distributed data [31–33]. Data are processed as key-value (KV) pairs. The framework consists of three phases: the map phase, the rearranging or grouping (intermediate) phase, and the reduce phase. The map and reduce phases are user-defined and consist of user-defined functions that process the data. The map function works on the data blocks and produces KV pairs; the reduce function works on each unique key separately and returns KV pairs. The intermediate phase groups the KV pairs by key and is performed by the framework by default.

Figure 1.11 Workflow of MapReduce in MATLAB [30].

The MapReduce Framework, as per the MATLAB [30], is in Figure 1.11. The inputs and outputs of this framework are in the form of a data store. Tables of data, KV pairs, text files, images, and other types of media can all be input. The key-value datastore will be the final product. The map function will take the data blocks from this input data store for processing and produce the intermediate KV pairs in the form of intermediate KV store. In the intermediate phase, a value iterator will be generated by default for each key, as in Figure 1.11. This phase is for grouping the intermediate KV pairs based on the unique keys. After this phase, the user-defined “Reduce” function will be invoked for each unique key for further processing. It will generate the output KV pairs.
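The three phases can be mimicked in a few lines of plain Python. The sketch below is only an in-memory illustration of the map, grouping, and reduce flow; it does not use MATLAB's `mapreduce` function, and the task (counting how many interactions each protein takes part in), the function names, and the protein names are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit one (protein, 1) pair for each end of every interaction."""
    for a, b in block:
        yield a, 1
        yield b, 1


def group_phase(pairs):
    """Intermediate phase: group all values under their unique key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: called once per unique key, returns one output KV pair."""
    return key, sum(values)


def map_reduce(blocks):
    intermediate = [kv for block in blocks for kv in map_phase(block)]
    grouped = group_phase(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


# Two data blocks of a hypothetical interaction list.
blocks = [
    [("P1", "P2"), ("P1", "P3")],
    [("P2", "P3"), ("P3", "P4")],
]
degrees = map_reduce(blocks)
```

Each block could be mapped independently, and each key reduced independently, which is what makes the pattern suitable for parallel execution.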

1.7 Validation Measures

This section describes the validation measures used for evaluating the performance of the proposed work: Match Score, Network Metrics, and Functional Coherence.

1.7.1 Match Score Measure

The match score measure is used for evaluating the proportion of matching between two patterns of data or two sets of patterns [14]. The patterns are co-clusters. It is evaluated using Equations 1.1 and 1.2.

The local patterns formed by grouping rows and columns simultaneously are called co-clusters. scr(B1, B2) is the match score between two co-clusters, evaluated using Equation 1.1. Specifically, it is the ratio of the number of row and column identifiers common to the two co-clusters to the total number of unique identifiers among them. “B1” and “B2” represent the two co-clusters: one is the target co-cluster and the other is the output (predicted) co-cluster. {I1, J1} and {I2, J2} represent the row and column identifier sets of the co-clusters B1 and B2, respectively.

(1.1)

Scr*B(M1, M2) is the match score between two co-cluster sets, as in Equation 1.2. It is the average, over all co-clusters in “M1”, of the maximal match score each attains against the co-clusters in “M2”. “M1” is the target co-cluster set and “M2” is the output co-cluster set.

(1.2)

The match score measure is used in this research to evaluate how well the extracted patterns match the existing ones, and thereby to study their accuracy.
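As an illustration, the match score can be sketched in Python under two assumptions that should be checked against the original equations: the score between two co-clusters is taken as a Jaccard-style ratio (common row and column identifiers over unique row and column identifiers), and the score between two co-cluster sets is taken as the average best match in the style of Prelic et al. [14]. The function names are hypothetical.

```python
def match_score(b1, b2):
    """Assumed Jaccard-style score between co-clusters given as (rows, cols) sets."""
    (i1, j1), (i2, j2) = b1, b2
    common = len(i1 & i2) + len(j1 & j2)
    unique = len(i1 | i2) + len(j1 | j2)
    return common / unique if unique else 0.0


def set_match_score(m1, m2):
    """Average, over co-clusters in m1, of the best score against any in m2."""
    if not m1 or not m2:
        return 0.0
    return sum(max(match_score(b1, b2) for b2 in m2) for b1 in m1) / len(m1)


target = ({0, 1, 2}, {0, 1})       # rows {0,1,2}, columns {0,1}
predicted = ({1, 2, 3}, {0, 1})    # shares two rows and both columns
score = match_score(target, predicted)
```

With these toy co-clusters, four identifiers are shared and six are unique, so the score is 4/6; identical co-clusters score 1.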

1.7.2 Functional Coherence

This metric assesses how well the functional annotations of a protein module hold together [34]. Each protein is accompanied by biological functional characteristics known as functional annotations, which describe its functions within the biological system under several facets. To predict the properties of a protein module, these functional annotations are quantified for the module, and their similarity is measured with functional coherence (FC). FC is the fraction of proteins in the protein module that carry a given functional annotation, relative to the total number of proteins in the module.

FC_j(PM_i) = \frac{|p_{ij}|}{|p_i|} \qquad (1.3)

Equation 1.3 gives the functional coherence (FC) of the jth annotation for the ith protein module (PM). pij denotes the proteins in the ith module that have the jth annotation, while pi stands for all the proteins in the ith module. If all the proteins in the module have the jth annotation, the value of FCj is “1”; if none of them has it, the value is “0”. This FC value describes the functioning of the protein module. In this study, the functions of the resulting protein modules are determined using this metric.
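A minimal Python sketch of this ratio is given below, assuming FC is the number of module proteins carrying an annotation divided by the module size. The function name, the protein names, and the annotation labels are hypothetical placeholders.

```python
def functional_coherence(module, annotations, term):
    """Fraction of proteins in `module` annotated with `term`."""
    if not module:
        return 0.0
    annotated = [p for p in module if term in annotations.get(p, set())]
    return len(annotated) / len(module)


# Hypothetical annotations for a four-protein module.
annotations = {
    "P1": {"binding", "catalysis"},
    "P2": {"binding"},
    "P3": {"binding", "transport"},
    "P4": {"transport"},
}
module = ["P1", "P2", "P3", "P4"]
fc_binding = functional_coherence(module, annotations, "binding")
```

Here three of the four proteins carry the "binding" label, so the coherence is 0.75; a label carried by every protein would give 1, and one carried by none would give 0.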

1.8 Biological Significances

Functional annotations are used to illustrate the biological importance of proteins. They are the gold standard for characterizing the functional features of molecular products such as proteins [35, 36]. These annotations fall into three primary groups:

Molecular Function: These are molecular-level activities such as binding and catalysis. Depending on location and circumstances, they may be performed by a single bio-product (protein) or a set of bio-products (protein complex). These annotations describe only the activity itself, not the bio-products that perform it [ ].

Biological Process: This covers cellular-level activities such as metabolism and transport that the bio-products take part in. A biological process in the cell is accomplished by combining one or more molecular functions, carried out by one or more bio-products [ ].

Cell Component: This represents the location of the bio-product, relative to cellular compartments and structures, during its molecular activities. Factors such as cellular topology, cellular compartments, and the presence of stable macromolecular complexes are usually considered when making these determinations [ ].

1.9 Proposed Co-Clustering Approach: MR-CoC

1.9.1 SCoC for Non-Symmetric Matrix

In this section, the SCoC approach for non-symmetric matrices, called SCoCnsym, is presented; it extends the previous SCoC approach. A non-symmetric matrix is either a rectangular matrix or a square matrix whose upper-right and lower-left triangles differ. It represents the relationship between two disjoint sets of objects. SCoCnsym operates on one dimension at a time, either a row or a column, based on the condition specified. The score threshold determines which type of co-cluster is mined. Algorithm 1.3 shows the steps of SCoCnsym.

Algorithm 1.3: SCoCnsym

Input: 2-D Input Data (D)

Result: co-cluster (C)

Step 1 : C=D, compute Sn (score) of C

Step 2 : while Sn < threshold do

compute scores of C (rows and columns)

ignore low score row or column in C

compute Sn

Step 3 : return C

Initially, the input data matrix D contains noise and an embedded non-symmetric co-cluster. At each iteration, the score of the current co-cluster is computed, the row or column with the lowest score is ignored, and the score is updated using the score equation; this continues until the score reaches the selected threshold. The algorithm mines the constant 1’s co-cluster from a given data matrix when the threshold is “1”. For the 0’s co-cluster, the score threshold should be “0”, and the row or column with the highest score is removed instead. The approach extracts the maximal constant co-cluster from the given binary data matrix. Its time complexity is O(n_e + n_v^2).
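The iterative removal described for SCoCnsym can be sketched in Python as follows. The score function is not reproduced in this section, so the sketch assumes it is the fraction of 1s (the density) of the current sub-matrix, with row and column scores defined the same way; the function name `scoc_nsym` and the toy matrix are likewise illustrative.

```python
def scoc_nsym(data, threshold=1.0):
    """Greedily drop the lowest-scoring row or column until density >= threshold."""
    rows = list(range(len(data)))
    cols = list(range(len(data[0])))
    while rows and cols:
        ones = sum(data[r][c] for r in rows for c in cols)
        if ones / (len(rows) * len(cols)) >= threshold:
            break  # the remaining sub-matrix meets the score threshold
        # Assumed scores: fraction of 1s in each remaining row and column.
        row_score = {r: sum(data[r][c] for c in cols) / len(cols) for r in rows}
        col_score = {c: sum(data[r][c] for r in rows) / len(rows) for c in cols}
        worst_row = min(rows, key=row_score.get)
        worst_col = min(cols, key=col_score.get)
        if row_score[worst_row] <= col_score[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


# A 3 x 2 block of 1s (rows 0-2, columns 0-1) plus one noisy 1 at (3, 3).
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
rows, cols = scoc_nsym(toy)
```

On this toy input the two sparse columns are removed first, then the noisy row, leaving the embedded 3 × 2 block of 1s.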

1.9.1.1 Toy Example: SCoCnsym

The sample illustration of SCoCnsym for mining 1’s constant co-cluster is explained in this section. Let “C” be the Input Matrix, where the embedded co-cluster is highlighted in Figure 1.12. The score threshold is “1”.

Figure 1.12 Sample input matrix and its heatmap.

Table 1.3 illustrates a sample for the proposed SCoCnsym approach. Here the 7 × 3 constant 1’s co-cluster is embedded in the 8 × 7 noisy data matrix. In every iteration, the row score and column score of the given data are computed; either the row or column with a minimal score is removed; the score is evaluated for the result matrix. This approach is also attempted on the synthetic datasets for their performance analysis similar to the previous implementation, which is discussed in the next section.

Table 1.3 SCoCnsym: Toy example.

1.9.1.2 Synthetic Dataset Description

The synthetic binary datasets are generated for experimenting with the SCoCnsym. In this section, four different binary datasets are generated, where the co-clusters and noise are implanted.

Figure 1.13 Synthetic Dataset_nsym_I. (a) Noiseless; (b) noisy.

SCoCnsym is experimented on both the noisy and noiseless data. The dataset description is shown in Table 1.4, which highlights the matrix type, data size, co-cluster size, co-cluster position, presence, and nature of noise.

1.9.1.3 Experimental Analysis: SCoCnsym

The proposed SCoCnsym approach is applied to four different synthetic non-symmetric datasets with a score threshold of “1” for mining the constant 1’s co-cluster. The experiments are run in the MATLAB environment, and the performance of SCoCnsym is compared with that of the existing approaches. The results, given in Table 1.5, show that SCoCnsym outperforms the existing approaches: it extracted the implanted co-clusters from all the synthetic datasets, even in the presence of noise. The BiMax algorithm can extract the implanted co-clusters only from the datasets without noise; its performance is degraded by noise in the data.

Figure 1.14 Synthetic Dataset_nsym_II. (a) Noiseless; (b) noisy.

The computational time taken for mining the co-clusters in these four synthetic datasets by these approaches is also recorded in Table 1.6. The outcomes show that the performance of the xMotif approach is better than other approaches, but it does not extract the expected co-cluster from any dataset.

Figure 1.15 Heatmap of synthetic Dataset_nsym_III. (a) Noiseless; (b) noisy.

In all these four synthetic datasets, the expected co-clusters, i.e., the implanted co-clusters, are the maximal co-cluster in them. The BiMax algorithm can mine the co-clusters in minimal time when compared to the proposed approach. Even though the BiMax algorithm consumes less time, its performance is highly affected by the noise.

1.9.2 Randomized SCoC

The previous two approaches mine the maximal co-cluster present in the dataset, whereas the remaining hidden patterns are not explored; in particular, the smaller co-clusters are ignored. Randomizing the approach lets it mine co-clusters hidden in different portions of the dataset. It is implemented by introducing random seed vectors, each representing a search location within the dataset. Many random seeds are generated to initiate different searches, which results in exploring the different patterns present in the data. Moreover, traversing the entire dataset for co-cluster mining is time-consuming for large datasets; in such cases, randomization is a better way to mine the patterns stochastically. For these reasons, the randomization of the SCoC approach is proposed.

Figure 1.16 Heatmap of synthetic Dataset_nsym_IV. (a) Noiseless; (b) noisy.

In this section, the implementation of the SCoCrand is discussed. Here, three different seeding techniques are adopted for testing this proposed approach. They are row-wise random seeding, column-wise random seeding, and random seeding (both row and column seed). A synthetic dataset is generated and tested with these types of seeding techniques discussed further.

This SCoCrand approach uses the SCoC approach, which is applied to each seeding portion. The pseudocode of the SCoCrand is shown in Algorithm 1.4.

Table 1.4 Synthetic dataset description—SCoCnsym.

S. no. | Dataset name | Size | Matrix type | Co-cluster size | Co-cluster position | Noise type
1 | Dataset_nsym_I (Figure 1.13) | 500 × 700 | Non-symmetric binary | 150 × 190 | Randomly scattered and symmetrical | Random symmetric noise
2 | Dataset_nsym_II (Figure 1.14) | 200 × 400 | Non-symmetric binary | 150 × 190 | Specific continuous | Symmetric random noise
3 | Dataset_nsym_III (Figure 1.15) | 100 × 200 | Non-symmetric binary | 50 × 50 | Randomly symmetrical | Random noise
4 | Dataset_nsym_IV (Figure 1.16) | 400 × 200 | Non-symmetric binary | 200 × 150 | Randomly scattered and symmetrical | Symmetric random noise

Table 1.5 Comparative analysis of SCoCnsym based on match score measure.

Dataset_nsym I

Dataset_nsym II

Dataset_nsym III

Dataset_nsym IV

Noise-free

Noisy

Noise-free

Noisy

Noise-free

Noisy

Noise-free

Noisy

SCoC

nsym

BiMax

0.4319

0.0117

0.0278

0.5719

xMotif

0.1075

0.1508

0.2500

0.1709

0.1667

0.0397

0.1867

0.0083

Table 1.6 Comparative analysis of SCoCnsym based on computational time.

Approach | Dataset_nsym I, noise-free | Dataset_nsym I, noisy | Dataset_nsym II, noise-free | Dataset_nsym II, noisy | Dataset_nsym III, noise-free | Dataset_nsym III, noisy | Dataset_nsym IV, noise-free | Dataset_nsym IV, noisy
SCoCnsym | 1.44029 | 6.60305 | 0.83674 | 0.93903 | 0.11604 | 0.35163 | 0.68721 | 1.14249
BiMax | 9.48878 | 0.01153 | 0.00743 | 0.01251 | 0.00197 | 0.00166 | 0.01251 | 0.00317
xMotif | 0.98624 | 0.03936 | 0.01348 | 0.01785 | 0.01009 | 0.03381 | 0.03746 | 0.03430

Algorithm 1.4: SCoCrand

Input: 2-D Input Data (D)

Result: co-cluster (M = {C1, C2,…Cns})

Step 1 : Assign the seed count (ns) and seed size (lens)

Step 2 : Generate the initial seeds

Step 3 : For every seed si

Perform SCoCnsym to extract co-cluster

Step 4 : Assess and visualize the resultant co-clusters (M).

A seed vector is a set of identifiers representing a particular subset of the dataset. The number of seeds (ns) and the seed vector size (lens) can be fixed based on user requirements or set on a trial-and-error basis. SCoCrand extracts multiple co-clusters instead of one: different sub-matrices of the input data are selected using random seeds, and SCoCnsym is applied to each seed to extract a co-cluster from it. The time complexity of this approach is T_r = O(N_s(max(N_es) + N_vs^2)), where “Ns”, “Nes”, and “Nvs” are the seed count, the edges in a seed, and the vertices in a seed, respectively.
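The seeding loop of Algorithm 1.4 can be sketched in Python as below. Several details are assumptions made for illustration: each seed picks `lens` random rows and `lens` random columns, the score is the density of 1s in the current sub-matrix, and the names `scoc_on_seed` and `scoc_rand` are hypothetical.

```python
import random


def scoc_on_seed(data, rows, cols, threshold=1.0):
    """Peel the lowest-scoring row or column of a seed until density >= threshold."""
    rows, cols = list(rows), list(cols)
    while rows and cols:
        ones = sum(data[r][c] for r in rows for c in cols)
        if ones / (len(rows) * len(cols)) >= threshold:
            break
        row_score = {r: sum(data[r][c] for c in cols) / len(cols) for r in rows}
        col_score = {c: sum(data[r][c] for r in rows) / len(rows) for c in cols}
        r_min = min(rows, key=row_score.get)
        c_min = min(cols, key=col_score.get)
        if row_score[r_min] <= col_score[c_min]:
            rows.remove(r_min)
        else:
            cols.remove(c_min)
    return rows, cols


def scoc_rand(data, ns, lens, seed=0):
    """Run the peeling step on ns random seeds, each of lens rows and lens columns."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    results = []
    for _ in range(ns):
        seed_rows = sorted(rng.sample(range(n_rows), min(lens, n_rows)))
        seed_cols = sorted(rng.sample(range(n_cols), min(lens, n_cols)))
        results.append(scoc_on_seed(data, seed_rows, seed_cols))
    return results


toy = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1],
]
co_clusters = scoc_rand(toy, ns=5, lens=3)
```

Each seed yields one (possibly empty) all-ones co-cluster, so different seeds can recover different blocks of the matrix. Row-wise or column-wise seeding would correspond to sampling only one of the two index sets and keeping the other complete.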

1.9.2.1 Synthetic Dataset Description

In this approach, the seeds are chosen to mine more patterns from the given data matrix. It helps to mine the maximal co-cluster in each sub-matrix of the data. They are selected based on the random seeds. The seeds are generated at different criteria like row-wise random seeds, column-wise random seeds, and random seeds (both row and column). The row-wise random seed splits the data along the row (row subset); similarly, the column-wise
