Supervised and Unsupervised Data Engineering for Multimedia Data (E-Book)

Description
Supervised and Unsupervised Data Engineering for Multimedia Data explores the cutting-edge realm of data engineering for multimedia, where expert contributors present innovative methodologies for handling multimedia data through the lenses of both supervised and unsupervised data engineering. Authored by a team of accomplished experts, this comprehensive volume serves as a go-to resource for data scientists, computer scientists, and researchers seeking a thorough understanding of current methodologies. The book integrates theoretical foundations with practical applications, offering a cohesive framework for navigating the complexities of multimedia data. Readers will explore a spectrum of topics, including artificial intelligence, machine learning, and data analysis, all tailored to the challenges and opportunities presented by multimedia datasets. From foundational principles to advanced techniques, each chapter offers valuable insights, making the book an essential guide for newcomers and seasoned professionals in academia and industry alike.

Page count: 457

Publication year: 2024



Table of Contents

Cover

Table of Contents

Series Page

Title Page

Copyright Page

Dedication

Book Description

List of Figures

List of Tables

Preface

1 SLRRT: Sign Language Recognition in Real Time

1.1 Introduction

1.2 Literature Survey

1.3 Model for Sign Language Recognition

1.4 Experimentation

1.5 Methodology

1.6 Experimentation Results

1.7 Conclusion

Future Scope

References

2 Unsupervised/Supervised Feature Extraction and Feature Selection for Multimedia Data (Feature Extraction with Feature Selection for Image Forgery Detection)

2.1 Introduction

2.2 Problem Definition

2.3 Proposed Methodology

2.4 Experimentation and Results

2.5 Feature Selection & Pre-Trained CNN Models Description

2.6 BAT ELM Optimization Results

Conclusion

Declarations

Consent for Publication

Conflict of Interest

Acknowledgement

References

3 Multimedia Data in Healthcare System

3.1 Introduction

3.2 Recent Trends in Multimedia Marketing

3.3 Challenges in Multimedia

3.4 Opportunities in Multimedia

3.5 Data Visualization in Healthcare

3.6 Machine Learning and its Types

3.7 Health Monitoring and Management System Using Machine Learning Techniques

3.8 Health Monitoring Using K-Prototype Clustering Methods

3.9 AI-Based Robotics in E-Healthcare Applications Based on Multimedia Data

3.10 Future of AI in Health Care

3.11 Emerging Trends in Multimedia Systems

3.12 Discussion

References

4 Automotive Vehicle Data Security Service in IoT Using ACO Algorithm

Introduction

Literature Survey

System Design

Result and Discussion

Conclusion

References

5 Unsupervised/Supervised Algorithms for Multimedia Data in Smart Agriculture

5.1 Introduction

5.2 Background

5.3 Applications of Machine Learning Algorithms in Agriculture

References

6 Secure Medical Image Transmission Using 2-D Tent Cascade Logistic Map

6.1 Introduction

6.2 Medical Image Encryption Using 2D Tent and Logistic Chaotic Function

6.3 Simulation Results and Discussion

6.4 Conclusion

Acknowledgement

References

7 Personalized Multi-User-Based Movie and Video Recommender System: A Deep Learning Perspective

7.1 Introduction

7.2 Literature Survey on Video and Movie Recommender Systems

7.3 Feature-Based Solutions for Movie and Video Recommender Systems

7.4 Fusing: EF – (Early Fusion) and LF – (Late Fusion)

7.5 Experimental Setup

7.6 Conclusions

References

8 Sensory Perception of Haptic Rendering in Surgical Simulation

Introduction

Methodology

Background Related Work

Application

Case Study

Future Scope

Result

Conclusion

Acknowledgement

References

9 Multimedia Data in Modern Education

Introduction to Multimedia

Traditional Learning Approaches

Applications of Multimedia in Education

Conclusion

References

10 Assessment of Adjusted and Normalized Mutual Information Variants for Band Selection in Hyperspectral Imagery

Introduction

Test Datasets

Methodology

Statistical Accuracy Investigations

Results and Discussion

Conclusion

References

11 A Python-Based Machine Learning Classification Approach for Healthcare Applications

Introduction

Methodology

Discussion

References

12 Supervised and Unsupervised Learning Techniques for Biometric Systems

Introduction

Various Biometric Techniques

Major Biometric-Based Problems from a Security Perspective

Supervised Learning Methods for Biometric System

Unsupervised Learning Methods for Biometric System

Conclusion

References

About the Editors

Index

Also of Interest

End User License Agreement

List of Tables

Chapter 1

Table 1.1 Accuracy and loss values per epochs.

Table 1.2 Experimental results of training and testing data for accuracy and loss.

Chapter 2

Table 2.1 Different ML classifiers.

Table 2.2 Modified LBP variants (Seven wonders of LBP) and second-order statistical feature extraction, GLRLM algorithm.

Table 2.3 Schema of 1-5 database.

Table 2.4 Image forgery detection & recognition (original & forged).

Table 2.5 Accuracy for different methods.

Table 2.6 Accuracy for different methods.

Chapter 4

Table 4.1 Accuracy.

Table 4.2 Sensitivity.

Table 4.3 Specificity.

Table 4.4 Table of time consumption.

Chapter 7

Table 7.1 List of papers and their summaries on CNN-based recommender systems.

Table 7.2 Statistics of MovieLens 10M dataset.

Table 7.3 Performance of different fusion functions with the late fusion model.

Table 7.4 Multi-user interest performance analysis.

Table 7.5 Performance comparison with different deep learning models.

Chapter 8

Table 8.1 Standard deviation of running time in different resolutions.

Chapter 10

Table 10.1 Summary of the test datasets including the Indian Pines, Salinas, Dhundi and the Pavia University.

Table 10.2 The different types of NMI and corresponding AMI variants according to Vinh et al. [44].

Table 10.3 Confusion matrix.

Table 10.4 Kappa coefficient values for the two cases used in strategic evaluation of the potential of the NMI/AMI variants and the proposed weighted NMI and weighted AMI for hyperspectral band selection.

Chapter 12

Table 12.1 Security perspective, properties, data sets, and success criteria comparison of used machine learning techniques.

List of Illustrations

Chapter 1

Figure 1.1 Basic sign language for each alphabet known characters.

Figure 1.2 Block diagram of phases of sign language recognition.

Figure 1.3 A few samples of MNIST sign language dataset.

Figure 1.4 Initial vectorization of data.

Figure 1.5 Final vectorization of data.

Figure 1.6 Phases of binary class conversion.

Figure 1.7 Sequential model with added layers.

Figure 1.8 Image processing techniques and steps.

Figure 1.9 A basic convolution for feature learning and classification.

Figure 1.10 Vectorized data outcome.

Chapter 2

Figure 2.1 Copy move forgery attack (Rahul Dixit & Ruchira Naskar, 2017).

Figure 2.2 Photomontage attack (Aseem Agarwala et al., 2004).

Figure 2.3 Resizing attack (Wei-Ming Dong & Xiao-Peng Zhang, 2012).

Figure 2.4 Image splicing attack (Yun Q. Shi et al., 2007).

Figure 2.5 Colorized image attack (Yuanfang Guo et al., 2018).

Figure 2.6 Camera-based image attack (Longfei Wu et al., 2014).

Figure 2.7 Format-based images (LK Tan, 2006).

Figure 2.8 Decision tree working scenario.

Figure 2.9 Modified ELM-LPG working mechanism (Zaher Yaseen et al., 2017).

Figure 2.10 General diagram.

Figure 2.11 Proposed advanced LBPSOSA for image forgery detection.

Figure 2.12 Proposed flow of Local Binary Pattern Second-Order Statistics Algorithm (LBPSOSA) for Image Forgery Detection.

Figure 2.13 LBPSOSA different features for ELM classification accuracy prediction.

Figure 2.14 Forgery localization.

Figure 2.15 Feature selection methods.

Figure 2.16 BAT optimized CNN-ELM image forgery localizer.

Figure 2.17 BAT optimized CNN-ELM for image forgery predictor.

Chapter 3

Figure 3.1 Different forms of multimedia.

Figure 3.2 Data visualization method.

Figure 3.3 Types of machine learning.

Figure 3.4 Hierarchical learning.

Figure 3.5 Data clustering.

Figure 3.6 K-Prototype method.

Figure 3.7 Variation in lung X-rays in different situations [35].

Chapter 4

Figure 4.1 Vehicle data in IoT layers.

Figure 4.2 CAN bus connection.

Figure 4.3 Stage 1 of ACO.

Figure 4.4 Stage 2 of ACO.

Figure 4.5 Stage 3 of ACO.

Figure 4.6 Stage 4 of ACO.

Figure 4.7 ACO process.

Figure 4.8 Accuracy.

Figure 4.9 Sensitivity.

Figure 4.10 Specificity.

Figure 4.11 Graphical representations for time consumption.

Chapter 5

Figure 5.1 Supervised learning.

Figure 5.2 Semi-supervised learning.

Figure 5.3 Unsupervised learning.

Figure 5.4 Reinforcement learning.

Figure 5.5 Deep learning algorithms.

Figure 5.6 Agriculture green development.

Figure 5.7 ML in agriculture (pre-production phase).

Figure 5.8 ML in agriculture (production phase).

Chapter 6

Figure 6.1 Proposed encryption/decryption methodology for medical images.

Figure 6.2 (a) input DICOM CT image (D1), (b) Haar wavelet transform output, (c) image after permutation and diffusion, (d) encrypted image, (e) decrypted image based on wavelet transform technique.

Figure 6.3 (a) input DICOM CT image (D1), (b) permutation and substitution output by 2D-Tent Cascade Logistic Map algorithm, (c) encrypted output, (d) decrypted image based on 2D-Tent Cascade Logistic Map algorithm.

Figure 6.4 First column depicts the DICOM CT input images, second column depicts the decrypted images using wavelet transform algorithm, third column depicts the decrypted images using 2D-Tent Cascade Logistic Map algorithm.

Figure 6.5 NPCR values of the encryption algorithms.

Figure 6.6 UACI values of encryption algorithms.

Figure 6.7 PSNR values of encryption algorithms.

Figure 6.8 Entropy values of plain and cipher images of encryption algorithms.

Chapter 7

Figure 7.1 Movie and video recommender systems.

Chapter 8

Figure 8.1 Haptic rendering pipeline.

Figure 8.2 Surface convolution.

Figure 8.3 Components of haptic rendering algorithm.

Figure 8.4 Algorithm used for tracing projection.

Figure 8.5 Hooke’s Law.

Figure 8.6 Thrust and torque prediction in glenoid reaming.

Figure 8.7 Tooth’s burring cross section. Dental instruments are necessary for...

Figure 8.8 Hardware and software simulation configuration.

Chapter 9

Figure 9.1 A typical educational environment based on multimedia. [1]

Chapter 10

Figure 10.1 Evaluation strategy for band selection methods used for dimensionality reduction of hyperspectral data.

Figure 10.2 Workflow delineating the proposed approach for the computation of the normalized mutual information and the adjusted mutual information.

Figure 10.3 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20% random training samples using the Random Forest classifier.

Figure 10.4 Mean Kappa Coefficient for the different variants of mutual information for the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset.

Figure 10.5 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different volume of training samples for the (a) Indian Pines, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20 selected best bands based on the Random Forest classifier.

Figure 10.6 Mean Kappa coefficient for the different variants of mutual information for the different volume of training data for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset.

Figure 10.7 Mean classification accuracy for fixed training at 20% samples and 20 selected bands over the four test datasets for each of the MI variants in (a) and (b).

Figure 10.8 Mean Kappa coefficient for the two cases and their average excluding the Indian Pines dataset.

Chapter 11

Figure 11.1 An overview of all the three classifiers.

Figure 11.2 Output of the Python implementation.

Figure 11.3 Confusion table.

Figure 11.4 Example for the confusion matrix.

Figure 11.5 Example for the confusion matrix.

Figure 11.6 Confusion matrix.

Figure 11.7 Confusion matrix.

Figure 11.8 Confusion matrix.

Figure 11.9 Confusion matrix.

Chapter 12

Figure 12.1 Hand geometry [35].

Figure 12.2 A typical hand-shape biometric system.

Figure 12.3 (a) standard face recognition procedure, (b) the process of face recognition.

Guide

Cover Page

Table of Contents

Series Page

Title Page

Copyright Page

Dedication

Book Description

List of Figures

List of Tables

Preface

Begin Reading

List of Authors

Index

Also of Interest

WILEY END USER LICENSE AGREEMENT


Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106

Advances in Data Engineering and Machine Learning

Series Editors: Niranjanamurthy M, PhD, Juanying XIE, PhD, and Ramiz Aliguliyev, PhD

Scope: Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. Data engineers are responsible for finding trends in data sets and developing algorithms to help make raw data more useful to the enterprise.

It is important to keep business goals aligned when working with data, especially for companies that handle large and complex datasets and databases. Data engineering encompasses DevOps, data science, and machine learning engineering. DevOps (development and operations) is an enterprise software development term for an agile relationship between development and IT operations. The goal of DevOps is to improve this relationship by advocating better communication and collaboration between the two business units. Data science is the study of data: it involves developing methods of recording, storing, and analyzing data to effectively extract useful information. The goal of data science is to gain insights and knowledge from any type of data, both structured and unstructured.

Machine learning engineers are sophisticated programmers who develop machines and systems that can learn and apply knowledge without specific direction. Machine learning engineering combines software engineering principles with analytical and data science knowledge to take a trained ML model and make it available for use by a product or its consumers. "Advances in Data Engineering and Machine Learning Engineering" will reach a wide audience, including data scientists, engineers, industry practitioners, researchers, and students working in the fields of data engineering and machine learning engineering.

Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])

Supervised and Unsupervised Data Engineering for Multimedia Data

Edited by

Suman Kumar Swarnkar

J P Patra

Sapna Singh Kshatri

Yogesh Kumar Rathore

and

Tien Anh Tran

This edition first published 2024 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA.

© 2024 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 978-1-119-78634-4

Front cover images created with Adobe Firefly
Cover design by Russell Richardson

Dedication

To everyone who made this book possible, I recognize your efforts from the depth of my heart: my Parents, my Wife, my Son, colleagues of the Computer Science and Engineering Department, and the Institution head and faculty members of Shri Shankaracharya Institute of Professional Management and Technology, Raipur. Without you, this book would not have been possible. I dedicate this book to all of you.

Dr. Suman Kumar Swarnkar

To everyone who made this book possible, I recognize your efforts from the depth of my heart: my Parents, my Wife Sumitra, my Son Yuvraj, colleagues of the Computer Science and Engineering Department, and the Institution head and faculty members of Shri Shankaracharya Institute of Professional Management and Technology, Raipur. Without you, this book would not have been possible. I dedicate this book to all of you.

Dr. J P Patra

I would like to express my sincere gratitude to everyone who made this book possible: my Father, the late S.L. Rathore, my Mother, my Wife Pooja, my Son Shivank, my Daughter Priyanshi, all my family members, and the colleagues of the Department of Computer Science and Engineering and the management of Shri Shankaracharya Institute of Professional Management and Technology, Raipur, for their support and timely advice. I gladly dedicate this book to all of you.

Mr. Yogesh Kumar Rathore

Book Description

In the ever-evolving age of technology, Artificial Intelligence (AI) and Multimedia Data Engineering have become increasingly important tools for understanding and manipulating data. As AI and multimedia data engineering work together to create new technologies that can help us in our daily lives, it is essential to understand how these concepts interact. This book provides an overview of Artificial Intelligence and Multimedia Data Engineering, as well as their implications for modern society. Recent advances in AI have been aided by the development of multimedia data engineering techniques, which allow us to collect, store, analyze, and visualize large amounts of information. By combining these two fields, we can gain a better understanding of how they interact. The ability to extract meaningful insights from various types of datasets is becoming increasingly important for making decisions based on accurate, data-driven analysis.

List of Figures

Figure 1.1 Basic sign language for each alphabet known characters

Figure 1.2 Block diagram of phases of sign language recognition

Figure 1.3 A few samples of MNIST sign language dataset

Figure 1.4 Initial vectorization of data

Figure 1.5 Final vectorization of data

Figure 1.6 Phases of binary class conversion

Figure 1.7 Sequential model with added layers

Figure 1.8 Image processing techniques and steps

Figure 1.9 A basic convolution for feature learning and classification

Figure 1.10 Vectorized data outcome

Figure 2.1 Copy move forgery attack

Figure 2.2 Photomontage attack

Figure 2.3 Resizing attack

Figure 2.4 Image splicing attack

Figure 2.5 Colorized image attack

Figure 2.6 Camera-based image attack

Figure 2.7 Format-based images

Figure 2.8 Decision tree working scenario

Figure 2.9 Modified ELM-LPG working mechanism

Figure 2.10 General diagram

Figure 2.11 Proposed advanced LBPSOSA for image forgery detection

Figure 2.12 Proposed flow of Local Binary Pattern Second-Order Statistics Algorithm (LBPSOSA) for Image Forgery Detection

Figure 2.13 LBPSOSA different features for ELM classification accuracy prediction

Figure 2.14 Forgery localization

Figure 2.15 Feature selection methods

Figure 2.16 BAT optimized CNN-ELM image forgery localizer

Figure 2.17 BAT optimized CNN-ELM for image forgery predictor

Figure 3.1 Different forms of multimedia

Figure 3.2 Data visualization method

Figure 3.3 Types of machine learning

Figure 3.4 Hierarchical learning

Figure 3.5 Data clustering

Figure 3.6 K-Prototype method

Figure 3.7 Variation in lung X-rays in different situations

Figure 4.1 Vehicle data in IoT layers

Figure 4.2 CAN bus connection

Figure 4.3 Stage 1 of ACO

Figure 4.4 Stage 2 of ACO

Figure 4.5 Stage 3 of ACO

Figure 4.6 Stage 4 of ACO

Figure 4.7 ACO process

Figure 4.8 Accuracy

Figure 4.9 Sensitivity

Figure 4.10 Specificity

Figure 4.11 Graphical representations for time consumption

Figure 5.1 Supervised learning

Figure 5.2 Semi-supervised learning

Figure 5.3 Unsupervised learning

Figure 5.4 Reinforcement learning

Figure 5.5 Deep learning algorithms

Figure 5.6 Agriculture green development

Figure 5.7 ML in agriculture (pre-production phase)

Figure 5.8 ML in agriculture (production phase)

Figure 6.1 Proposed encryption/decryption methodology for medical images

Figure 6.2 (a) input DICOM CT image (D1), (b) Haar wavelet transform output, (c) image after permutation and diffusion, (d) encrypted image, (e) decrypted image based on wavelet transform technique

Figure 6.3 (a) input DICOM CT image (D1), (b) permutation and substitution output by 2D-Tent Cascade Logistic Map algorithm, (c) encrypted output, (d) decrypted image based on 2D-Tent Cascade Logistic Map algorithm

Figure 6.4 First column depicts the DICOM CT input images, second column depicts the decrypted images using wavelet transform algorithm, third column depicts the decrypted images using 2D-Tent Cascade Logistic Map algorithm

Figure 6.5 NPCR values of the encryption algorithms

Figure 6.6 UACI values of encryption algorithms

Figure 6.7 PSNR values of encryption algorithms

Figure 6.8 Entropy values of plain and cipher images of encryption algorithms

Figure 7.1 Movie and video recommender systems

Figure 8.1 Haptic rendering pipeline

Figure 8.2 Surface convolution

Figure 8.3 Components of haptic rendering algorithm

Figure 8.4 Algorithm used for tracing projection

Figure 8.5 Hooke’s Law

Figure 8.6 Thrust and torque prediction in glenoid reaming

Figure 8.7 Tooth’s burring cross section. Dental instruments are necessary for numerous dental procedures and tooth health. Dentists use the dental mirror to see inside the mouth and the probe to identify cavities and problems on the tooth’s surface. Plaque and tartar are removed by the scaler, improving oral health. Dental drill instruments vary by task, such as cavity preparation. Teeth are held and removed with forceps. Thin dental probes detect gum pocket depth to assess mouth health. (a) and (b) represent the tooth’s surface structure, which includes cuspids, incisors, and other elements that give it its form and function. The tooth’s complicated geometry makes it valuable in various oral functions. These dental tools help dentists diagnose, treat, and maintain oral health.

Figure 8.8 Hardware and software simulation configuration

Figure 9.1 A typical educational environment based on multimedia

Figure 10.1 Evaluation strategy for band selection methods used for dimensionality reduction of hyperspectral data

Figure 10.2 Workflow delineating the proposed approach for the computation of the normalized mutual information and the adjusted mutual information

Figure 10.3 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20% random training samples using the Random Forest classifier

Figure 10.4 Mean Kappa Coefficient for the different variants of mutual information for the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset

Figure 10.5 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different volume of training samples for the (a) Indian Pines, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20 selected best bands based on the Random Forest classifier

Figure 10.6 Mean Kappa coefficient for the different variants of mutual information for the different volume of training data for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset

Figure 10.7 Mean classification accuracy for fixed training at 20% samples and 20 selected bands over the four test datasets for each of the MI variants in (a) and (b)

Figure 10.8 Mean Kappa coefficient for the two cases and their average excluding the Indian Pines dataset

Figure 11.1 An overview of all the three classifiers

Figure 11.2 Output of the Python implementation

Figure 11.3 Confusion table

Figure 11.4 Example for the confusion matrix

Figure 11.5 Example for the confusion matrix

Figure 11.6 Confusion matrix

Figure 11.7 Confusion matrix

Figure 11.8 Confusion matrix

Figure 11.9 Confusion matrix

Figure 12.1 Hand geometry

Figure 12.2 A typical hand-shape biometric system

Figure 12.3 (a) standard face recognition procedure, (b) the process of face recognition

List of Tables

Table 1.1 Accuracy and loss values per epochs

Table 1.2 Experimental results of training and testing data for accuracy and loss

Table 2.1 Different ML classifiers

Table 2.2 Modified LBP variants (Seven wonders of LBP) and second-order statistical feature extraction, GLRLM algorithm

Table 2.3 Schema of 1-5 database

Table 2.4 Image forgery detection & recognition (original & forged)

Table 2.5 Accuracy for different methods

Table 2.6 Accuracy for different methods

Table 4.1 Accuracy

Table 4.2 Sensitivity

Table 4.3 Specificity

Table 4.4 Table of time consumption

Table 7.1 List of papers and their summaries on CNN-based recommender systems

Table 7.2 Statistics of MovieLens 10M dataset

Table 7.3 Performance of different fusion functions with the late fusion model

Table 7.4 Multi-user interest performance analysis

Table 7.5 Performance comparison with different deep learning models

Table 8.1 Standard deviation of running time in different resolutions

Table 10.1 Summary of the test datasets including the Indian Pines, Salinas, Dhundi and the Pavia University

Table 10.2 The different types of NMI and corresponding AMI variants according to Vinh et al. [44]

Table 10.3 Confusion matrix

Table 10.4 Kappa coefficient values for the two cases used in strategic evaluation of the potential of the NMI/AMI variants and the proposed weighted NMI and weighted AMI for hyperspectral band selection

Table 12.1 Security perspective, properties, data sets, and success criteria comparison of used machine learning techniques

Preface

Artificial intelligence (AI) is a rapidly growing field of engineering that has the potential to revolutionize the way we interact with machines, process data, and even see our world. Multimedia Data Engineering (MDE) is an important branch of AI which focuses on how machine learning algorithms can be used to analyze and interpret large amounts of multimedia data. In this book, we explore how AI technologies are utilized in MDE and the benefits they bring to professionals working in this domain.

At its core, MDE combines AI techniques with traditional computer science principles to make sense of vast amounts of multimedia data. By leveraging advances such as facial recognition technology, natural language processing tools, text-to-speech applications and more, engineers are able to transform unstructured data into valuable insights for businesses.

The chapters of this volume are broadly classified into current computing techniques; artificial intelligence and multimedia data engineering; and implementation.

The editors thank all the reviewers for their excellent contributions to this volume. I sincerely hope that you will enjoy reading these chapters and expect them to play an important role in promoting research on advanced computing techniques and their implementation. I hope that this volume will prove a great success in the exchange of ideas, which will foster future research collaborations.

Dr. Suman Kumar Swarnkar

Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India

Dr. J P Patra

Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India

Dr. Sapna Singh Kshatri

Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India

Yogesh Kumar Rathore

Department of Computer Science and Engineering, Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India

Dr. Tien Anh Tran

Vietnam Maritime University Haiphong, Vietnam

1 SLRRT: Sign Language Recognition in Real Time

Monika Lamba1* and Geetika Munjal2

1Department of Computer Science and Engineering (CSE), The NorthCap University, Gurugram, India

2Amity School of Engineering and Technology, Amity University, Noida, Uttar Pradesh, India

Abstract

An application called Sign Language Recognition (SLR) can recognise a variety of distinct letter gestures and translate them into text. This application is extremely significant in the area of science and technology and can be used in a variety of machine learning-based applications, including virtual reality. The purpose of this chapter is to develop a convolutional neural network that recognises the signs captured from a video feed and, in turn, provides correct and accurate text output, improving the accuracy of real-time sign language recognition in a way that aids physically challenged individuals. It offers an offline application for all individuals who need assistance in communicating with the rest of society. It aims to evaluate gestures more efficiently, producing quick, precise results and ensuring that information is not lost during the evaluation process. Real-time sign language recognition involves first identifying images from an acquired video feed using a machine learning model, then identifying edges and vertices, and finally determining the desired result using a convolutional neural network. This method is carried out at runtime to obtain results continuously while signing, with very little wait time, utilising the CNN model. Character identification becomes easier with this approach, and sentences can be constructed with high accuracy from individual letters.

Keywords: Language recognition, real time, sign, convolutional neural network, machine learning

1.1 Introduction

Nowadays, technology has taken an advanced leap forward in terms of improvement and efficiency. One of the many technologies that has taken such steps is real-time sign language recognition. Sign language recognition is an application to detect various gestures of different characters and convert them into text. This application has huge importance in the field of science and technology. It has different applications based on machine learning and even in virtual reality. There are various types of sign languages, such as ISL (Indian Sign Language) [1], BSL (British Sign Language) [2], ASL (American Sign Language) [3], and many more, implemented differently in different parts of the world. Our aim is to apply American Sign Language for sign-to-text recognition [3] [4] [5]. American Sign Language is similar to other natural languages in that it can be expressed using gestures like hand or body movements. Although it shares many characteristics with other languages, it does not have English-like grammar. It is the most widely used sign language on earth, used primarily in North America and in parts of Africa and Southeast Asia. American Sign Language serves as a link between the deaf and hearing communities; with the aid of this programme, deaf people can describe their actions textually. This type of work has also been done in the past, with each instance producing unique outcomes and approaches, although few of them meet the standards for excellence.

The overall expansion of this language has been aided by its use in places like schools, hospitals, police stations, and other learning facilities. Since it is widely regarded as being simple to comprehend and fully independent of context, some people even choose to talk using this language. There are instances where newborn infants will receive this language from their mothers as their mother tongue. In fact, this is how sign language is meant to be understood. Figure 1.1 shows a visual representation of alphabets as signs.

Structure, grammar, and gestures are typically where sign languages diverge from one another. Unlike some other sign languages, American Sign Language has a single-handed fingerspelling alphabet. Compared to others, it is simpler to implement and interpret. The gestures were also developed with consideration for various cultural traditions. Because people become accustomed to these gestures throughout their lives, this in turn draws a larger audience. The two-handed nature of BSL communication makes it difficult for non-BSL users to comprehend and interpret the language [5].

Figure 1.1 Basic sign language for each alphabet known characters.

ISL is a well-known sign language in India; yet, because there are fewer studies and sources for accurate translations and because ASL has a larger audience, many individuals prefer ASL to other sign languages [6]. ISL also has numerous identical gestures with different meanings, which can be confusing when translated, even though all of these languages take roughly the same amount of time to translate letters and words. We chose ASL for the sign language converter because it is more widely used than other sign languages [7] [8].

The most fundamental need in society is effective communication. Deaf and mute people struggle greatly every day to communicate with other people. Such an application was desperately needed because those who are deaf or mute deserve their proper place in society. They experience secondary issues like loneliness and despair as a result of their basic difficulty; it would therefore be preferable if they could integrate more socially and forge more social ties [9] [10].

People also frequently offer alternative solutions, one of which is: "Instead of using another language to communicate, why don't deaf people just write things down and display them instead?" This suggestion may appear reasonable from the perspective of a person without this disability, but people who experience these challenges require humane solutions to their problems. They need to express their feelings and activities, which cannot be done solely through writing. That is yet another justification for our decision to contribute to the field of sign language [11].

The concept of delivering results in written form primarily enables communication with those who cannot speak or hear. Such an application would bring a little ease to the lives of all deaf or mute people. The more such applications are created and the more the technology is enhanced, the happier these people will be to share such a large platform.

1.2 Literature Survey

Technologies like speech, gesture, and handwriting recognition are a significant part of HCI (human-computer interaction) [12]. Gesture recognition has numerous applications such as sign language, robot control, and virtual reality. In the method proposed by Zhi-hua Chen [13], hand recognition is grounded in finger recognition and is hence more effective, using a simple rule classifier. The rule classifier is highly efficient in real-time applications. The author used a simple camera to detect hand gestures rather than a data glove or special tape, which are much more expensive. The method includes finger and hand detection, palm segmentation, and hand gesture recognition. In the very first step, hand detection, the colour of the skin is measured using the HSV model and the image is resized to 200 x 200. The output of this step is a binary image in which white pixels represent the hand and black pixels represent the background. The next step is the segmentation of palm and fingers, which is obtained with the help of the palm point (center of the palm), the wrist line, and the wrist point. A labelling algorithm is applied to detect the regions of the fingers. Finally, the hand gesture is recognized by counting the fingers and identifying which fingers are raised. A dataset of 1,300 images is used to demonstrate highly accurate results, and the system takes 0.024 seconds to recognize a hand [13]. Zeenat [14] studied gesture as a form of non-verbal communication through which people communicate with each other. Interaction between people draws on various sensory modes like gesture, speech, and facial and body expressions. The principal advantage of using hand gestures is to interact with the computer as a no-contact human-computer input modality; hand gestures have removed the need for controllers to move virtual objects. One of the most widely used tools for hand gesture recognition is the data glove, but image-based gesture recognition has eliminated the need for data gloves due to their expensive cost. There are three stages of gesture recognition: 1. image pre-processing, 2. tracking, 3. recognition. The system developed captures the hand gesture in front of a web camera, which takes a picture and then proceeds to recognize the corresponding motion through a specific algorithm. The work fundamentally involves investigation and identification of hand gestures to perform suitable actions. Image processing is fundamentally the analysis of a digitized image in order to improve its quality. Emgu CV is used for image processing; it is a cross-platform .NET wrapper to the Intel OpenCV image processing library, allowing OpenCV functions to be called from .NET-compatible languages such as C#, VB, VC++, and IronPython. The author uses various procedures to find the number of fingers present in the hand gesture. A minimal sketch of the skin-detection step appears below.
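The following is a minimal sketch of HSV-based skin segmentation with OpenCV, of the kind used in the hand-detection step described above. The HSV bounds are illustrative assumptions, not values from Chen's paper; real systems tune them per lighting conditions and skin tone.

    import cv2
    import numpy as np

    def detect_hand(frame):
        # Resize the input to 200 x 200, as in the method described above.
        frame = cv2.resize(frame, (200, 200))
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Illustrative skin-tone bounds in HSV (assumed, not from the paper).
        lower = np.array([0, 40, 60], dtype=np.uint8)
        upper = np.array([20, 150, 255], dtype=np.uint8)
        # Binary mask: white pixels are hand, black pixels are background.
        return cv2.inRange(hsv, lower, upper)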

Nayana presented a procedure for human-computer interaction exploiting open-source tools like Python and OpenCV. The proposed algorithm comprises pre-processing, segmentation, and feature extraction. Features include the moments of the image, the centroid of the image, and Euclidean distance. The hand gesture images are taken by a camera. The role of hand gestures is very important in day-to-day life; they convey expressive meanings by which people communicate with each other. This model presented a hand gesture recognition framework which uses only hand motions to communicate. The algorithm is divided into three parts: pre-processing, segmentation, and feature extraction. The study utilizes contours, the convex hull, and convexity defects to detect the hand gesture. Over the last few years several studies have been conducted on gesture recognition using OpenCV, and several performance comparisons have been carried out to improve the system. Image transformations convert the RGB image into a YCbCr image, and the YCbCr image is converted into a binary image; this computation needs a uniform, plain background. OpenCV (Open-Source Computer Vision Library) is a library aimed mostly at real-time computer vision. OpenCV was designed for computational efficiency with a strong focus on applications, and it provides essential data structures for image processing with efficient implementations. Python offers an object-oriented approach. For the implementation, a hand segmentation algorithm is used: hand segmentation extracts the hand image from the background. There are several strategies for segmentation, the significant steps being transformation and thresholding. In this algorithm, the BGR image taken by a camera is the input. The BGR image is converted into a grayscale image, the grayscale image is blurred to obtain a definite boundary, and the blurred image is thresholded at a specific value. The author introduced a procedure to find the number of fingers present in the hand gesture [15] [16]; a sketch of this kind of contour analysis follows.
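As a minimal sketch of the contour, convex hull, and convexity-defect analysis described above, the function below counts extended fingers from a binary hand mask. It assumes OpenCV 4, and the defect-depth threshold is an illustrative value rather than one from the cited work.

    import cv2

    def count_fingers(mask):
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return 0
        hand = max(contours, key=cv2.contourArea)   # largest blob = hand
        hull = cv2.convexHull(hand, returnPoints=False)
        defects = cv2.convexityDefects(hand, hull)
        if defects is None:
            return 0
        # Deep convexity defects are the valleys between extended fingers.
        valleys = sum(1 for i in range(defects.shape[0])
                      if defects[i, 0, 3] > 10000)  # depth in 1/256 px units
        return valleys + 1 if valleys > 0 else 0

Counting valleys rather than fingertips directly is the usual trick here: n deep valleys between fingers imply n + 1 extended fingers.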

Hand gesture recognition [17] is basically used for identifying shapes or orientations, depending on the feasibility of performing the task. Gestures are mainly used for conveying meaningful messages; they are a most important part of human life. Data gathering is the author's first step: a picture is taken with the camera and a region of interest is identified in the frame. This is important since the picture may contain a number of elements that could lead to unfavourable outcomes, and it significantly reduces the amount of information that has to be processed. A webcam is used to take the photo, continuously recording frame data used to gather the basic training information. Data pre-processing, done in two steps, comprises segmentation and morphological filtering. Segmentation converts a grayscale image into a binary image so that there are just two regions of interest in the photograph: one is the hand, and the other the background. Otsu's algorithm can be used for this process: grayscale images are transformed into binary images with the regions of interest being the hand and the background. Morphological filtering is then applied to ensure that there is no noise in the image; dilation, erosion, opening, and closing are the basic filtering operations that can be used. There is also a possibility of errors, which can be termed gesture noise [18]. A minimal sketch of these pre-processing steps follows.
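Here is a minimal sketch of the two pre-processing steps just described, assuming a grayscale input image (the file name is hypothetical):

    import cv2
    import numpy as np

    gray = cv2.imread("gesture.png", cv2.IMREAD_GRAYSCALE)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu's method picks the binarization threshold from the histogram,
    # leaving two regions of interest: hand (white) and background (black).
    _, binary = cv2.threshold(blur, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Morphological filtering suppresses noise: opening removes small
    # speckles, closing fills small holes inside the hand region.
    kernel = np.ones((5, 5), np.uint8)
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)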

Utilizing hand motions is one of the most natural ways of interacting with the computer, and correct real-time interpretation of moving hand signals in particular has numerous applications. In his paper, Nuwan Munasinghe [19] designed and built a framework that can recognize gestures in front of a web camera in real time using motion history images (MHI) and feedforward neural networks. With the introduction of new technologies, new methods of interaction with computers have been introduced; old methods were keyboards, mice, joysticks, and data gloves. Gesture recognition has become a commonly used method for interacting with computers and provides a good interface for human-computer interaction. It also has a lot of applications like sign language recognition, gameplay, etc. Gestures are non-verbal means of communication used to convey meaningful messages; they can be static or dynamic. Here, dynamic gesture recognition has been performed. Normally gesture recognition can be split into two approaches: first, vision-based methods, and second, methods that rely on devices such as keyboards, mice, etc. The vision-based approach makes use of pattern recognition, image processing, and so on. In vision-based methodologies, a gesture recognition robot control framework has been developed in which hand poses and faces are detected using various feature-based template matching systems; to accomplish this, researchers have used a skin colour-based segmentation technique. Gestures are recognized using a rule-based system where detected skin-like regions are matched with predefined gestures. A feedforward neural network uses the concept of static hand gesture recognition to establish 10 different types of static gestures. A few algorithms have also been used for real-time gesture recognition, including k-nearest neighbour and decision trees. The primary concern is how computer vision-based methods and feedforward neural network-based classification strategies have been used to build a real-time dynamic gesture recognition framework. In this paper, the author essentially made use of a vision-based, neural network-based real-time gesture recognition system. A rough sketch of the MHI idea appears below.
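The MHI idea can be sketched in a few lines of NumPy: recent motion is stamped at full intensity and older motion decays frame by frame, so the resulting image encodes where and how recently movement occurred. The duration and difference threshold below are assumed values for illustration, not Munasinghe's parameters.

    import cv2
    import numpy as np

    DURATION = 30   # frames of motion history to keep (assumed)
    THRESH = 32     # frame-difference threshold (assumed)

    def update_mhi(mhi, prev_gray, gray):
        # mhi starts as np.zeros(gray.shape, np.int32) and is updated per frame.
        moving = cv2.absdiff(gray, prev_gray) > THRESH
        mhi[moving] = DURATION                          # stamp fresh motion
        mhi[~moving] = np.maximum(mhi[~moving] - 1, 0)  # fade older motion
        return mhi

The flattened MHI (or features derived from it) can then be fed to a feedforward network as the gesture descriptor.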

In Ali [20] a stable vision-based framework is proposed to track objects (hand fingers). It is built on the Raspberry Pi with a camera module and programmed in the Python programming language supported by the Open-Source Computer Vision (OpenCV) library. The Raspberry Pi runs an image processing algorithm for hand gestures, which tracks an object (hand fingers) through its extracted features. The fundamental aim of the hand gesture recognition framework is to set up communication between humans and electronic systems for control. The recognized gestures are used to control the movement of a mobile robot in real time. The mobile robot was constructed and tested to demonstrate the effectiveness of the proposed algorithm. The robot moves and navigates in various directions: forward, backward, right, left, and stop. Vision-based and image processing systems have various applications in pattern recognition and mobile robot navigation. The Raspberry Pi is a small-sized computer board [20] suitable for real-time projects. The fundamental purpose of the work presented in this paper is to create a framework capable of detecting and tracking several features of objects specified by an image processing algorithm using the Raspberry Pi and camera module. The feature extraction algorithm is programmed in Python supported by OpenCV libraries and executed on the Raspberry Pi connected to an external camera. The paper presents a mobile robot using the Raspberry Pi, whose movement is controlled by means of the camera attached to the Raspberry Pi, which forwards direction commands to the driver of a two-wheel-drive mobile rover. It uses a hand gesture algorithm to recognize the object (hand) and control the movement of the robot; moreover, the robot was made to work in living environments with poor illumination conditions. The software used for the implementation of the system is Raspbian OS, which is developed for the Raspberry Pi; Python and OpenCV are used as well. Python, as we already know, is a very high-level programming language requiring fewer lines of code; it is simple and easy to execute and has an extensive number of libraries. OpenCV is a free library that includes several APIs for computer vision used in image processing for real-time applications. There are several features in OpenCV which support data handling, including object detection, camera calibration, 3D reconstruction, and interfaces for video processing. The Python programming language was used to build the hand gesture recognition framework.

David [21] proposed a method that detects two hands simultaneously using techniques like border detection and filters. The application is divided into two parts: robot control and GPS use. In the first, hand gestures are used to control a robot; in the second, a GPS is controlled using gestures. Data comprising 600 gestures performed by 60 users is used. The application achieved 93.1% accuracy and successfully detected hand gestures; the least-detected hand gesture, showing one finger, was still recognized with 75% accuracy.

Vivek Bheda [22] discusses sign language, a kind of communication that regularly goes understudied. While the translation process between signs and spoken or written language is formally called interpretation, it plays a role equivalent to translation for spoken language. Nowadays, the usage of depth-sensing technology is growing in popularity, and custom-designed colour gloves make feature extraction much more efficient. Though depth-sensing technology has not been widely used for automatic sign language recognition, there have been successful attempts at using CNNs to handle the task of classifying images of ASL letter gestures. The general design was a fairly standard CNN architecture with various convolutional and dense layers. The data includes a collection of 25 images from 5 individuals for each letter and the digits 1-9, and a pipeline was established so that people can add images to this dataset. Performance is improved and monitored through the data augmentation process. This deep learning approach to classifying ASL shows potential in solving the problem using a simple, easily accessible camera, and also brings out the huge difference in performance between algorithms.

Hand gestures are also the basis of sign language, a language used by deaf people and by people who are unable to speak. The hand gesture recognition process focuses on recognizing meaningful expressions of form and motion involving only the hands. Hand gesture recognition is applied in plenty of applications for the purposes of accessibility, communication, and learning.

This work includes information about experiments conducted on different types of convolutional neural networks, evaluated on the Marcel dataset. Gjorgji [23] presented an approach mainly contrasted with the data-glove-based approach, which collects information from sensors attached to a glove mounted on the hand of the user; the approach instead builds an artificial visual field to complement biological human vision. Hand gestures are a basic part of human-to-human communication. The effectiveness of information transfer using this form of communication is remarkable; it has therefore inspired ideas for use in the area of human-computer interaction. For this to be possible, the computer needs to recognize the gesture shown to it by the person controlling it.

Various individuals and backgrounds were used in order to increase the diversity and information contained within the dataset. Since the deep models trained in the experiments require an enormous mass of data to train properly, data augmentation was applied to the images in the dataset. This was done to gain quantity while still introducing some novelty, in terms of information, to the dataset. GoogLeNet is a deep convolutional neural network designed by Google featuring their popular Inception architecture [24].

Sign languages, which comprise a blend of hand movements and facial expressions, are used by deaf people the world over to communicate. However, hearing people rarely know sign languages, creating barriers to inclusion.

The expanding progress of mobile technology, alongside new forms of user interaction, opens up possibilities for overcoming such barriers, particularly using gesture recognition on mobile phones. This literature review discusses works from 2009 to 2017 that present solutions for gesture recognition in a mobile setting as well as facial recognition in sign languages. Among a diversity of hardware and methods, sensor-based gloves were the most used special hardware, alongside brute-force comparison to classify gestures.

1.3 Model for Sign Language Recognition

The main ideas of the content up to this point have been what sign language is, why American Sign Language was chosen above other sign languages, and what compelled us to research this topic further. Now the question is, how are we going to accomplish such a task? To answer that, we must be able to comprehend the idea that this research is trying to portray, as well as the procedures and strategies that will be employed to carry it out sequentially.

Figure 1.2 Block diagram of phases of sign language recognition.

The fundamental premise of this chapter is that when people need to communicate with each other, especially the deaf or mute, they often cannot comprehend each other, even when persons with no physical impairments communicate with deaf or mute people. This application will help by bridging that communication gap.

So how do these sign languages function? Unlike commonly spoken languages, signs express their meaning through hand, facial, or body motions. Given that its grammar is different from spoken language, this also applies to the way it is presented. The facial and body components of this kind of communication are referred to as non-manual actions.

The process shown in Figure 1.2 shows how to guarantee accurate and correct input recognition. Each phase is broken down into numerous substeps, all of which will be covered in detail in this chapter.

1.4 Experimentation

Real-time sign language recognition is a demanding application; therefore, system requirements must be kept in mind. Such research can typically be implemented with both low- and high-resolution cameras as well as more advanced systems. To ensure that the neural network implementation remains effective even with low-resolution photos, this research captures its primary input from the webcam of a laptop.

Python will be used as this research's programming language, along with tools like PyCharm or Jupyter Notebook [25] for the research's internal operations. For the training dataset, there will be at least 200 samples for each character. Additionally, this research will apply effective machine learning algorithms and aim to raise the baseline accuracy level.

The research will make use of well-known platforms and libraries to compute data and present results in a certain format, such as:

TensorFlow – It is an open-source platform for developing and building applications based on machine learning. It contains all the tools and libraries for the development of machine learning–powered applications.

Keras – It was created to simplify building deep neural network applications. It is mainly an application programming interface used from the Python programming language, and it is mostly helpful for the back-end work.

OpenCV – It is a library which contains the material to operate real-time applications. It can be used with the Python language and serves here mainly for front-end purposes, such as capturing and displaying video.

Libraries such as NumPy and os are also used for mathematical procedures as well as file reading and writing operations. A minimal sketch of a model built with this stack follows.
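As a minimal sketch of how these libraries fit together, the following builds a small sequential CNN of the kind this chapter describes. The layer sizes are illustrative assumptions, and the 26 output classes assume one class per ASL letter.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(200, 200, 1)),        # one grayscale sign image
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(26, activation="softmax"),   # one class per letter
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])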

Dataset

Humans have been accustomed to using two dimensions, yet occasionally three dimensions are employed as well, much like throughout evolution. What if, however, there are n dimensions to consider? When seemingly straightforward situations become completely unmanageable for human interactions, machine learning becomes useful.

We have a really helpful dataset that is directly related to computer vision and is used for sign language recognition: a sign language version of MNIST, where MNIST stands for Modified National Institute of Standards and Technology.

The sign language MNIST dataset was produced using sign language in all of its possible manifestations [26]. A few samples are shown in Figure 1.3. The dataset's signs have a size of 200 x 200 pixels on the horizontal and vertical axes, and each element of the dataset has been numerically labelled according to the class to which it belongs. A sketch of how such labelled data can be prepared for training appears after Figure 1.3.

Figure 1.3 A few samples of MNIST sign language dataset.
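Here is a sketch of how such numerically labelled images can be prepared for a network like the one above, assuming the data has already been exported to NumPy arrays (the file names are hypothetical): pixels are scaled to [0, 1] and integer class labels are one-hot encoded.

    import numpy as np
    import tensorflow as tf

    images = np.load("sign_images.npy")   # shape (N, 200, 200), uint8
    labels = np.load("sign_labels.npy")   # shape (N,), integer class ids

    x = images.astype("float32") / 255.0  # normalize pixel values to [0, 1]
    x = x[..., np.newaxis]                # add channel axis: (N, 200, 200, 1)
    y = tf.keras.utils.to_categorical(labels)  # one-hot label vectors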

Data Vectorization