Supervised and Unsupervised Data Engineering for Multimedia Data

Explore the cutting-edge realms of data engineering in multimedia with Supervised and Unsupervised Data Engineering for Multimedia Data, where expert contributors delve into innovative methodologies, offering invaluable insights to empower both novices and seasoned professionals in mastering the art of manipulating multimedia data with precision and efficiency.

Supervised and Unsupervised Data Engineering for Multimedia Data presents a groundbreaking exploration into the intricacies of handling multimedia data through the lenses of both supervised and unsupervised data engineering. Authored by a team of accomplished experts in the field, this comprehensive volume serves as a go-to resource for data scientists, computer scientists, and researchers seeking a profound understanding of cutting-edge methodologies. The book seamlessly integrates theoretical foundations with practical applications, offering a cohesive framework for navigating the complexities of multimedia data. Readers will delve into a spectrum of topics, including artificial intelligence, machine learning, and data analysis, all tailored to the challenges and opportunities presented by multimedia datasets. From foundational principles to advanced techniques, each chapter provides valuable insights, making this book an essential guide for academia and industry professionals alike. Whether you're a seasoned practitioner or a newcomer to the field, Supervised and Unsupervised Data Engineering for Multimedia Data illuminates the path toward mastery in manipulating and extracting meaningful insights from multimedia data in the modern age.
Cover
Table of Contents
Series Page
Title Page
Copyright Page
Dedication
Book Description
List of Figures
List of Tables
Preface
1 SLRRT: Sign Language Recognition in Real Time
1.1 Introduction
1.2 Literature Survey
1.3 Model for Sign Recognition Language
1.4 Experimentation
1.5 Methodology
1.6 Experimentation Results
1.7 Conclusion
Future Scope
References
2 Unsupervised/Supervised Feature Extraction and Feature Selection for Multimedia Data (Feature Extraction with Feature Selection for Image Forgery Detection)
2.1 Introduction
2.2 Problem Definition
2.3 Proposed Methodology
2.4 Experimentation and Results
2.5 Feature Selection & Pre-Trained CNN Models Description
2.6 BAT ELM Optimization Results
Conclusion
Declarations
Consent for Publication
Conflict of Interest
Acknowledgement
References
3 Multimedia Data in Healthcare System
3.1 Introduction
3.2 Recent Trends in Multimedia Marketing
3.3 Challenges in Multimedia
3.4 Opportunities in Multimedia
3.5 Data Visualization in Healthcare
3.6 Machine Learning and its Types
3.7 Health Monitoring and Management System Using Machine Learning Techniques
3.8 Health Monitoring Using K-Prototype Clustering Methods
3.9 AI-Based Robotics in E-Healthcare Applications Based on Multimedia Data
3.10 Future of AI in Health Care
3.11 Emerging Trends in Multimedia Systems
3.12 Discussion
References
4 Automotive Vehicle Data Security Service in IoT Using ACO Algorithm
Introduction
Literature Survey
System Design
Result and Discussion
Conclusion
References
5 Unsupervised/Supervised Algorithms for Multimedia Data in Smart Agriculture
5.1 Introduction
5.2 Background
5.3 Applications of Machine Learning Algorithms in Agriculture
References
6 Secure Medical Image Transmission Using 2-D Tent Cascade Logistic Map
6.1 Introduction
6.2 Medical Image Encryption Using 2D Tent and Logistic Chaotic Function
6.3 Simulation Results and Discussion
6.4 Conclusion
Acknowledgement
References
7 Personalized Multi-User-Based Movie and Video Recommender System: A Deep Learning Perspective
7.1 Introduction
7.2 Literature Survey on Video and Movie Recommender Systems
7.3 Feature-Based Solutions for Movie and Video Recommender Systems
7.4 Fusing: EF – (Early Fusion) and LF – (Late Fusion)
7.5 Experimental Setup
7.6 Conclusions
References
8 Sensory Perception of Haptic Rendering in Surgical Simulation
Introduction
Methodology
Background Related Work
Application
Case Study
Future Scope
Result
Conclusion
Acknowledgement
References
9 Multimedia Data in Modern Education
Introduction to Multimedia
Traditional Learning Approaches
Applications of Multimedia in Education
Conclusion
References
10 Assessment of Adjusted and Normalized Mutual Information Variants for Band Selection in Hyperspectral Imagery
Introduction
Test Datasets
Methodology
Statistical Accuracy Investigations
Results and Discussion
Conclusion
References
11 A Python-Based Machine Learning Classification Approach for Healthcare Applications
Introduction
Methodology
Discussion
References
12 Supervised and Unsupervised Learning Techniques for Biometric Systems
Introduction
Various Biometric Techniques
Major Biometric-Based Problems from a Security Perspective
Supervised Learning Methods for Biometric System
Unsupervised Learning Methods for Biometric System
Conclusion
References
About the Editors
Index
Also of Interest
End User License Agreement
Chapter 1
Table 1.1 Accuracy and loss values per epochs.
Table 1.2 Experimental results of training and testing data for accuracy and loss.
Chapter 2
Table 2.1 Different ML classifiers.
Table 2.2 Modified LBP variants (Seven wonders of LBP) and second-order statistical feature extraction, GLRLM algorithm.
Table 2.3 Schema of 1-5 database.
Table 2.4 Image forgery detection & recognition (original & forged).
Table 2.5 Accuracy for different methods.
Table 2.6 Accuracy for different methods.
Chapter 4
Table 4.1 Accuracy.
Table 4.2 Sensitivity.
Table 4.3 Specificity.
Table 4.4 Table of time consumption.
Chapter 7
Table 7.1 List of papers and their summary on CNN-based recommender system.
Table 7.2 Statistics of the MovieLens 10M dataset.
Table 7.3 Different fusion functions' performances with the late fusion model.
Table 7.4 Multi-user interest performance analysis.
Table 7.5 Performance comparison with different deep learning model.
Chapter 8
Table 8.1 Standard deviation of running time in different resolutions.
Chapter 10
Table 10.1 Summary of the test datasets including the Indian Pines, Salinas, Dhundi and the Pavia University.
Table 10.2 The different types of NMI and corresponding AMI variants according to Vinh et al. [44].
Table 10.3 Confusion matrix.
Table 10.4 Kappa coefficient values for the two cases used in strategic evaluation of the potential of the NMI/AMI variants and the proposed weighted NMI and weighted AMI for hyperspectral band selection.
Chapter 12
Table 12.1 Security perspective, properties, data sets, and success criteria comparison of used machine learning techniques.
Chapter 1
Figure 1.1 Basic sign language for each alphabet known characters.
Figure 1.2 Block diagram of phases of sign language recognition.
Figure 1.3 A few samples of MNIST sign language dataset.
Figure 1.4 Initial vectorization of data.
Figure 1.5 Final vectorization of data.
Figure 1.6 Phases of binary class conversion.
Figure 1.7 Sequential model with added layers.
Figure 1.8 Image processing techniques and steps.
Figure 1.9 A basic convolution for feature learning and classification.
Figure 1.10 Vectorized data outcome.
Chapter 2
Figure 2.1 Copy move forgery attack (Rahul Dixit & Ruchira Naskar, 2017).
Figure 2.2 Photomontage attack (Aseem Agarwala et al., 2004).
Figure 2.3 Resizing attack (Wei-Ming Dong & Xiao-Peng Zhang, 2012).
Figure 2.4 Image splicing attack (Yun Q. Shi et al., 2007).
Figure 2.5 Colorized image attack (Yuanfang Guo et al., 2018).
Figure 2.6 Camera-based image attack (Longfei Wu et al., 2014).
Figure 2.7 Format-based images (LK Tan, 2006).
Figure 2.8 Decision tree working scenario.
Figure 2.9 Modified ELM-LPG working mechanism (Zaher Yaseen et al., 2017).
Figure 2.10 General diagram.
Figure 2.11 Proposed advanced LBPSOSA for image forgery detection.
Figure 2.12 Proposed flow of Local Binary Pattern Second-Order Statistics Algorithm (LBPSOSA) for Image Forgery Detection.
Figure 2.13 LBPSOSA different features for ELM classification accuracy prediction.
Figure 2.14 Forgery localization.
Figure 2.15 Feature selection methods.
Figure 2.16 BAT optimized CNN-ELM image forgery localizer.
Figure 2.17 BAT optimized CNN-ELM for image forgery predictor.
Chapter 3
Figure 3.1 Different forms of multimedia.
Figure 3.2 Data visualization method.
Figure 3.3 Types of machine learning.
Figure 3.4 Hierarchical learning.
Figure 3.5 Data clustering.
Figure 3.6 K-Prototype method.
Figure 3.7 Variation in lung X-rays in different situations [35].
Chapter 4
Figure 4.1 Vehicle data in IoT layers.
Figure 4.2 CAN bus connection.
Figure 4.3 Stage 1 of ACO.
Figure 4.4 Stage 2 of ACO.
Figure 4.5 Stage 3 of ACO.
Figure 4.6 Stage 4 of ACO.
Figure 4.7 ACO process.
Figure 4.8 Accuracy.
Figure 4.9 Sensitivity.
Figure 4.10 Specificity.
Figure 4.11 Graphical representations for time consumption.
Chapter 5
Figure 5.1 Supervised learning.
Figure 5.2 Semi-supervised learning.
Figure 5.3 Unsupervised learning.
Figure 5.4 Reinforcement learning.
Figure 5.5 Deep learning algorithms.
Figure 5.6 Agriculture green development.
Figure 5.7 ML in agriculture (pre-production phase).
Figure 5.8 ML in agriculture (production phase).
Chapter 6
Figure 6.1 Proposed encryption/decryption methodology for medical images.
Figure 6.2 (a) input DICOM CT image (D1), (b) Haar wavelet transform output, (c) image after permutation and diffusion, (d) encrypted image, (e) decrypted image based on wavelet transform technique.
Figure 6.3 (a) input DICOM CT image (D1), (b) permutation and substitution output by 2D-Tent Cascade Logistic Map algorithm, (c) encrypted output, (d) decrypted image based on 2D-Tent Cascade Logistic Map algorithm.
Figure 6.4 First column depicts the DICOM CT input images, second column depicts the decrypted images using wavelet transform algorithm, third column depicts the decrypted images using 2D-Tent Cascade Logistic Map algorithm.
Figure 6.5 NPCR values of the encryption algorithms.
Figure 6.6 UACI values of encryption algorithms.
Figure 6.7 PSNR values of encryption algorithms.
Figure 6.8 Entropy values of plan and cipher image of encryption algorithms.
Chapter 7
Figure 7.1 Movie and video recommender systems.
Chapter 8
Figure 8.1 Haptic rendering pipeline.
Figure 8.2 Surface convolution.
Figure 8.3 Components of haptic rendering algorithm.
Figure 8.4 Algorithm used for tracing projection.
Figure 8.5 Hooke’s Law.
Figure 8.6 Thrust and torque prediction in glenoid reaming.
Figure 8.7 Tooth’s burring cross section. Dental instruments are necessary for...
Figure 8.8 Hardware and software simulation configuration.
Chapter 9
Figure 9.1 A typical educational environment based on multimedia. [1]
Chapter 10
Figure 10.1 Evaluation strategy for band selection methods used for dimensionality reduction of hyperspectral data.
Figure 10.2 Workflow delineating the proposed approach for the computation of the normalized mutual information and the adjusted mutual information.
Figure 10.3 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20% random training samples using the Random Forest classifier.
Figure 10.4 Mean Kappa Coefficient for the different variants of mutual information for the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset.
Figure 10.5 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different volume of training samples for the (a) Indian Pines, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20 selected best bands based on the Random Forest classifier.
Figure 10.6 Mean Kappa coefficient for the different variants of mutual information for the different volume of training data for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset.
Figure 10.7 Mean classification accuracy for fixed training at 20% samples and 20 selected bands over the four test datasets for each of the MI variants in (a) and (b).
Figure 10.8 Mean Kappa coefficient for the two cases and their average excluding the Indian Pines dataset.
Chapter 11
Figure 11.1 An overview of all the three classifiers.
Figure 11.2 Output of the Python implementation.
Figure 11.3 Confusion table.
Figure 11.4 Example for the confusion matrix.
Figure 11.5 Example for the confusion matrix.
Figure 11.6 Confusion matrix.
Figure 11.7 Confusion matrix.
Figure 11.8 Confusion matrix.
Figure 11.9 Confusion matrix.
Chapter 12
Figure 12.1 Hand geometry [35].
Figure 12.2 A typical hand-shape biometric system.
Figure 12.3 (a) standard face recognition procedure, (b) the process of face recognition.
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Advances in Data Engineering and Machine Learning
Series Editors: Niranjanamurthy M, PhD, Juanying XIE, PhD, and Ramiz Aliguliyev, PhD
Scope: Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. Data engineers are responsible for finding trends in data sets and developing algorithms to help make raw data more useful to the enterprise.
It is important to keep business goals in mind when working with data, especially for companies that handle large and complex datasets and databases. Data engineering encompasses DevOps, data science, and machine learning engineering. DevOps (development and operations) is an enterprise software development term for an agile relationship between development and IT operations; its goal is to improve that relationship by advocating better communication and collaboration between the two business units. Data science is the study of data: it involves developing methods of recording, storing, and analyzing data to effectively extract useful information, with the goal of gaining insights and knowledge from any type of data, both structured and unstructured.
Machine learning engineers are sophisticated programmers who develop machines and systems that can learn and apply knowledge without specific direction. Machine learning engineering applies software engineering principles together with analytical and data science knowledge to take a trained ML model and make it available for use by the product or its consumers. "Advances in Data Engineering and Machine Learning Engineering" will reach a wide audience, including data scientists, engineers, industry practitioners, researchers, and students working in the fields of data engineering and machine learning engineering.
Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])
Edited by
Suman Kumar Swarnkar
J P Patra
Sapna Singh Kshatri
Yogesh Kumar Rathore
and
Tien Anh Tran
This edition first published 2024 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2024 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-78634-4
Front cover images created with Adobe Firefly
Cover design by Russell Richardson
To everyone who made this book possible, I recognize your efforts from the depth of my heart: my Parents, my Wife, my Son, Colleagues of the Computer Science and Engineering Department, and the Institution head and faculty members of Shri Shankaracharya Institute of Professional Management and Technology, Raipur. Without you, this book would not have been possible. I dedicate this book to all of you.
Dr. Suman Kumar Swarnkar
To everyone who made this book possible, I recognize your efforts from the depth of my heart: my Parents, my Wife Sumitra, my Son Yuvraj, Colleagues of the Computer Science and Engineering Department, and the Institution head and faculty members of Shri Shankaracharya Institute of Professional Management and Technology, Raipur. Without you, this book would not have been possible. I dedicate this book to all of you.
Dr. J P Patra
I would like to express my sincere gratitude to everyone who made this book possible: my Father Late S.L. Rathore, my Mother, my Wife Pooja, my Son Shivank, my Daughter Priyanshi, all my family members, and Colleagues of the Department of Computer Science and Engineering and the management of Shri Shankaracharya Institute of Professional Management and Technology, Raipur, for their support and timely advice. I gladly dedicate this book to all of you.
Mr. Yogesh Kumar Rathore
In the ever-evolving age of technology, Artificial Intelligence (AI) and Multimedia Data Engineering have become increasingly important tools for understanding and manipulating data. As AI and multimedia data engineering work together to create new technologies that can help us in our daily lives, it is essential to understand how these concepts interact with each other. This book provides an overview of Artificial Intelligence and Multimedia Data Engineering, as well as their implications for modern society. Recent advances in AI have been aided by the development of multimedia data engineering techniques, which allow us to collect, store, analyze, and visualize large amounts of information. By combining these two fields, we can gain a better understanding of how they interact. The ability to extract meaningful insights from various types of datasets is becoming increasingly important in order to make decisions based on accurate data-driven analysis.
List of Figures
Figure 1.1 Basic sign language for each alphabet known characters
Figure 1.2 Block diagram of phases of sign language recognition
Figure 1.3 A few samples of MNIST sign language dataset
Figure 1.4 Initial vectorization of data
Figure 1.5 Final vectorization of data
Figure 1.6 Phases of binary class conversion
Figure 1.7 Sequential model with added layers
Figure 1.8 Image processing techniques and steps
Figure 1.9 A basic convolution for feature learning and classification
Figure 1.10 Vectorized data outcome
Figure 2.1 Copy move forgery attack
Figure 2.2 Photomontage attack
Figure 2.3 Resizing attack
Figure 2.4 Image splicing attack
Figure 2.5 Colorized image attack
Figure 2.6 Camera-based image attack
Figure 2.7 Format-based images
Figure 2.8 Decision tree working scenario
Figure 2.9 Modified ELM-LPG working mechanism
Figure 2.10 General diagram
Figure 2.11 Proposed advanced LBPSOSA for image forgery detection
Figure 2.12 Proposed flow of Local Binary Pattern Second-Order Statistics Algorithm (LBPSOSA) for Image Forgery Detection
Figure 2.13 LBPSOSA different features for ELM classification accuracy prediction
Figure 2.14 Forgery localization
Figure 2.15 Feature selection methods
Figure 2.16 BAT optimized CNN-ELM image forgery localizer
Figure 2.17 BAT optimized CNN-ELM for image forgery predictor
Figure 3.1 Different forms of multimedia
Figure 3.2 Data visualization method
Figure 3.3 Types of machine learning
Figure 3.4 Hierarchical learning
Figure 3.5 Data clustering
Figure 3.6 K-Prototype method
Figure 3.7 Variation in lung X-rays in different situations
Figure 4.1 Vehicle data in IoT layers
Figure 4.2 CAN bus connection
Figure 4.3 Stage 1 of ACO
Figure 4.4 Stage 2 of ACO
Figure 4.5 Stage 3 of ACO
Figure 4.6 Stage 4 of ACO
Figure 4.7 ACO process
Figure 4.8 Accuracy
Figure 4.9 Sensitivity
Figure 4.10 Specificity
Figure 4.11 Graphical representations for time consumption
Figure 5.1 Supervised learning
Figure 5.2 Semi-supervised learning
Figure 5.3 Unsupervised learning
Figure 5.4 Reinforcement learning
Figure 5.5 Deep learning algorithms
Figure 5.6 Agriculture green development
Figure 5.7 ML in agriculture (pre-production phase)
Figure 5.8 ML in agriculture (production phase)
Figure 6.1 Proposed encryption/decryption methodology for medical images
Figure 6.2 (a) input DICOM CT image (D1), (b) Haar wavelet transform output, (c) image after permutation and diffusion, (d) encrypted image, (e) decrypted image based on wavelet transform technique
Figure 6.3 (a) input DICOM CT image (D1), (b) permutation and substitution output by 2D-Tent Cascade Logistic Map algorithm, (c) encrypted output, (d) decrypted image based on 2D-Tent Cascade Logistic Map algorithm
Figure 6.4 First column depicts the DICOM CT input images, second column depicts the decrypted images using wavelet transform algorithm, third column depicts the decrypted images using 2D-Tent Cascade Logistic Map algorithm
Figure 6.5 NPCR values of the encryption algorithms
Figure 6.6 UACI values of encryption algorithms
Figure 6.7 PSNR values of encryption algorithms
Figure 6.8 Entropy values of plan and cipher image of encryption algorithms
Figure 7.1 Movie and video recommender systems
Figure 8.1 Haptic rendering pipeline
Figure 8.2 Surface convolution
Figure 8.3 Components of haptic rendering algorithm
Figure 8.4 Algorithm used for tracing projection
Figure 8.5 Hooke’s Law
Figure 8.6 Thrust and torque prediction in glenoid reaming
Figure 8.7 Tooth's burring cross section. Dental instruments are necessary for numerous dental procedures and tooth health. Dentists use the dental mirror to see inside the mouth and the probe to identify cavities and problems on the tooth's surface. Plaque and tartar are removed by the scaler, improving oral health. Dental drill instruments vary by task, such as cavity preparation. Teeth are held and removed with forceps. Thin dental probes detect gum pocket depth to assess mouth health. (a) and (b) represent the tooth's surface structure, which includes cuspids, incisors, and other elements that form and function it. The tooth's complicated geometry makes it worthwhile in various oral functions. These dental tools help dentists diagnose, treat, and maintain oral health.
Figure 8.8 Hardware and software simulation configuration
Figure 9.1 A typical educational environment based on multimedia
Figure 10.1 Evaluation strategy for band selection methods used for dimensionality reduction of hyperspectral data
Figure 10.2 Workflow delineating the proposed approach for the computation of the normalized mutual information and the adjusted mutual information
Figure 10.3 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20% random training samples using the Random Forest classifier
Figure 10.4 Mean Kappa Coefficient for the different variants of mutual information for the different number of bands for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset
Figure 10.5 Classification accuracy (Kappa coefficient) for the different variants of mutual information with respect to the different volume of training samples for the (a) Indian Pines, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset, for 20 selected best bands based on the Random Forest classifier
Figure 10.6 Mean Kappa coefficient for the different variants of mutual information for the different volume of training data for the (a) Indian Pines dataset, (b) Dhundi dataset, (c) Pavia University dataset, (d) Salinas dataset
Figure 10.7 Mean classification accuracy for fixed training at 20% samples and 20 selected bands over the four test datasets for each of the MI variants in (a) and (b)
Figure 10.8 Mean Kappa coefficient for the two cases and their average excluding the Indian Pines dataset
Figure 11.1 An overview of all the three classifiers
Figure 11.2 Output of the Python implementation
Figure 11.3 Confusion table
Figure 11.4 Example for the confusion matrix
Figure 11.5 Example for the confusion matrix
Figure 11.6 Confusion matrix
Figure 11.7 Confusion matrix
Figure 11.8 Confusion matrix
Figure 11.9 Confusion matrix
Figure 12.1 Hand geometry
Figure 12.2 A typical hand-shape biometric system
Figure 12.3 (a) standard face recognition procedure, (b) the process of face recognition
List of Tables
Table 1.1 Accuracy and loss values per epochs
Table 1.2 Experimental results of training and testing data for accuracy and loss
Table 2.1 Different ML classifiers
Table 2.2 Modified LBP variants (Seven wonders of LBP) and second-order statistical feature extraction, GLRLM algorithm
Table 2.3 Schema of 1-5 database
Table 2.4 Image forgery detection & recognition (original & forged)
Table 2.5 Accuracy for different methods
Table 2.6 Accuracy for different methods
Table 4.1 Accuracy
Table 4.2 Sensitivity
Table 4.3 Specificity
Table 4.4 Table of time consumption
Table 7.1 List of papers and their summary on CNN-based recommender system
Table 7.2 Statistics of the MovieLens 10M dataset
Table 7.3 Different fusion functions' performances with the late fusion model
Table 7.4 Multi-user interest performance analysis
Table 7.5 Performance comparison with different deep learning model
Table 8.1 Standard deviation of running time in different resolutions
Table 10.1 Summary of the test datasets including the Indian Pines, Salinas, Dhundi and the Pavia University
Table 10.2 The different types of NMI and corresponding AMI variants according to Vinh et al. [44]
Table 10.3 Confusion matrix
Table 10.4 Kappa coefficient values for the two cases used in strategic evaluation of the potential of the NMI/AMI variants and the proposed weighted NMI and weighted AMI for hyperspectral band selection
Table 12.1 Security perspective, properties, data sets, and success criteria comparison of used machine learning techniques
Artificial intelligence (AI) is a rapidly growing field of engineering that has the potential to revolutionize the way we interact with machines, process data, and even see our world. Multimedia Data Engineering (MDE) is an important branch of AI which focuses on how machine learning algorithms can be used to analyze and interpret large amounts of multimedia data. In this book, we explore how AI technologies are utilized in MDE and the benefits they bring to professionals working in this domain.
At its core, MDE combines AI techniques with traditional computer science principles to make sense of vast amounts of multimedia data. By leveraging advances such as facial recognition technology, natural language processing tools, text-to-speech applications and more, engineers are able to transform unstructured data into valuable insights for businesses.
The chapters of this volume are broadly classified into current computing techniques, artificial intelligence and multimedia data engineering, and implementation.
The editors thank all the reviewers for their excellent contributions to this volume. We sincerely hope that you will enjoy reading these chapters and expect them to play an important role in promoting advanced computing techniques and implementation research. We hope that this volume will prove a great success through the exchange of ideas, which will foster future research collaborations.
Dr. Suman Kumar Swarnkar
Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India
Dr. J P Patra
Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India
Dr. Sapna Singh Kshatri
Department of Computer Science and Engineering Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India
Yogesh Kumar Rathore
Department of Computer Science and Engineering, Shri Shankaracharya Institute of Professional Management and Technology Raipur, Chhattisgarh, India
Dr. Tien Anh Tran
Vietnam Maritime University Haiphong, Vietnam
Monika Lamba1* and Geetika Munjal2
1Department of Computer Science and Engineering (CSE), The NorthCap University, Gurugram, India
2Amity School of Engineering and Technology, Amity University, Noida, Uttar Pradesh, India
An application called Sign Language Recognition (SLR) can recognise a variety of distinct letter movements and translate them into text. This application is extremely significant in the area of science and technology, and it can be used in a variety of machine learning-based applications, including virtual reality. The purpose of this chapter is to develop a convolutional neural network that recognises the signs captured from a video feed and, in turn, provides correct and accurate text output, improving the accuracy of real-time sign language recognition via scanning and detection in order to aid physically challenged individuals. For all individuals who want assistance in communicating with the rest of society, it offers an offline application. It tries to evaluate gestures more efficiently, in order to produce quick, precise results and to ensure that material is not lost during the evaluation process. Real-time sign language recognition involves first identifying images from a video feed acquired using a machine learning model, then identifying edges and vertices, and then determining the desired result using a convolutional neural network. This method is carried out at runtime to obtain results continuously while signing, with very little wait time, using the CNN model. Character identification will be easier with this approach, and sentences can be constructed with high levels of accuracy using fewer letters.
Keywords: Language recognition, real time, sign, convolutional neural network, machine learning
Nowadays, technology has taken an advanced leap forward in terms of capability and efficiency. One of the many technologies that have taken such steps is real-time sign language recognition. Sign language recognition is an application that detects the various gestures for different characters and converts them into text. This application has huge importance in the field of science and technology, with applications in machine learning and even in virtual reality. There are various types of sign languages, such as ISL (Indian Sign Language) [1], BSL (British Sign Language) [2], ASL (American Sign Language) [3], and many more, implemented differently in different parts of the world. Our aim is to apply American Sign Language for sign-to-text recognition [3] [4] [5]. American Sign Language is similar to other natural languages in that it can be expressed using gestures like hand or body movements. Although it shares many characteristics with other languages, it does not have English-like grammar. It is the most widely used sign language on earth, primarily used in places such as America, Africa, and much of Southeast Asia. American Sign Language serves as a link between the deaf and hearing communities: with the aid of this programme, they can describe their actions textually. This type of work has also been done in the past, with each instance producing unique outcomes and approaches, although few of them meet the standards for excellence.
The overall expansion of this language has been aided by its use in places like schools, hospitals, police stations, and other learning facilities. Since it is widely regarded as being simple to comprehend and fully independent of context, some people even choose to talk using this language. There are instances where newborn infants will receive this language from their mothers as their mother tongue. In fact, this is how sign language is meant to be understood. Figure 1.1 shows a visual representation of alphabets as signs.
Structure, grammar, and gestures are typically where sign languages diverge from one another. Unlike some other sign languages, American Sign Language has a one-handed fingerspelling alphabet. Compared to others, it is simpler to implement and interpret. The motions were also developed with consideration for various cultural traditions; because people are accustomed to these gestures throughout their lives, this in turn draws a larger audience. The two-handed nature of BSL communication makes it difficult for non-BSL users to comprehend and interpret the language [5].
Figure 1.1 Basic sign language for each alphabet known characters.
The ISL is a well-known sign language in India; yet because there are fewer studies and sources for accurate translations, and because ASL has a larger audience, many individuals prefer ASL to other sign languages [6]. The ISL also has numerous identical motions with different meanings, which can be confusing when translated, even though all of these languages take roughly the same amount of time to translate letters and words. We chose ASL for the sign language converter because it is more widely used than other sign languages [7] [8].
The most fundamental need in society is for effective communication. Deaf and dumb people struggle greatly every day to communicate with regular people. Because those who are deaf or mute need to have their proper place in society, such an application was desperately needed. They experience secondary issues like loneliness and despair as a result of their basic weakness; thus it would be preferable if they could integrate more socially and forge more social ties [9] [10].
People also frequently offer alternative solutions, one of which is, “Instead of utilising another language to communicate, why don’t deaf people just write it down and display it instead?” This explanation may appear reasonable and enticing from the perspective of a person without this disability, but the people who are experiencing these challenges require human solutions to their problems. These people need to express their feelings and activities, which cannot be done solely through writing. Consequently, that is still another justification for our decision to make a contribution to the field of sign language [11].
The concept of delivering results in written form primarily enables communication with those who cannot speak or hear. Such an application would make life a little easier for all deaf and mute people, and the more such applications are created and the technology is enhanced, the larger the platform these users will be able to share.
Technologies like speech, gesture, and hand tracking are a significant part of HCI (human-computer interaction) [12]. Gesture recognition has numerous applications such as sign language, robot control, and virtual reality. In the method proposed by Zhi-hua Chen [13], hand recognition is grounded in finger recognition; it is therefore more effective and uses a simpler rule classifier, which is highly efficient in real-time applications. The author used a simple camera to capture hand gestures rather than a data glove or special tape, which are much more expensive. The pipeline includes finger and hand detection, palm segmentation, and hand gesture recognition. In the very first step, hand detection, skin colour is measured using the HSV model and the image is resized to 200 x 200. The output of this step is a binary image in which white pixels represent the hand and black pixels represent the background. The next step is the segmentation of palm and fingers, which is obtained with the help of the palm point (the center of the palm), the wrist line, and the wrist point. A labelling algorithm is applied to detect the finger regions. Finally, the hand gesture is recognized by counting the fingers and identifying which gesture is shown. A dataset of 1,300 images was used to demonstrate highly accurate results; the system takes 0.024 seconds to recognize a hand [13]. Zeenat [14] studied gestures as a form of non-verbal communication through which people interact with each other. Interaction between people draws on various sensory modes such as gesture, speech, and facial and body expressions. The principal advantage of hand gestures is that they offer a touch-free input modality for interacting with the computer, and gesture recognition has removed the need for physical controllers when manipulating virtual objects. The data glove is one of the most widely used devices for hand gesture recognition, but vision-based gesture recognition has displaced it because gloves are expensive. There are three stages of gesture recognition: 1. image pre-processing, 2. tracking, and 3. recognition. The system developed captures the hand gesture in front of a web camera, takes a picture, and then recognizes the corresponding motion through a specific algorithm. The paper essentially covers the analysis and identification of hand gestures in order to perform suitable actions. Image processing is fundamentally the analysis of a digitized image in order to improve its quality. Emgu CV is used for the image processing: it is a cross-platform .NET wrapper for the Intel OpenCV image processing library, allowing OpenCV functions to be called from .NET-compatible languages such as C#, VB, VC++, and IronPython. The author uses various procedures to determine the number of fingers present in the hand gesture.
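As a minimal sketch of the HSV skin-segmentation step summarized above (illustrative only, not the code from [13]; the skin-tone thresholds are assumptions that must be tuned to the camera and lighting):

import cv2
import numpy as np

def segment_hand(frame_bgr):
    """Return a 200 x 200 binary mask: white pixels = hand, black = background."""
    frame = cv2.resize(frame_bgr, (200, 200))
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Rough skin-tone band in HSV; assumed values, tune per camera and lighting.
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([25, 255, 255], dtype=np.uint8)
    return cv2.inRange(hsv, lower, upper)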
Nayana presented a procedure for human-computer interaction using open-source tools such as Python and OpenCV. The proposed algorithm comprises pre-processing, segmentation, and feature extraction; the features include the image moments, the image centroid, and the Euclidean distance. The hand gesture images are captured by a camera. The role of hand gestures is very important in day-to-day life: they convey expressive meanings by which people communicate with each other. This model presented a hand gesture recognition framework that uses only hand motions to communicate. The algorithm is divided into three parts: pre-processing, segmentation, and feature extraction. The study uses contours, the convex hull, and convexity defects to detect the hand gesture. Over the last few years, several studies on hand gesture recognition using OpenCV have been conducted, with several performance comparisons carried out to improve the system. Image transformations are performed on the RGB image to convert it into a YCbCr image, and the YCbCr image is then converted into a binary image; this computation requires a uniform, plain background. OpenCV (Open Source Computer Vision Library) is a library aimed mainly at real-time computer vision. OpenCV was designed for computational efficiency with a strong focus on applications, and it provides the essential data structures for image processing along with efficient optimizations. Python follows an object-oriented approach. For the implementation, a hand segmentation algorithm is used: hand segmentation extracts the hand image from the background. There are several strategies for segmentation; the key steps are transformation and thresholding. In this algorithm, the BGR image taken by the camera is considered the input. The BGR image is converted into a grayscale image, the grayscale image is blurred to obtain the exact boundary, and the blurred image is thresholded at a specific value. The author presented a procedure to find the number of fingers present in the hand gesture [15] [16].
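The contour, convex hull, and convexity-defect analysis mentioned above can be sketched as follows; this is an assumed illustration of the general technique, not the implementation from [15] [16]:

import cv2
import numpy as np

def count_fingers(binary_mask):
    # Find the largest contour and treat it as the hand.
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0
    valleys = 0
    for i in range(defects.shape[0]):
        s, e, f, depth = defects[i, 0]
        start, end, far = hand[s][0], hand[e][0], hand[f][0]
        # Law of cosines: the valley between two raised fingers is a sharp angle.
        a = np.linalg.norm(end - start)
        b = np.linalg.norm(far - start)
        c = np.linalg.norm(end - far)
        cos_angle = np.clip((b**2 + c**2 - a**2) / (2 * b * c + 1e-6), -1.0, 1.0)
        if np.degrees(np.arccos(cos_angle)) < 90 and depth > 10000:  # depth is fixed-point (value * 256)
            valleys += 1
    return valleys + 1 if valleys else 0  # k valleys imply k + 1 raised fingers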
Hand gesture recognition [17] is basically used for identifying shapes or orientations, depending on the feasibility of performing the task. Gestures are mainly used for conveying meaningful messages and are a most important part of human life. Data gathering is the author's first step: a picture is taken with the camera and a region of interest is identified in the frame. This is important because the picture may contain a number of elements that could lead to unfavourable outcomes, and cropping significantly reduces the amount of information that has to be processed. A webcam is used to take the photo, continuously recording frame data that is used to gather the basic training information. Data pre-processing, done in two steps, comprises segmentation and morphological filtering. Segmentation converts a grayscale image into a binary image so that there are just two regions of interest in the photograph: one is the hand and the other the background. The Otsu algorithm can be used for this process: grayscale images are transformed into binary images with the region of interest being the hand against the background. Morphological filtering is then applied to ensure that there is no noise in the image; dilation, erosion, opening, and closing are the basic filtering operations that can be used. There is also a possibility of errors, which can be termed gesture noise [18].
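A brief sketch of this pre-processing stage (Otsu thresholding followed by morphological filtering), assuming OpenCV; the kernel size is an illustrative choice:

import cv2
import numpy as np

def preprocess(gray):
    # Otsu picks the threshold automatically from the image histogram.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    # Opening removes small speckle noise; closing fills small holes.
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    return binary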
Utilizing hand motions is one of the most natural ways of interacting with a computer, and correct real-time interpretation of moving hand gestures in particular has numerous applications. In his paper, Nuwan Munasinghe [19] designed and built a framework that can recognize gestures in front of a web camera in real time using motion history images (MHI) and feedforward neural networks. With the introduction of new technologies, new methods of interacting with computers have appeared; the old methods were keyboards, mice, joysticks, and data gloves. Gesture recognition has become a commonly used method for interacting with computers, and it provides a good interface for human-computer interaction. It also has many applications, such as sign language recognition and gameplay. Gestures are a non-verbal means of communication used to convey meaningful messages; they can be static or dynamic, and here dynamic gesture recognition is performed. Gesture recognition can normally be split into two families: vision-based approaches, and approaches that rely on devices such as keyboards and mice. The vision-based family makes use of pattern recognition, image processing, and related techniques. In vision-based methodologies, a gesture recognition robot control framework has been developed in which hand poses and faces are detected using various feature-based template matching techniques; to accomplish this, researchers have used skin colour-based segmentation. Gestures are recognized using a rule-based system in which detected skin-like regions are matched against predefined gestures. A feedforward neural network applies the concept of static hand gesture recognition to distinguish 10 different types of static gestures, and algorithms such as k-nearest neighbour and decision trees have also been used for real-time gesture recognition. The primary concern is how computer vision-based methods and feedforward neural network-based classification strategies have been combined to build a real-time dynamic gesture recognition framework. In this paper, the author essentially made use of a vision-based, neural network-based real-time gesture recognition system.
In Ali [20], a stable vision-based framework is proposed to track objects (hand fingers). It is built on the Raspberry Pi with a camera module and programmed in the Python language, supported by the Open Source Computer Vision (OpenCV) library. The Raspberry Pi runs an image processing algorithm for hand motion that tracks an object (hand fingers) through its extracted features. The fundamental aim of a hand gesture recognition framework is to establish communication between humans and electronic systems for control. The recognized gestures are used to control the movement of a mobile robot in real time. The mobile robot was built and tested to demonstrate the viability of the proposed algorithm; the robot moves and navigates in different directions: forward, backward, right, left, and stop. Vision-based and image processing systems have various applications in pattern recognition and mobile robot navigation. The Raspberry Pi is a small single-board computer [20] suitable for real-time projects. The fundamental purpose of the work presented in this paper is to create a framework capable of detecting and tracking several features of objects specified by an image processing algorithm, using the Raspberry Pi and a camera module. The feature extraction algorithm is programmed in Python supported by OpenCV libraries and executed on the Raspberry Pi connected to an external camera. The paper presents a mobile robot based on the Raspberry Pi whose movement is controlled by means of the camera attached to the Raspberry Pi, which forwards direction commands to the driver of a two-wheel-drive mobile rover. It uses a hand gesture algorithm to recognize the object (the hand) and control the movement of the robot, and the robot was also made to work in living environments with poor illumination conditions. The software used for the implementation of the system is Raspbian OS, which is developed for the Raspberry Pi; Python and OpenCV are used as well. Python, as we already know, is a very high-level programming language requiring fewer lines of code; it is simple and easy to execute and has an extensive number of libraries. OpenCV is a free library that provides several APIs for computer vision, used in image processing to support real-time applications. There are several features in OpenCV which support data processing, including object detection, camera calibration, 3D reconstruction, and an interface to video processing. The Python programming language was used to build the hand gesture recognition framework.
David [21] proposed a method that detects two hands simultaneously using techniques like border detection and filters. The application is divided into two parts, robot use and GPS use: hand gestures are used to control the robot, and the GPS is likewise controlled using gestures. Data from 600 gestures performed by 60 users were used. The application achieved 93.1% accuracy and successfully detected hand gestures; the least reliably detected gesture, showing one finger, was still recognized with 75% accuracy.
VivekBhed [22] presents sign language as a form of communication that often goes understudied. The translation process between signs and spoken or written language, formally called interpretation, plays a role equivalent to translation for spoken languages. Nowadays, the use of depth-sensing technology is growing in popularity, and custom-designed colour gloves make the feature extraction much more efficient. Although depth-sensing technology has not been used for automatic sign language recognition, there have been successful attempts at using CNNs to classify images of ASL letter gestures. The general design was a fairly standard CNN architecture with several convolutional and dense layers. The data comprise a collection of 25 images from 5 individuals for each letter and the digits 1-9, and a pipeline was built so that people can add images to this dataset. Performance is improved and monitored through the data augmentation process. As a deep learning approach to ASL classification, the method shows potential for solving the problem using a simple camera that is easy to access, and it also brings out the large differences in performance across algorithms.
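Data augmentation of the kind described above can be sketched with Keras; the parameter values and the stand-in arrays below are illustrative assumptions, not the cited paper's setup:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in arrays; replace with real sign images and one-hot labels.
images = np.random.rand(100, 28, 28, 1).astype("float32")
labels = np.eye(10)[np.random.randint(0, 10, 100)]

augmenter = ImageDataGenerator(rotation_range=10,       # small random rotations
                               width_shift_range=0.1,   # horizontal jitter
                               height_shift_range=0.1,  # vertical jitter
                               zoom_range=0.1)          # random zoom
x_batch, y_batch = next(augmenter.flow(images, labels, batch_size=32))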
Hand gestures are the basis of sign language, a language used mainly by deaf people and by people who are unable to speak. Hand gesture recognition is the process of recognizing meaningful expressions of shape and motion involving only the hands. It is applied in many applications for the purposes of accessibility, communication, and learning.
This paper reports experiments conducted on different types of convolutional neural networks, evaluated on the Marcel dataset. Gjorgji [23] presented an approach divided mainly into a data-glove-based approach, which collects information from sensors attached to a glove worn on the user's hand, and a vision-based approach, which builds an artificial visual field to complement biological human vision. Hand gestures are a basic part of human-to-human communication. The effectiveness of information transfer through this form of communication is remarkable, which has prompted ideas for its use in the area of human-computer interaction. For this to be possible, the computer needs to recognize the gesture shown to it by the person controlling it.
Various individuals and backgrounds were used in order to increase the diversity and information contained within the dataset. Since the deep models trained in the experiments require an enormous mass of data to train properly, data augmentation was applied to the images in the dataset; this was done to gain quantity while still introducing some novelty, in terms of information, to the dataset. GoogLeNet is a deep convolutional neural network designed by Google featuring their popular Inception architecture [24].
Sign languages, which combine hand movements and facial expressions, are used by deaf and hard-of-hearing people the world over to communicate. However, hearing people rarely know sign languages, which creates barriers to inclusion.
The growing maturity of mobile technology, along with new forms of user interaction, opens up possibilities for overcoming such barriers, especially through gesture recognition on mobile phones. This literature review discusses works from 2009 to 2017 that present solutions for gesture recognition in a mobile setting as well as facial recognition in sign languages. Among a diverse range of hardware and techniques, sensor-based gloves were the most used special hardware, alongside brute-force comparison to classify gestures.
The main ideas of the content up to this point have been what sign language is, why American Sign Language was chosen over other sign languages, and what compelled us to research this topic further. Now the question is: how are we going to accomplish such a task? To answer that, we must understand the idea this research is trying to convey, as well as the procedures and strategies that will be employed to carry out the research step by step.
Figure 1.2 Block diagram of phases of sign language recognition.
The fundamental premise of this chapter is that when deaf or mute people need to communicate, they and the people around them often cannot understand each other, even when the other party has no physical impairment. This application will help by bridging that communication gap.
So how do these sign languages function? Unlike commonly spoken languages, signs express their meaning through hand, facial, or body motions. Since sign language grammar differs from spoken-language grammar, this also applies to the way it is presented. Non-manual action is the term used to describe the act of communicating with sign languages.
Figure 1.2 shows the process used to guarantee accurate and correct input recognition. Each phase is broken down into numerous substeps, all of which will be covered in detail in this chapter.
Real-time sign language recognition is a challenging application; therefore, system requirements must be kept in mind. Such research can typically be implemented using both low- and high-resolution cameras as well as more advanced systems. In order to ensure that neural network implementation is still effective even with low-resolution photos, this research will capture the primary input from the webcam of the laptop.
Python will be the programming language for this research, together with tools like PyCharm or Jupyter Notebook [25] for the research's internal operations. The training dataset will contain at least 200 samples for each character. Additionally, this research will apply effective machine learning algorithms and aim to raise the baseline accuracy level.
The research will make use of well-known platforms and libraries to compute data and present results, such as those listed below (a short model sketch follows the list):
TensorFlow – An open-source platform for developing and building machine learning applications. It contains the tools and libraries needed to develop machine learning-powered applications.
Keras – Created to simplify building deep neural network applications. It is mainly an application programming interface used from the Python programming language, and it is mostly helpful for back-end work.
OpenCV – A library that contains the material to operate real-time applications. It can be used with the Python language and is mainly used here for front-end purposes.
Libraries such as NumPy and os are also used for mathematical procedures as well as file reading and writing.
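As a rough illustration of how these pieces fit together, here is a minimal sequential CNN in TensorFlow/Keras of the kind Figure 1.7 depicts. The layer sizes and the 200 x 200 grayscale input are assumptions for illustration, not the chapter's final architecture:

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 24  # static ASL letters; J and Z involve motion

model = models.Sequential([
    layers.Input(shape=(200, 200, 1)),        # one grayscale sign image
    layers.Conv2D(32, 3, activation="relu"),  # feature learning
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one score per letter
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

Once compiled, such a model would be trained with model.fit() on the vectorized dataset described in the next section.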
Humans are accustomed to working in two dimensions, and occasionally three dimensions are employed as well. What if, however, there are n dimensions to consider? When seemingly straightforward situations become completely unmanageable for human reasoning, machine learning becomes useful.
We have a very helpful dataset that is directly related to computer vision and is used for sign language recognition. This data collection was created in the style of MNIST, where MNIST stands for Modified National Institute of Standards and Technology. The MNIST dataset was produced using sign language in all of its possible manifestations [26]. A few samples of MNIST are shown in Figure 1.3. Each sign in the dataset measures 200 x 200 pixels on the horizontal and vertical axes. The sequences and each element of the dataset have been numerically labelled according to the class to which they belong.
Figure 1.3 A few samples of MNIST sign language dataset.
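A small sketch of loading such a labelled dataset and one-hot encoding ("vectorizing") the class labels, in the spirit of Figures 1.4-1.6. The CSV filename is a hypothetical local path, and note that the public Sign Language MNIST release uses 28 x 28 images even though the text cites 200 x 200; adjust SIDE to your copy of the data:

import pandas as pd
import tensorflow as tf

SIDE = 28  # 28 for the public Sign Language MNIST; the text cites 200 x 200

df = pd.read_csv("sign_mnist_train.csv")         # hypothetical local path
labels = df.pop("label").values                  # integer class per image
images = df.values.astype("float32") / 255.0     # scale pixels to [0, 1]
images = images.reshape(-1, SIDE, SIDE, 1)       # one grayscale channel
one_hot = tf.keras.utils.to_categorical(labels)  # binary class vectors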