Discover how Natural Language Processing for Software Engineering can transform your understanding of agile development, equipping you with essential tools and insights to enhance software quality and responsiveness in today’s rapidly changing technological landscape.
Agile development enhances business responsiveness through continuous software delivery, emphasizing iterative methodologies that produce incremental, usable software. Working software is the main measure of progress, and ongoing customer collaboration is essential. Approaches like Scrum, eXtreme Programming (XP), and Crystal share these principles but differ in focus: Scrum reduces documentation, XP improves software quality and adaptability to changing requirements, and Crystal emphasizes people and interactions while retaining key artifacts. Modifying software systems designed with Object-Oriented Analysis and Design can be costly and time-consuming in rapidly changing environments requiring frequent updates. This book explores how natural language processing can enhance agile methodologies, particularly in requirements engineering. It introduces tools that help developers create, organize, and update documentation throughout the agile project process.
Page count: 900
Year of publication: 2025
Cover
Table of Contents
Series Page
Title Page
Copyright Page
Preface
1 Machine Learning and Artificial Intelligence for Detecting Cyber Security Threats in IoT Environment
1.1 Introduction
1.2 Need of Vulnerability Identification
1.3 Vulnerabilities in IoT Web Applications
1.4 Intrusion Detection System
1.5 Machine Learning in Intrusion Detection System
1.6 Conclusion
References
2 Frequent Pattern Mining Using Artificial Intelligence and Machine Learning
2.1 Introduction
2.2 Data Mining Functions
2.3 Related Work
2.4 Machine Learning for Frequent Pattern Mining
2.5 Conclusion
References
3 Classification and Detection of Prostate Cancer Using Machine Learning Techniques
3.1 Introduction
3.2 Literature Survey
3.3 Machine Learning for Prostate Cancer Classification and Detection
3.4 Conclusion
References
4 NLP-Based Spellchecker and Grammar Checker for Indic Languages
4.1 Introduction
4.2 NLP-Based Techniques of Spellcheckers and Grammar Checkers
4.3 Grammar Checker Related Work
4.4 Spellchecker Related Work
4.5 Conclusion
References
5 Identification of Gujarati Ghazal Chanda with Cross-Platform Application
Abbreviations
5.1 Introduction
5.2 Ghazal
5.3 History and Grammar of Ghazal
5.4 Literature Review
5.5 Proposed System
5.6 Conclusion
References
6 Cancer Classification and Detection Using Machine Learning Techniques
6.1 Introduction
6.2 Machine Learning Techniques
6.3 Review of Machine Learning for Cancer Detection
6.4 Methods
6.5 Result Analysis
6.6 Conclusion
References
7 Text Mining Techniques and Natural Language Processing
7.1 Introduction
7.2 Text Classification and Text Clustering
7.3 Related Work
7.4 Methodology
7.5 Conclusion
References
8 An Investigation of Techniques to Encounter Security Issues Related to Mobile Applications
8.1 Introduction
8.2 Literature Review
8.3 Results and Discussions
8.4 Conclusion
References
9 Machine Learning for Sentiment Analysis Using Social Media Scrapped Data
9.1 Introduction
9.2 Twitter Sentiment Analysis
9.3 Sentiment Analysis Using Machine Learning Techniques
9.4 Conclusion
References
10 Opinion Mining Using Classification Techniques on Electronic Media Data
10.1 Introduction
10.2 Opinion Mining
10.3 Related Work
10.4 Opinion Mining Techniques
10.5 Conclusion
References
11 Spam Content Filtering in Online Social Networks
11.1 Introduction
11.2 E-Mail Spam Identification Methods
11.3 Online Social Network Spam
11.4 Related Work
11.5 Challenges in the Spam Message Identification
11.6 Spam Classification with SVM Filter
11.7 Conclusion
References
12 An Investigation of Various Techniques to Improve Cyber Security
12.1 Introduction
12.2 Various Attacks [6–9]
12.3 Methods
12.4 Conclusion
References
13 Brain Tumor Classification and Detection Using Machine Learning by Analyzing MRI Images
13.1 Introduction
13.2 Literature Survey
13.3 Methods
13.4 Result Analysis
13.5 Conclusion
References
14 Optimized Machine Learning Techniques for Software Fault Prediction
14.1 Introduction
14.2 Literature Survey
14.3 Methods
14.4 Result Analysis
14.5 Conclusion
References
15 Pancreatic Cancer Detection Using Machine Learning and Image Processing
15.1 Introduction
15.2 Literature Survey
15.3 Methodology
15.4 Result Analysis
15.5 Conclusion
References
16 An Investigation of Various Text Mining Techniques
16.1 Introduction
16.2 Related Work
16.3 Classification Techniques for Text Mining
16.4 Conclusion
References
17 Automated Query Processing Using Natural Language Processing
17.1 Introduction
17.2 The Challenges of NLP
17.3 Related Work
17.4 Natural Language Interfaces Systems
17.5 Conclusion
References
18 Data Mining Techniques for Web Usage Mining
18.1 Introduction
18.2 Web Mining
18.3 Web Usage Data Mining Techniques
18.4 Conclusion
References
19 Natural Language Processing Using Soft Computing
19.1 Introduction
19.2 Related Work
19.3 NLP Soft Computing Approaches
19.4 Conclusion
References
20 Sentiment Analysis Using Natural Language Processing
20.1 Introduction
20.2 Sentiment Analysis Levels
20.3 Challenges in Sentiment Analysis
20.4 Related Work
20.5 Machine Learning Techniques for Sentiment Analysis
20.6 Conclusion
References
21 Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data
21.1 Introduction
21.2 Web Mining
21.3 Taxonomy of Web Data Mining
21.4 Web Content Mining Methods
21.5 Efficient Algorithms for Web Data Extraction
21.6 Machine Learning Based Web Content Extraction Methods
21.7 Conclusion
References
22 Intelligent Pattern Discovery Using Web Data Mining
22.1 Introduction
22.2 Pattern Discovery from Web Server Logs
22.3 Data Mining Techniques for Web Server Log Analysis
22.4 Graph Theory Techniques for Analysis of Web Server Logs
22.5 Conclusion
References
23 A Review of Security Features in Prominent Cloud Service Providers
23.1 Introduction
23.2 Cloud Computing Overview
23.3 Cloud Computing Model
23.4 Challenges with Cloud Security and Potential Solutions
23.5 Comparative Analysis
23.6 Conclusion
References
24 Prioritization of Security Vulnerabilities under Cloud Infrastructure Using AHP
24.1 Introduction
24.2 Related Work
24.3 Proposed Method
24.4 Result and Discussion
24.5 Conclusion
References
25 Cloud Computing Security Through Detection & Mitigation of Zero-Day Attack Using Machine Learning Techniques
25.1 Introduction
25.2 Related Work
25.3 Proposed Methodology
25.4 Results and Discussion
25.5 Conclusion and Future Work
References
26 Predicting Rumors Spread Using Textual and Social Context in Propagation Graph with Graph Neural Network
26.1 Introduction
26.2 Literature Review
26.3 Proposed Methodology
26.4 Results and Discussion
26.5 Conclusion
References
27 Implications, Opportunities, and Challenges of Blockchain in Natural Language Processing
27.1 Introduction
27.2 Related Work
27.3 Overview on Blockchain Technology and NLP
27.4 Integration of Blockchain into NLP
27.5 Applications of Blockchain in NLP
27.6 Blockchain Solutions for NLP
27.7 Implications of Blockchain Development Solutions in NLP
27.8 Sectors That Can Benefit from Blockchain and NLP Integration
27.9 Challenges
27.10 Conclusion
References
28 Emotion Detection Using Natural Language Processing by Text Classification
28.1 Introduction
28.2 Natural Language Processing
28.3 Emotion Recognition
28.4 Related Work
28.5 Machine Learning Techniques for Emotion Detection
28.6 Conclusion
References
29 Alzheimer Disease Detection Using Machine Learning Techniques
29.1 Introduction
29.2 Machine Learning Techniques to Detect Alzheimer’s Disease
29.3 Pre-Processing Techniques for Alzheimer’s Disease Detection
29.4 Feature Extraction Techniques for Alzheimer’s Disease Detection
29.5 Feature Selection Techniques for Diagnosis of Alzheimer’s Disease
29.6 Machine Learning Models Used for Alzheimer’s Disease Detection
29.7 Conclusion
References
30 Netnographic Literature Review and Research Methodology for Maritime Business and Potential Cyber Threats
30.1 Introduction
30.2 Criminal Flows Framework
30.3 Oceanic Crime Exchange and Categorization
30.4 Fisheries Crimes and Mobility Crimes
30.5 Conclusion
30.6 Discussion
References
31 Review of Research Methodology and IT for Business and Threat Management
Abbreviation Used
31.1 Introduction
31.2 Conclusion
References
About the Editors
Index
Also of Interest
End User License Agreement
Chapter 5
Table 5.1 Understanding Matra.
Table 5.2 Understanding Matra with a poetic line.
Chapter 23
Table 23.1 Attack breaches on cloud providers.
Table 23.2 Features provided by service model.
Table 23.3 Various security issues in cloud.
Table 23.4 Various Security measures used by major cloud providers.
Chapter 24
Table 24.1 Primary & proposed security criteria.
Table 24.2 Scale of relative importance.
Table 24.3 Pairwise comparison matrix for security criteria.
Table 24.4 Random index (RI).
Table 24.5 Security criteria for AHP.
Table 24.6 Pairwise comparison matrix for security criteria.
Table 24.7 Normalized pairwise comparison matrix.
Table 24.8 Weighted priority matrix for security criteria.
Table 24.9 Comparison of results of proposed and previous method.
Chapter 25
Table 25.1 Specification of Data Sets used and their respective threats/risk.
Table 25.2 Confusion matrix of all six datasets with six algorithms/classifier...
Table 25.3 Detection of top 12 attacks/risk using proposed methodology
Table 25.4 Comparison of performance of proposed model & previous methods/mode...
Chapter 26
Table 26.1 Dataset and graph statistics.
Table 26.2 Performance evaluation.
Table 26.3 Rumor detection model performance with different node features mode...
Chapter 27
Table 27.1 Various research work on NLP with blockchain.
Chapter 30
Table 30.1 Oceanic crimes categorization [3, 4].
Table 30.2 Increased in crime rate chart [7, 8].
Table 30.3 Generated marine threats [32–34].
Table 30.4 Comparative work analysis.
Chapter 31
Table 31.1 REM framework [3].
Table 31.2 Sampling designs methods [5].
Table 31.3 Data gathering methods [7].
Table 31.4 Data analysis methods [8].
Table 31.5 Literature review classifications [10].
Table 31.6 Datasets [6, 7].
Table 31.7 RM in BM [8].
Table 31.8 Threat management systems [7, 8].
Table 31.9 RM tools [8, 9].
Table 31.10 Visualization tools [9, 10].
Chapter 1
Figure 1.1 Increasing number of DDOS attacks [Source: Cisco Annual Internet Re...
Figure 1.2 Threats to Internet of Things.
Figure 1.3 Number of new vulnerabilities identified in IoT [Source- IBM X-Forc...
Figure 1.4 Host-based IDS.
Figure 1.5 Network-based intrusion detection system.
Chapter 2
Figure 2.1 Data mining methods.
Figure 2.2 A sample decision tree—partial view.
Chapter 4
Figure 4.1 Indic language grammar checker research studies found online as of ...
Figure 4.2 Indic language Spellchecker research studies found online as of Mar...
Chapter 5
Figure 5.1 Vowels of the Gujarati language [7].
Figure 5.2 Consonants of the Gujarati language [7].
Figure 5.3 Conjunct consonants of the Gujarati language [7].
Figure 5.4 Numerals of the Gujarati language [7].
Figure 5.5 Sample text in Gujarati.
Figure 5.6 Proposed system.
Figure 5.7 Splash screen.
Figure 5.8 Login screen.
Figure 5.9 Home screen.
Figure 5.10 History screen.
Figure 5.11 Help screen. Type of chanda.
Figure 5.12 Output for Khafif Ghazal.
Figure 5.13 Website output for Khafif Ghazal.
Chapter 6
Figure 6.1 A framework for cancer image classification and detection.
Figure 6.2 LSTM network.
Figure 6.3 Convolution neural network.
Figure 6.4 Result comparison of classifiers.
Chapter 7
Figure 7.1 Steps in text mining.
Figure 7.2 Stages of preprocessing text.
Figure 7.3 Machine learning based framework for text mining.
Chapter 8
Figure 8.1 Schematic representation of android app.
Chapter 9
Figure 9.1 ACO-CNN deep learning model for sentiment classification and detect...
Chapter 13
Figure 13.1 Steps involved in MRI image processing.
Figure 13.2 Result comparison of classifiers.
Chapter 14
Figure 14.1 Machine learning for software fault prediction.
Figure 14.2 Result comparison.
Chapter 15
Figure 15.1 Pancreatic cancer tissue in CT scan image.
Figure 15.2 Pancreatic cancer detection process.
Figure 15.3 Result comparison of machine learning for pancreatic cancer detect...
Chapter 17
Figure 17.1 Natural language query processing.
Chapter 18
Figure 18.1 Steps involved in web mining.
Chapter 21
Figure 21.1 Web data mining process.
Figure 21.2 Taxonomy of web data mining.
Chapter 23
Figure 23.1 CC Actors.
Chapter 24
Figure 24.1 Hierarchical structure of proposed approach based on AHP.
Figure 24.2 Comparison of severity level of vulnerabilities according to the p...
Chapter 25
Figure 25.1 Phases of zero day exploits.
Figure 25.2 System flow for training and classification of cloud network traff...
Figure 25.3 Adaptive Predictive Ensemble Machine Learning (APEML) system.
Figure 25.4 Comparative analysis of accuracy for ML algorithms for input datas...
Figure 25.5 Area under ROC curve for classifier 1 to 6.
Figure 25.6 Principal component analysis for classifier 1 to 6.
Figure 25.7 Top 10 zero day mitigation strategies.
Chapter 26
Figure 26.1 The proposed TTRD framework.
Chapter 27
Figure 27.1 Blockchain AI market size.
Figure 27.2 NLP and blockchain [13].
Figure 27.3 Various blockchain solutions.
Figure 27.4 Implications of blockchain on NLP.
Chapter 28
Figure 28.1 Different NLP approaches.
Chapter 29
Figure 29.1 Framework to detect Alzheimer’s disease using machine learning tec...
Chapter 30
Figure 30.1 Crime increase rate at marine territory [23, 28].
Figure 30.2 Crime increase rate at marine territory (2022).
Figure 30.3 Crime increase rate at marine territory (2024).
Figure 30.4 Increased marine activities.
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Publishers at Scrivener
Martin Scrivener
Phillip Carmical
Edited by
Rajesh Kumar Chakrawarti
Ranjana Sikarwar
Sanjaya Kumar Sarangi
Samson Arun Raj Albert Raj
Shweta Gupta
Krishnan Sakthidasan Sankaran
and
Romil Rawat
This edition first published 2025 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2025 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Library of Congress Cataloging-in-Publication Data
ISBN 9781394272433
Front cover images supplied by Adobe Firefly
Cover design by Russell Richardson
The book’s goal is to discuss the most current trends in applying natural language processing (NLP) approaches. It makes the case that these areas will continue to develop and merit contributions.
The book focuses on software development that is based on visual modelling and object orientation, one of the most significant development paradigms today. Several issues in the documentation process still need attention, and a few aids have been developed to assist developers in their documentation tasks. A variety of related tools (such as assistants) can be built using natural language processing (NLP) to help with the documentation process. The book covers software development and operation using data mining, informatics, big data analytics, artificial intelligence (AI), machine learning (ML), digital image processing, the Internet of Things (IoT), cloud computing, computer vision, cyber security, Industry 4.0, and health informatics domains.
Ravindra Bhardwaj1*, Sreenivasulu Gogula2, Bidisha Bhabani3, K. Kanagalakshmi4, Aparajita Mukherjee5 and D. Vetrithangam6
1Department of Physics and Computer Science, Dayalbagh Educational Institute (Deemed to be University), Agra, Uttar Pradesh, India
2Department of CSE (Data Science), Vardhaman College of Engineering, Shamshabad, Hyderabad, India
3Department of Computer Science and Engineering, University of Engineering and Management (UEM), New Town, West Bengal, India
4Department of Computer Applications, SRM Institute of Science and Technology (Deemed to be University), Trichy, India
5Department of Computer Science and Engineering, Institute of Engineering and Management, University of Engineering and Management (UEM), New Town, Kolkata, West Bengal, India
6Department of Computer Science & Engineering, University Institute of Engineering, Chandigarh University, Mohali, Punjab, India
The Internet of Things (IoT) refers to the growing connection of human-made systems, such as healthcare systems, smart homes, and smart grids, through the internet. These networks now carry a vast amount of information, which gives rise to numerous security threats and privacy concerns. Intrusions are malicious, unauthorized actions that harm the network. Because IoT networks are so widespread, they are exposed to a diverse range of security issues. Cyber attacks on the IoT architecture can cause loss of information or data and can slow IoT devices down. Intrusion detection systems have been used for the past twenty years to secure data and networks. Conventional intrusion detection technologies are ineffective at detecting security breaches in the IoT because of the distinct standards and protocol stacks used in its networks. Regularly analyzing the data created by the IoT is difficult because the data stream is effectively endless. An intrusion detection system (IDS) safeguards a system or network against unauthorized access by actively monitoring for and identifying potentially malicious or suspicious activity. Machine learning provides robust and efficient approaches for mitigating these threats. A robust machine learning system is key to keeping networks protected against them.
Keywords: Machine learning, Internet of Things, security, privacy, attacks, vulnerability, intrusions
Connected devices have made ordinary chores easier and more efficient, and they provide a great deal of useful information. Connected automobiles, for example, can make use of driver-assistance services, and medical devices provide detailed patient records. Unfortunately, any device that can connect to the internet can be attacked. Worse, many of these devices lack even the most basic security safeguards. According to the authors of the report, almost all IoT data traffic (98%) is unsecured, so the information can be obtained by anybody with little effort. IoT devices therefore give fraudsters an easy target: not only may the devices' own information be stolen, but other sensitive data as well. Compromising one of these devices is a common way for hackers to gain access to a company's internal network. The sheer number of these devices and the settings they control may be enough to attract a cyber-attacker [1], as shown in Figure 1.1 and Figure 1.2.
In a smart environment, any number of items might be the target of an attack, including databases of user credentials, electronic sensors, CCTV installations, access controls, personal electronic devices, and recorded biometrics. From a security point of view, it is essential to protect the confidentiality, integrity, availability, authentication, and authorization features of the IoT architecture [2]. DDoS attacks are becoming more common: Cisco's Annual Internet Report (2018-2023) White Paper forecasts that the total number of DDoS attacks will more than double, from 7.9 million in 2018 to over 15 million by 2023, as shown in Figure 1.1.
Figure 1.1 Increasing number of DDOS attacks [Source: Cisco Annual Internet Report 2018-2023].
Figure 1.2 Threats to Internet of Things.
According to the survey, 57% of IoT devices that are connected via this insecure traffic are susceptible to medium- to high-severity attacks, making them an easy target for cybercriminals [3]. In addition, the survey found that 41% of attacks target IoT vulnerabilities by scanning them against publicly available databases of known security flaws. The analysis is shown in Figure 1.2.
According to the Internet of Things Threat Report published by Palo Alto Networks in March 2020, 98% of all traffic from IoT devices is unencrypted, giving attackers a chance to eavesdrop. This traffic contains sensitive and private information that is easily accessible to attackers, who may then sell it on the dark web for a profit.
Vulnerabilities in IoT networks are increasing every year. As shown in Figure 1.3, the IoT environment experiences a large number of new vulnerabilities every year. All IoT applications (smart city, smart farming, smart healthcare, smart transportation, and smart traffic) are experiencing new vulnerabilities and a growing number of attacks. The number of vulnerabilities has increased threefold in the last decade and twofold in the last five years.
Figure 1.3 Number of new vulnerabilities identified in IoT [Source- IBM X-Force Threat Intelligence Index 2022].
The process of determining how vulnerable a system is to attack is referred to as a vulnerability scan. A scan of this kind identifies potential entry points into a computer or network so that appropriate preventative measures can be taken. Automated scanning methods check applications for security problems to establish whether there are vulnerabilities in an organization's internal network. Because vulnerability scanners automate the search for security issues, users are spared the time and effort required to carry out hundreds or even thousands of manual tests for each kind of vulnerability.
To maintain the integrity of the system's protections, it is essential to rank vulnerabilities by severity before taking any remedial action. The Common Vulnerability Scoring System (CVSS) is a tool that administrators may use to prioritize security problems according to the severity of each flaw. The CVSS score is a standard metric, however, and is not tailored to any particular network architecture, even though the frequency and impact of vulnerabilities affect the security risk level of a specific network. In addition to the severity score, several other factors affect the security risk posed by an organization's underlying infrastructure: the age and frequency of the vulnerabilities already present in the system, and the impact that exploiting a vulnerability has on the system. For this reason, risk level calculations should use these components together with the CVSS severity score, which allows for effective network security risk management.
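One way to picture a risk calculation that uses CVSS together with age, frequency, and impact is a weighted blend of the four factors. The chapter does not give a formula, so the sketch below is only an illustration: the `Finding` fields, the `WEIGHTS` values, and the one-year age cap are all hypothetical choices, not a published method.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One vulnerability observed on a network (illustrative fields only)."""
    name: str
    cvss: float      # CVSS base score, 0.0 to 10.0
    age_days: int    # how long the flaw has been present and unpatched
    frequency: int   # number of hosts on which the flaw was found
    impact: float    # locally assessed impact of exploitation, 0.0 to 1.0


# Hypothetical weights: the source text gives no formula, so these are placeholders.
WEIGHTS = {"cvss": 0.4, "age": 0.2, "frequency": 0.2, "impact": 0.2}


def risk_score(f: Finding, total_hosts: int, max_age_days: int = 365) -> float:
    """Blend the CVSS score with local context into a score between 0 and 1."""
    cvss_part = f.cvss / 10.0
    age_part = min(f.age_days, max_age_days) / max_age_days   # capped at 1.0
    freq_part = min(f.frequency, total_hosts) / total_hosts   # share of hosts affected
    return (WEIGHTS["cvss"] * cvss_part
            + WEIGHTS["age"] * age_part
            + WEIGHTS["frequency"] * freq_part
            + WEIGHTS["impact"] * f.impact)


def prioritize(findings, total_hosts):
    """Order findings from highest to lowest blended risk."""
    return sorted(findings, key=lambda f: risk_score(f, total_hosts), reverse=True)


if __name__ == "__main__":
    fresh_critical = Finding("fresh-critical", cvss=9.8, age_days=10, frequency=1, impact=0.3)
    old_widespread = Finding("old-widespread", cvss=6.5, age_days=400, frequency=80, impact=0.9)
    for f in prioritize([fresh_critical, old_widespread], total_hosts=100):
        print(f.name, round(risk_score(f, 100), 3))
```

With these placeholder weights, an old, widespread, high-impact flaw with a moderate CVSS score can outrank a newly found flaw with a critical score, which is the kind of network-specific reordering the paragraph above argues for.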
The authors of [4] propose a strategy based on code inspection, which is used to identify errors hidden in a program. The approach is said to be able to locate every vulnerability in the National Vulnerability Database (NVD). Using this classifier might help identify potential security flaws more accurately.
In addition, Guojun and his colleagues developed a web crawler [5] that collects documents related to one another; TF-IDF is essential to its methodology. Medeiros et al. [6] proposed an approach for evaluating code quality; the methodology draws on data mining concepts. New techniques for identifying web server vulnerabilities were developed in [7].
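For readers unfamiliar with TF-IDF, the following is a minimal sketch of the weighting in its textbook form (term frequency times the logarithm of inverse document frequency). It is not the crawler from [5]; the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights.

    docs is a list of token lists; the result is one {term: weight} dict per
    document, using tf = count / document length and idf = ln(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))          # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical tokenized pages a crawler might have collected.
    pages = [
        ["sql", "injection", "attack"],
        ["xml", "injection"],
        ["buffer", "overflow", "attack"],
    ]
    for page_weights in tf_idf(pages):
        print(sorted(page_weights.items(), key=lambda kv: -kv[1]))
```

Terms that occur in few documents receive higher weights than terms spread across many, so "sql" scores above "injection" in the first page; a term present in every document would score zero.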
The authors of [8] developed an innovative method for locating vulnerabilities in web applications that combines static analysis with data mining applied directly to the source code. The researchers in [9] concluded that XML injection is a critical issue for web applications in general, and that the vast majority of recently published web apps continue to suffer from XML injection flaws.
According to research by [10], a large share of such standards concern web application security. These studies focused primarily on security measures designed to prevent code injection attacks on web applications. Yet even though compliance is clearly defined and extensively covered in almost all international standards, the number of attacks is rising because of code injection flaws, in the developers' view. To strengthen security measures, it is crucial to inform engineers and clients about the relevance of these measures and to urge them to meet the standards with meticulous care. Delay in providing this kind of instruction and support is not acceptable.
The authors of [11] discussed the significant factors involved in the product development life cycle. In addition, a number of software engineers have introduced security automation tools and processes that can be used at any stage of the software development life cycle (SDLC) to enhance the stability and quality of even the most fundamental digital systems. They also urged all organizations working to improve networks to place a higher priority on planning, education, risk assessment, threat modelling, audits of architecture configuration, secure coding, and assessment of data that has been sent and received after processing.
Wang and Reiter [12] developed a method for mitigating denial-of-service attacks that uses a website's graph structure to counter flooding attacks. When visiting the target website, a legitimate client can quickly obtain a privileged URL by clicking on a referral link provided by a trusted source. The proposed scheme has no infrastructure requirements and does not call for changes to the client software that users use to access websites. The authors presented the WRAPS framework along with its design goals. Nearly all sophisticated attacks on websites reuse strategies and methods from earlier attacks. Such strategies can appear in many guises, including in settings unrelated to the web. Attacks on a website's business logic may be harmful to the website itself, but attackers can also use websites as a go-between to accomplish their goals.
SQLProb [13] extracts the user input from a query and checks whether it complies with the syntactic structure of the query. The extraction relies on an enhanced genetic algorithm. SQLProb is a complete black-box approach that requires no modification of either the application or the database, which lets it avoid the complexity of tainting, learning, and code instrumentation. In addition, neither learning nor metadata is required for input validation.
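The idea of checking user input against the syntactic structure of a query can be illustrated with a much simpler stand-in: input is accepted only if it falls entirely inside a single literal token of the final query. This is not the SQLProb implementation; in particular, a plain substring search replaces the extraction step described in [13], and the tokenizer, `token_spans`, and `input_is_safe` are hypothetical names covering only a small SQL subset.

```python
import re

# A deliberately small SQL tokenizer (an assumption for illustration only).
TOKEN_RE = re.compile(r"""
    \s+                         # whitespace, skipped later
  | '(?:[^']|'')*'              # single-quoted string literal ('' escapes a quote)
  | \d+(?:\.\d+)?               # numeric literal
  | [A-Za-z_][A-Za-z_0-9]*      # keyword or identifier
  | --[^\n]*                    # line comment
  | .                           # any other single character (operators, stray quotes)
""", re.VERBOSE | re.DOTALL)


def token_spans(query):
    """Yield (start, end, kind) for each non-whitespace token in the query."""
    for match in TOKEN_RE.finditer(query):
        text = match.group()
        if text.isspace():
            continue
        if len(text) >= 2 and text.startswith("'") and text.endswith("'"):
            kind = "string"
        elif text[0].isdigit():
            kind = "number"
        else:
            kind = "other"
        yield match.start(), match.end(), kind


def input_is_safe(query, user_input):
    """Return True if user_input lies inside exactly one literal token.

    Simplification: the first occurrence found by str.find() is assumed to be
    where the input was embedded in the query.
    """
    start = query.find(user_input)
    if not user_input or start < 0:
        return False
    end = start + len(user_input)
    for t_start, t_end, kind in token_spans(query):
        if t_start <= start and end <= t_end:
            return kind in ("string", "number")
    return False  # the input spans several tokens, so it changed the query structure


if __name__ == "__main__":
    benign = "alice"
    attack = "x' OR '1'='1"
    template = "SELECT * FROM users WHERE name = '{}'"
    print(input_is_safe(template.format(benign), benign))
    print(input_is_safe(template.format(attack), attack))
```

The benign value stays inside one string literal, while the injected value spills across several tokens (a literal, the `OR` keyword, further literals and an operator), which is what marks it as altering the query's structure.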
The authors of [14] presented a complete stream-based WS-Security processing architecture. This design makes service processing more efficient and raises resistance to different kinds of DoS attacks. Because it processes messages in a streaming manner, their engine is able to handle standard WS-Security application scenarios.
The author of [15] examined most of the conventional criteria used to judge Web service quality. According to that survey, most of the measures, including performance, consistency, adaptability, limit, strength, exception handling, correctness, uprightness, openness, accessibility, interoperability, and security, fall below the average level.
Hoque et al. [16] considered the actions an attacker may take as well as the probable results or degrees of harm, and on that basis divided attacks into a number of distinct categories. They presented a taxonomy of attack tools to assist security specialists and to help prevent potential threats. They also delivered a detailed, well-organized survey of currently available tools and frameworks that may aid attackers as well as system defenders, describing the benefits and drawbacks of each.
Binbin Qu et al. [17] described the design of a detection model. A static analysis of the program's source code is used to construct a taint dependency graph for the program. A finite state automaton represents the attack model, and the estimated values of tainted strings are checked against it to verify how robustly the program sanitizes user input. They implemented this model in a framework for automated detection based on taint analysis.
An intrusion is any malicious or suspicious activity that jeopardizes the security of a computer or network. Intruders may be internal or external. Internal intruders reside within the targeted network and acquire elevated privileges to deliberately harm the network infrastructure. External intruders remain outside the target network and covertly extract data from it. Internal attacks are initiated by malicious or compromised nodes, whereas external attacks are initiated by entities outside the system. An intrusion detection system (IDS) is any hardware or software that can identify and raise alerts about potentially malicious activity on a network or computer system. It may also be used to detect suspicious activity or breaches within the system. Typically, abnormal behaviour of a network or system suggests that something harmful or illegal is occurring. Although most IDSs depend mainly on identifying and reporting anomalies, a handful excel at detecting intrusions that conventional firewalls overlook. In safeguarding the system from harm, IDSs serve a purpose similar to that of firewalls, helping to keep unauthorized individuals from gaining access.
There are a total of three categories of intrusion detection systems based on the source of data, four groups based on the technique of analysis, and an additional three groups in total.
A Host-Based Intrusion Detection System (HIDS) is software installed on a computer to monitor, evaluate, and gather data on the traffic and suspicious activities of that specific system. It analyses not just traffic activity but also system calls, file system changes, inter-process communication, and the programs running on the computer (Zarpelão et al., 2017). A HIDS uses data collected from the operating system and application software to detect suspicious activities, and it can detect intrusions only on the host where it is installed. Once a HIDS is installed, no extra software is needed to identify threats on that system. It is designed to detect unauthorized access or attacks originating from within a protected area. The installation cost is substantial because each device requires its own HIDS, as shown in Figure 1.4.
The Network-Based Intrusion Detection System (NIDS) safeguards network nodes by capturing and scrutinizing all network packets for malicious activities. Figure 1.5 displays the structure of the NIDS. The sensor is strategically positioned at a vulnerable point inside the network, between the server and the network. The NIDS monitors both incoming and outgoing communications. If it identifies a network threat, it must respond decisively. One possible course of action is to block network access from the offending IP address; another is to inform the responsible party through warning notifications. It may be difficult for an attacker to determine whether the NIDS has noticed an intrusion attempt. A small number of well-placed intrusion detection systems can monitor an extensive network. To mitigate potential security risks, it is imperative to deploy scanners, sniffers, and network intrusion detection tools to guard against malicious activities such as IP spoofing, DoS attacks, DNS name corruption, man-in-the-middle attacks, and ARP cache poisoning. These vulnerabilities arise from inherent weaknesses in the TCP/IP protocols.
Figure 1.4 Host-based IDS.
Figure 1.5 Network-based intrusion detection system.
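The two NIDS responses described above (blocking a source address and raising a warning) can be illustrated with a minimal sketch. It assumes a simple rate-based rule: a source IP that sends more than a fixed number of packets in one capture window is treated as suspicious. The threshold value, the tuple layout of a packet, and the function name `inspect_window` are hypothetical and chosen only for illustration; a production NIDS uses far richer signatures and anomaly models.

```python
from collections import Counter

# Hypothetical threshold: packets per source IP per capture window
# above which the source is treated as suspicious.
PACKET_THRESHOLD = 100


def inspect_window(packets, threshold=PACKET_THRESHOLD):
    """Return (blocked_ips, alerts) for one capture window.

    `packets` is an iterable of (source_ip, destination_ip) tuples.
    """
    # Count how many packets each source address sent in this window.
    counts = Counter(src for src, _dst in packets)
    # Response 1: block every source that exceeded the threshold.
    blocked = sorted(ip for ip, n in counts.items() if n > threshold)
    # Response 2: produce a warning notification for the administrator.
    alerts = [f"ALERT: {ip} sent {counts[ip]} packets" for ip in blocked]
    return blocked, alerts
```

For example, a window containing 150 packets from `10.0.0.5` and 3 packets from `10.0.0.7` yields a block list with only `10.0.0.5` and one matching alert.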
Hybrid intrusion detection systems integrate the functionalities of several intrusion detection systems to identify and expose intrusions. (The abbreviation HIDS is reserved here for host-based systems.) A hybrid intrusion detection system combines data from both the network and the host agent or system to create a full overview of the network system. The hybrid technique is the most effective strategy for intrusion detection. Prelude is an example of a hybrid intrusion detection system.
Soft computing makes it possible to build intelligent machines that can solve challenging real-world problems which cannot be adequately modelled using standard mathematical methods. It tolerates approximate information, ambiguity, imprecision, and a merely partial view of the environment [18], which enables it to emulate the way people form opinions and make decisions. This section briefly discusses the soft computing techniques that may be used to detect intrusions.
The genetic algorithm (GA) is a search technique that has been in use since it was introduced by Holland. It is both powerful and adaptable. It simulates in software the process of natural evolution. The GA can therefore be seen as a global search procedure that relies on randomness. The algorithm applies the principle of "survival of the fittest" to produce successively better approximations of a solution to the problem.
In each generation, the fittest individuals in the population are selected to breed the next generation, which produces new candidate solutions to the problem. The individuals created in this way may be better suited to the problem than their parents [19]. The fitness function measures how well each individual performs in the problem domain [20].
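The selection, recombination, and fitness evaluation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any cited work: individuals are bit strings, the toy fitness function simply counts 1-bits (in an IDS setting a bit string might instead encode a feature subset), the fitter half of the population survives as parents, and the names `fitness` and `evolve` as well as all parameter values are hypothetical.

```python
import random


def fitness(bits):
    # Toy fitness (an assumption for illustration): the number of 1-bits.
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Run a simple generational GA and return the best individual found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half of the population become parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Single-point crossover combines the two parents.
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # Mutation flips each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            children.append(child)
        # Parents are kept, so the best fitness never decreases.
        population = parents + children
    return max(population, key=fitness)
```

Calling `evolve()` returns a 20-bit list; because the parents are carried over unchanged, the best fitness in the population cannot get worse from one generation to the next.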
Particle swarm optimization (PSO) was first developed in 1995 [21], inspired by the way birds and fish congregate in groups known respectively as flocks and schools. To search for a solution, a "population" of particles is moved through the search space at specified velocities. The velocity of each particle is adjusted stochastically, taking into account the best position the particle itself has visited and the best position found by its neighbours. A random number generator supplies the stochastic component.
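The velocity update described above can be sketched as follows. This is a minimal illustration that assumes the common global-best variant (every particle treats the whole swarm as its neighbourhood) and a toy objective, the sphere function; the coefficient values and the names `sphere` and `pso` are hypothetical choices for the example.

```python
import random


def sphere(x):
    # Toy objective to minimize (an assumption): sum of squares.
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, iterations=100,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # best position per particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best,
                # each scaled by a fresh random number.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < global_val:
                    global_pos, global_val = pos[i][:], val
    return global_pos, global_val
```

`pso(sphere)` returns the best position found and its objective value. A neighbourhood-best variant would replace `global_pos` with the best position among each particle's neighbours.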
Fuzzy logic is a form of logic based on approximate reasoning. Evolutionary computing, which is based on genetics and natural selection, underpins both the optimization and the classification paradigms used in machine learning. Of these evolutionary methods, genetic algorithms are the ones most often used in real-world business applications [22].
In contrast to the conventional naive Bayes classifier, the hidden naive Bayes (HNB) model adds a further layer in which each attribute has a hidden parent. The hidden parent of an attribute combines the influences of all the other attributes on it. The structure of HNB is derived from naive Bayes, and the hidden parents can be approximated by a weighted average of one-dependence estimators [23, 24].
The support vector machine (SVM) [25, 26] is an approach to classification grounded in statistical learning theory (SLT) [27–29]. It belongs to the family of hyperplane classifiers. In an SVM, a good hyperplane is one that successfully separates the classes while keeping the amount of interclass overlap to a bare minimum.
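The idea of a separating hyperplane with minimal overlap can be written down as the standard soft-margin optimization problem. The notation is an assumption introduced here for illustration: $x_i$ are training vectors with labels $y_i \in \{-1, +1\}$, $w$ and $b$ define the hyperplane $w^\top x + b = 0$, the slack variables $\xi_i$ measure how far each point violates the margin, and $C > 0$ trades margin width against overlap.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad i = 1,\dots,n, \\
& \xi_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

Minimizing $\lVert w \rVert$ widens the margin $2 / \lVert w \rVert$ between the classes, while the sum of the $\xi_i$ penalizes points that fall inside the margin or on the wrong side of the hyperplane.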
Deep belief networks (DBNs) are generative graphical models used in machine learning. They are built from multiple layers of latent variables, which are also referred to as hidden units. Connections exist between the layers, but not between the units within a layer.
We may look at the model that was built by researchers and published in [24] as an illustration of one method that can be used to determine attributes for an intrusion detection system.
Security is of utmost importance in the context of IoT and other forms of pervasive connectivity. Attacks are increasingly likely to target companies and organizations that use IoT. Traditional cybersecurity systems face multiple obstacles when attempting to detect zero-day threats. An intruder exploits the privileges offered by the IoT architecture to acquire valuable data. Few security risks are widely recognized, and even fewer of those involve slow, unnoticed attacks. An effective strategy for tackling these unexpected challenges is to build intrusion detection systems using machine learning techniques. Cyberattacks on the IoT architecture may result in loss of data or information as well as sluggish IoT devices. Intrusion detection systems have been used for the last 20 years to secure data and networks. Because the IoT uses its own standards and protocol stacks, traditional intrusion detection methods are not successful at identifying security breaches in IoT networks. Because the volume of data generated by IoT is enormous, analysing it regularly is difficult. An intrusion detection system (IDS) protects a system or network from unauthorized access by actively monitoring for and detecting potentially harmful or suspicious activity. Machine learning technologies offer reliable and effective methods for reducing these specific risks.
1. Raghuvanshi, A., Singh, U.K. et al., Intrusion Detection Using Machine Learning for Risk Mitigation in IoT-Enabled Smart Irrigation in Smart Farming. J. Food Qual., 2022, 1, 1–8, 2022.
2. Abhishek, R., Singh, U.K., Phasinam, K., Kassanuk, T., Internet of Things-Security Vulnerabilities and Countermeasures. ECS Trans., 107, 1, 15043–15053, 2022.
3. Raghuvanshi, A., Singh, U.K., Joshi, C., A Review of Various Security and Privacy Innovations for IoT Applications in Healthcare. Adv. Healthcare Syst., 1, 43–58, 2022, doi: 10.1002/9781119769293.ch4.
4. Zhang, Q. and Wang, X., SQL injections through back-end of RFID system, in: 2009 International Symposium on Computer Network and Multimedia Technology. CNMT 2009, pp. 1–4, IEEE, 2009.
5. Li, Z. et al., VulPecker: an automated vulnerability detection system based on code similarity analysis. ACM, Proc. of the 32nd Annual Conference on Computer Security Applications, pp. 201–213, 2016.
6. Guojun, Z. et al., Design and application of intelligent dynamic crawler for web data mining, in: Automation (YAC), 2017 32nd Youth Academic Annual Conference of Chinese Association, pp. 1098–1105, IEEE, 2017.
7. Medeiros, I., Neves, N., Correia, M., Detecting and removing web application vulnerabilities with static analysis and data mining. IEEE Trans. Reliab., 1, 54–69, IEEE, 2016.
8. Masood, A. and Java, J., Static Analysis for Web Service Security – Tools & Techniques for a Secure Development Life Cycle. International Symposium on Technologies for Homeland Security, pp. 1–6, 2015.
9. Medeiros, I. and Neves, N., Detecting and Removing Web Application Vulnerabilities with Static Analysis and Data Mining. IEEE Trans. Reliab., 1, 1–16, 2015.
10. Salas, M.I., de Geus, P.L., Martins, E., Security Testing Methodology for Evaluation of Web Services Robustness - Case: XMLInjection. IEEE World Congress on Services, pp. 303–310, 2015.
11. Madan, S., Security Standards Perspective to Fortify Web Database Applications from Code Injection Attacks. International Conference on Intelligent Systems, Modelling and Simulation, pp. 226–233, 2010.
12. Teodoro, N. and Serrao, C., Web application security: Improving critical web-based applications quality through in-depth security analysis, in: International Conference on Information Society (i-Society), pp. 457–462, 2011.
13. Wang, X. and Reiter, M.K., Using Web-Referral Architectures to Mitigate Denial-of-Service Threats. J. IEEE Trans. Dependable Secure Comput., 7, 2, 203–216, 2010.
14. Liu, A., Yuan, Y., Wijesekera, D., Stavrou, A., SQLProb: a proxy-based architecture towards preventing SQL injection attacks, in: Proceedings ACM Symposium on Applied Computing (SAC’09), pp. 2054–2061, 2009.
15. Gruschka, N., Jensen, M., Lo Iacono, L., Luttenberger, Server-side Streaming Processing of WS-Security. IEEE Trans. Serv. Comput., 4, 4, 272–285, 2011.
16. Ladan, M.I., Web Services Metrics: A Survey and A Classification. J. Commun. Comput., 9, 7, 824–829, 2012.
17. Hoque, N., Bhuyan, M.H., Baishya, R.C., Bhattacharyya, D.K., Kalita, Network Attacks: Taxonomy, tools and systems. J. Comput. Netw. Appl., 1, 13–26, 4 October 2013, doi: doi.org/10.1016/j.jnca.2013.08.001.
18. Kulshestha, G., Agarwal, A., Mittal, A., Sahoo, A., Hybrid Cuckoo Search Algorithm for Simultaneous Feature and Classifier Selection. IEEE International Conference on Cognitive Computing and Information Processing (CCIP), pp. 1–6, 2015.
19. Visumathi, J. and Shunmuganathan, K.L., A computational intelligence for evaluation of intrusion detection system. Indian J. Sci. Technol., 4, 1, 28–34, Jan 2011.
20. Wang, B., Yao, X., Jiang, Y., Sun, C., Shabaz, M., Design of a Real-Time Monitoring System for Smoke and Dust in Thermal Power Plants Based on Improved Genetic Algorithm. J. Healthc. Eng, 2021, D. Singh (Ed.), pp. 1–10, Hindawi Limited, UAE, 2021, https://doi.org/10.1155/2021/7212567.
21. Mohanasundaram, S., Ramirez-Asis, E., Quispe-Talla, A., Bhatt, M.W., Shabaz, M., Experimental replacement of hops by mango in beer: production and comparison of total phenolics, flavonoids, minerals, carbohydrates, proteins and toxic substances, Int. J. Syst. Assur. Eng. Manage., Springer Science and Business Media LLC, UAE, 2021, https://doi.org/10.1007/s13198-021-01308-3.
22. Almahirah, M.S., S, V.N., Jahan, M., Sharma, S., Kumar, S., Role of Market Microstructure in Maintaining Economic Development. Empirical Econ. Lett., 20, 2, 01–14, 2021.
23. Chaudhary, A., Tiwari, V.N., Kumar, A., Analysis of Fuzzy Logic Based Intrusion Detection Systems in Mobile Ad Hoc Networks. Int. J. Inf. Technol., 6, 1, 183–198, June 2014.
24. Rathore, N. and Rajavat, A., Smart Farming Based on IOT-Edge Computing: Applying Machine Learning Models For Disease And Irrigation Water Requirement Prediction In Potato Crop Using Containerized Microservices, in: Precision Agriculture for Sustainability, pp. 399–424, Apple Academic Press, UAE, 2024.
25. Patsariya, M. and Rajavat, A., A Progressive Design of MANET Security Protocol for Reliable and Secure Communication. Int. J. Intell. Syst. Appl. Eng., 12, 9s, 190–204, 2024.
26. Rathi, M. and Rajavat, A., Investigations and Design of Privacy-Preserving Data Mining Technique for Secure Data Publishing. Int. J. Intell. Syst. Appl. Eng., 11, 9s, 351–367, 2023.
27. Dubey, P. and Rajavat, A., Effective K-means clustering algorithm for efficient data mining, in: 2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN), pp. 1–6, IEEE, 2023, May.
28. Nahar, S., Pithawa, D., Bhardwaj, V., Rawat, R., Rawat, A., Pachlasiya, K., Quantum Technology for Military Applications. Quantum Comput. Cybersecur., 1, 313–334, 2023.
29. Pithawa, D., Nahar, S., Bhardwaj, V., Rawat, R., Dronawat, R., Rawat, A., Quantum Computing Technological Design Along with Its Dark Side. Quantum Comput. Cybersecur., 1, 295–312, 2023.
*Corresponding author:
R. Deepika1*, Sreenivasulu Gogula2, K. Kanagalakshmi3, Anshu Mehta4, S. J. Vivekanandan5 and D. Vetrithangam6
1Department of AI&DS, B V Raju Institute of Technology, Narsapur, Telangana, India
2Department of CSE (Data Science), Vardhaman College of Engineering, Shamshabad, Hyderabad, India
3Department of Computer Applications, SRM Institute of Science and Technology (Deemed to be University), Trichy, India
4Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India
5Department of Computer Science and Engineering, Dhanalakshmi College of Engineering, Dr V P R Nagar, Manimangalam, Tambaram, Chennai, India
6Department of Computer Science & Engineering, University Institute of Engineering, Chandigarh University, Mohali, Punjab, India
Frequent pattern mining is a very active topic in data mining and has attracted numerous researchers since its inception. As data accumulate, the size of datasets in every domain grows rapidly. The ability to assess and extract time-sensitive information from large datasets effectively and easily is essential for making informed decisions and uncovering new knowledge. Data mining is the systematic analysis of vast quantities of data, using sophisticated analytics, to uncover previously unidentified links, correlations, patterns, and trends. Since the inception of the World Wide Web, the quantity of data that is stored and can be accessed electronically has increased rapidly and significantly. Consequently, methods for extracting valuable information from this extensive collection of data have become a crucial tool in several domains, including business and academia. Frequent itemset mining is a very popular technique for obtaining significant insights from datasets.
Keywords: Frequent pattern mining, decision tree, KNN, accuracy, machine learning
Data mining is the systematic exploration of extensive databases to discover noteworthy and previously unidentified patterns [1]. It is one step of the Knowledge Discovery in Databases (KDD) process described by Fayyad et al. KDD is an iterative process whose components are data cleaning, integration, selection, transformation, mining, pattern assessment, and knowledge representation. Data mining may be applied to several types of data, and the approaches and procedures may vary with the type of data. The patterns extracted may also vary in nature, depending on the specific data mining task. Data mining tasks may be broadly classified into two categories: descriptive and predictive. Predictive data mining uses existing data to generate predictions, whereas descriptive data mining aims to explain the overall characteristics of the provided data.
Bayes’ theorem (1700s) and regression analysis (1800s) were early methods for distinguishing patterns from noise. Advances in computer technology have led to a greater variety and volume of data. As data sets have grown in size and complexity, hands-on data analysis has increasingly been supplemented by automated processing. A variety of computer science breakthroughs have contributed to this progress, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s), and support vector machines (1990s) [2].
In a decision tree, each internal node represents a test on the value of some attribute, and each branch represents an outcome of that test. The tree’s leaves represent classes or class distributions. Decision trees are easily converted to classification rules. They can cope with large amounts of data, and their tree-shaped representation of knowledge is intuitive and simple to learn. Building and using a decision tree is a straightforward process that requires just a few easy steps. Decision tree induction algorithms have been used in a wide range of fields, including medicine, manufacturing, financial analysis, astronomy, and molecular biology [3].
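The structure just described, and the conversion from a tree to classification rules, can be illustrated with a small sketch. The tree, its attribute names, and the helper names `classify` and `to_rules` are hypothetical and serve only as an example: an internal node is a dictionary naming the attribute it tests, each branch is keyed by an outcome of that test, and a leaf is a plain class label.

```python
# Hypothetical toy tree: internal nodes test one attribute,
# branches are test outcomes, and leaves hold a class label.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy",
                  "branches": {"true": "no", "false": "yes"}},
    },
}


def classify(tree, record):
    """Follow the branch matching each tested attribute until a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][record[tree["attribute"]]]
    return tree


def to_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN classification rule."""
    if not isinstance(tree, dict):
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN class = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += to_rules(subtree, conditions + ((tree["attribute"], value),))
    return rules
```

Each root-to-leaf path becomes exactly one rule, so this five-leaf tree yields five rules, the first being `IF outlook = sunny AND humidity = high THEN class = no`.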
Both AI and data mining rely heavily on tree-based learning approaches, which have been in use for a long time. Their simplicity is a large part of their appeal. Decision trees are usually built top-down: at each step a univariate split is chosen that optimizes some local criterion (for example, gain ratio), until the leaf partitions of the tree are sufficiently pure. The tree is then pruned according to an estimate of each subtree's utility. Pessimistic Error Pruning uses statistically motivated heuristics to estimate this utility, while Reduced Error Pruning uses a separate pruning set.
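The local split criterion mentioned above can be made concrete with a short sketch of gain ratio. It assumes categorical attributes stored in dictionaries and uses the usual definition (information gain divided by the split information of the partition); the function names `entropy`, `gain_ratio`, and `best_split` are hypothetical, and pruning is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, normalized by split info."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Expected entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many small partitions.
    split_info = -sum(len(g) / total * log2(len(g) / total)
                      for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_split(rows, labels, attributes):
    """Pick the single attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))
```

A top-down learner would call `best_split` at each node, partition the rows by the chosen attribute, and recurse until the partitions are sufficiently pure.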
NBTree builds a decision tree whose leaf nodes are naive Bayes classifiers. Deciding where to place them requires evaluating every first-level child node by cross-validation, which is a very costly strategy. Other learners construct new attributes at each node as linear, quadratic, or logistic functions of the existing attributes, and these are then passed down the tree in the same manner as ordinary attributes. Although the predictions are described as probability distributions along the root-to-leaf path, the leaf nodes remain the primary classifiers [4].
This study introduces a recursive Bayesian classifier. A variety of earlier methods have improved the accuracy of decision tree induction, and many of them have been successful. Their major drawback was that they were time-consuming and difficult to learn. The most significant aspect of the recursive approach is dividing the data recursively into regions where conditional independence is suspected to hold. A decision tree makes judgements about the target value of an item by mapping observations about the item to conclusions based on those observations [5].
The most prominent use of decision trees in operations research is identifying the strategy most likely to attain a goal. Decision trees may also be used to calculate conditional probabilities. A decision tree (also known as a tree diagram) is a decision support tool that uses a tree-like graph or model to describe alternatives and their possible consequences, including chance event outcomes, resource costs, and utility. Decision tree induction has been used effectively in expert systems for knowledge acquisition. Decision trees can be induced from a variety of data sources [6].
Data mining is a collection of techniques for the efficient automated discovery of previously unknown, valid, novel, useful, and understandable patterns in large databases. The patterns must be actionable so that they can be used in an enterprise’s decision-making process [7]. Data mining techniques can be grouped as follows, as shown in Figure 2.1:
Classification: The given data instance must be classified into one of the target classes that have already been identified or defined. One example is deciding whether a customer is a creditworthy client or a defaulter in a credit card transaction database, based on the customer's demographic and previous purchase characteristics [8].
Estimation: Like classification, the purpose of an estimation model is to determine a value for an unknown output attribute. However, unlike classification, the output attribute for an estimation problem is numeric rather than categorical.
Prediction: It is not easy to distinguish prediction from classification or estimation.
Figure 2.1 Data mining methods.
The primary distinction is that a predictive model forecasts a future outcome rather than determining current behaviour. The output attribute can be either categorical or numeric. One example of a predictive model is a forecast of the value of the Dow Jones Industrial Average at the end of the following week.
Association rule mining: Here interesting hidden rules, called association rules, are mined from a large transactional database. For example, the rule {milk, butter → biscuit} states that whenever milk and butter are bought together, biscuit is also bought, so these items can be placed together to increase the overall sales of each [9].
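How strongly a rule of this kind holds is usually measured by its support and confidence. The sketch below assumes the standard definitions (support is the fraction of transactions containing an itemset; confidence is the support of the whole rule divided by the support of its antecedent). The basket data and the function names `support` and `confidence` are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical market baskets, invented for this example.
BASKETS = [
    {"milk", "butter", "biscuit"},
    {"milk", "butter", "biscuit"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"bread", "biscuit"},
]
```

With these five baskets, `support(BASKETS, {"milk", "butter"})` is 0.6 and `confidence(BASKETS, {"milk", "butter"}, {"biscuit"})` is 2/3: two of the three baskets that contain milk and butter also contain biscuit. A rule is normally reported only when both measures exceed user-chosen thresholds.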