Natural Language Processing for Software Engineering (E-Book)

Description

Discover how Natural Language Processing for Software Engineering can transform your understanding of agile development, equipping you with essential tools and insights to enhance software quality and responsiveness in today’s rapidly changing technological landscape.

Agile development enhances business responsiveness through continuous software delivery, emphasizing iterative methodologies that produce incremental, usable software. Working software is the main measure of progress, and ongoing customer collaboration is essential. Approaches like Scrum, eXtreme Programming (XP), and Crystal share these principles but differ in focus: Scrum reduces documentation, XP improves software quality and adaptability to changing requirements, and Crystal emphasizes people and interactions while retaining key artifacts. Modifying software systems designed with Object-Oriented Analysis and Design can be costly and time-consuming in rapidly changing environments requiring frequent updates. This book explores how natural language processing can enhance agile methodologies, particularly in requirements engineering. It introduces tools that help developers create, organize, and update documentation throughout the agile project process.

Page count: 900

Year of publication: 2025

Table of Contents

Cover

Table of Contents

Series Page

Title Page

Copyright Page

Preface

1 Machine Learning and Artificial Intelligence for Detecting Cyber Security Threats in IoT Environment

1.1 Introduction

1.2 Need of Vulnerability Identification

1.3 Vulnerabilities in IoT Web Applications

1.4 Intrusion Detection System

1.5 Machine Learning in Intrusion Detection System

1.6 Conclusion

References

2 Frequent Pattern Mining Using Artificial Intelligence and Machine Learning

2.1 Introduction

2.2 Data Mining Functions

2.3 Related Work

2.4 Machine Learning for Frequent Pattern Mining

2.5 Conclusion

References

3 Classification and Detection of Prostate Cancer Using Machine Learning Techniques

3.1 Introduction

3.2 Literature Survey

3.3 Machine Learning for Prostate Cancer Classification and Detection

3.4 Conclusion

References

4 NLP-Based Spellchecker and Grammar Checker for Indic Languages

4.1 Introduction

4.2 NLP-Based Techniques of Spellcheckers and Grammar Checkers

4.3 Grammar Checker Related Work

4.4 Spellchecker Related Work

4.5 Conclusion

References

5 Identification of Gujarati Ghazal Chanda with Cross-Platform Application

Abbreviations

5.1 Introduction

5.2 Ghazal

5.3 History and Grammar of Ghazal

5.4 Literature Review

5.5 Proposed System

5.6 Conclusion

References

6 Cancer Classification and Detection Using Machine Learning Techniques

6.1 Introduction

6.2 Machine Learning Techniques

6.3 Review of Machine Learning for Cancer Detection

6.4 Methods

6.5 Result Analysis

6.6 Conclusion

References

7 Text Mining Techniques and Natural Language Processing

7.1 Introduction

7.2 Text Classification and Text Clustering

7.3 Related Work

7.4 Methodology

7.5 Conclusion

References

8 An Investigation of Techniques to Encounter Security Issues Related to Mobile Applications

8.1 Introduction

8.2 Literature Review

8.3 Results and Discussions

8.4 Conclusion

References

9 Machine Learning for Sentiment Analysis Using Social Media Scraped Data

9.1 Introduction

9.2 Twitter Sentiment Analysis

9.3 Sentiment Analysis Using Machine Learning Techniques

9.4 Conclusion

References

10 Opinion Mining Using Classification Techniques on Electronic Media Data

10.1 Introduction

10.2 Opinion Mining

10.3 Related Work

10.4 Opinion Mining Techniques

10.5 Conclusion

References

11 Spam Content Filtering in Online Social Networks

11.1 Introduction

11.2 E-Mail Spam Identification Methods

11.3 Online Social Network Spam

11.4 Related Work

11.5 Challenges in the Spam Message Identification

11.6 Spam Classification with SVM Filter

11.7 Conclusion

References

12 An Investigation of Various Techniques to Improve Cyber Security

12.1 Introduction

12.2 Various Attacks [6–9]

12.3 Methods

12.4 Conclusion

References

13 Brain Tumor Classification and Detection Using Machine Learning by Analyzing MRI Images

13.1 Introduction

13.2 Literature Survey

13.3 Methods

13.4 Result Analysis

13.5 Conclusion

References

14 Optimized Machine Learning Techniques for Software Fault Prediction

14.1 Introduction

14.2 Literature Survey

14.3 Methods

14.4 Result Analysis

14.5 Conclusion

References

15 Pancreatic Cancer Detection Using Machine Learning and Image Processing

15.1 Introduction

15.2 Literature Survey

15.3 Methodology

15.4 Result Analysis

15.5 Conclusion

References

16 An Investigation of Various Text Mining Techniques

16.1 Introduction

16.2 Related Work

16.3 Classification Techniques for Text Mining

16.4 Conclusion

References

17 Automated Query Processing Using Natural Language Processing

17.1 Introduction

17.2 The Challenges of NLP

17.3 Related Work

17.4 Natural Language Interfaces Systems

17.5 Conclusion

References

18 Data Mining Techniques for Web Usage Mining

18.1 Introduction

18.2 Web Mining

18.3 Web Usage Data Mining Techniques

18.4 Conclusion

References

19 Natural Language Processing Using Soft Computing

19.1 Introduction

19.2 Related Work

19.3 NLP Soft Computing Approaches

19.4 Conclusion

References

20 Sentiment Analysis Using Natural Language Processing

20.1 Introduction

20.2 Sentiment Analysis Levels

20.3 Challenges in Sentiment Analysis

20.4 Related Work

20.5 Machine Learning Techniques for Sentiment Analysis

20.6 Conclusion

References

21 Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data

21.1 Introduction

21.2 Web Mining

21.3 Taxonomy of Web Data Mining

21.4 Web Content Mining Methods

21.5 Efficient Algorithms for Web Data Extraction

21.6 Machine Learning Based Web Content Extraction Methods

21.7 Conclusion

References

22 Intelligent Pattern Discovery Using Web Data Mining

22.1 Introduction

22.2 Pattern Discovery from Web Server Logs

22.3 Data Mining Techniques for Web Server Log Analysis

22.4 Graph Theory Techniques for Analysis of Web Server Logs

22.5 Conclusion

References

23 A Review of Security Features in Prominent Cloud Service Providers

23.1 Introduction

23.2 Cloud Computing Overview

23.3 Cloud Computing Model

23.4 Challenges with Cloud Security and Potential Solutions

23.5 Comparative Analysis

23.6 Conclusion

References

24 Prioritization of Security Vulnerabilities under Cloud Infrastructure Using AHP

24.1 Introduction

24.2 Related Work

24.3 Proposed Method

24.4 Result and Discussion

24.5 Conclusion

References

25 Cloud Computing Security Through Detection & Mitigation of Zero-Day Attack Using Machine Learning Techniques

25.1 Introduction

25.2 Related Work

25.3 Proposed Methodology

25.4 Results and Discussion

25.5 Conclusion and Future Work

References

26 Predicting Rumors Spread Using Textual and Social Context in Propagation Graph with Graph Neural Network

26.1 Introduction

26.2 Literature Review

26.3 Proposed Methodology

26.4 Results and Discussion

26.5 Conclusion

References

27 Implications, Opportunities, and Challenges of Blockchain in Natural Language Processing

27.1 Introduction

27.2 Related Work

27.3 Overview on Blockchain Technology and NLP

27.4 Integration of Blockchain into NLP

27.5 Applications of Blockchain in NLP

27.6 Blockchain Solutions for NLP

27.7 Implications of Blockchain Development Solutions in NLP

27.8 Sectors That Can Benefit from Blockchain and NLP Integration

27.9 Challenges

27.10 Conclusion

References

28 Emotion Detection Using Natural Language Processing by Text Classification

28.1 Introduction

28.2 Natural Language Processing

28.3 Emotion Recognition

28.4 Related Work

28.5 Machine Learning Techniques for Emotion Detection

28.6 Conclusion

References

29 Alzheimer Disease Detection Using Machine Learning Techniques

29.1 Introduction

29.2 Machine Learning Techniques to Detect Alzheimer’s Disease

29.3 Pre-Processing Techniques for Alzheimer’s Disease Detection

29.4 Feature Extraction Techniques for Alzheimer’s Disease Detection

29.5 Feature Selection Techniques for Diagnosis of Alzheimer’s Disease

29.6 Machine Learning Models Used for Alzheimer’s Disease Detection

29.7 Conclusion

References

30 Netnographic Literature Review and Research Methodology for Maritime Business and Potential Cyber Threats

30.1 Introduction

30.2 Criminal Flows Framework

30.3 Oceanic Crime Exchange and Categorization

30.4 Fisheries Crimes and Mobility Crimes

30.5 Conclusion

30.6 Discussion

References

31 Review of Research Methodology and IT for Business and Threat Management

Abbreviation Used

31.1 Introduction

31.2 Conclusion

References

About the Editors

Index

Also of Interest

End User License Agreement

List of Tables

Chapter 5

Table 5.1 Understanding Matra.

Table 5.2 Understanding Matra with a poetic line.

Chapter 23

Table 23.1 Attack breaches on cloud providers.

Table 23.2 Features provided by service model.

Table 23.3 Various security issues in cloud.

Table 23.4 Various Security measures used by major cloud providers.

Chapter 24

Table 24.1 Primary & proposed security criteria.

Table 24.2 Scale of relative importance.

Table 24.3 Pairwise comparison matrix for security criteria.

Table 24.4 Random index (RI).

Table 24.5 Security criteria for AHP.

Table 24.6 Pairwise comparison matrix for security criteria.

Table 24.7 Normalized pairwise comparison matrix.

Table 24.8 Weighted priority matrix for security criteria.

Table 24.9 Comparison of results of proposed and previous method.

Chapter 25

Table 25.1 Specification of Data Sets used and their respective threats/risk.

Table 25.2 Confusion matrix of all six datasets with six algorithms/classifier...

Table 25.3 Detection of top 12 attacks/risk using proposed methodology

Table 25.4 Comparison of performance of proposed model & previous methods/mode...

Chapter 26

Table 26.1 Dataset and graph statistics.

Table 26.2 Performance evaluation.

Table 26.3 Rumor detection model performance with different node features mode...

Chapter 27

Table 27.1 Various research work on NLP with blockchain.

Chapter 30

Table 30.1 Oceanic crimes categorization [3, 4].

Table 30.2 Increased in crime rate chart [7, 8].

Table 30.3 Generated marine threats [32–34].

Table 30.4 Comparative work analysis.

Chapter 31

Table 31.1 REM framework [3].

Table 31.2 Sampling designs methods [5].

Table 31.3 Data gathering methods [7].

Table 31.4 Data analysis methods [8].

Table 31.5 Literature review classifications [10].

Table 31.6 Datasets [6, 7].

Table 31.7 RM in BM [8].

Table 31.8 Threat management systems [7, 8].

Table 31.9 RM tools [8, 9].

Table 31.10 Visualization tools [9, 10].

List of Illustrations

Chapter 1

Figure 1.1 Increasing number of DDOS attacks [Source: Cisco Annual Internet Re...

Figure 1.2 Threats to Internet of Things.

Figure 1.3 Number of new vulnerabilities identified in IoT [Source- IBM X-Forc...

Figure 1.4 Host-based IDS.

Figure 1.5 Network-based intrusion detection system.

Chapter 2

Figure 2.1 Data mining methods.

Figure 2.2 A sample decision tree—partial view.

Chapter 4

Figure 4.1 Indic language grammar checker research studies found online as of ...

Figure 4.2 Indic language Spellchecker research studies found online as of Mar...

Chapter 5

Figure 5.1 Vowels of the Gujarati language [7].

Figure 5.2 Consonants of the Gujarati language [7].

Figure 5.3 Conjunct consonants of the Gujarati language [7].

Figure 5.4 Numerals of the Gujarati language [7].

Figure 5.5 Sample text in Gujarati.

Figure 5.6 Proposed system.

Figure 5.7 Splash screen.

Figure 5.8 Login screen.

Figure 5.9 Home screen.

Figure 5.10 History screen.

Figure 5.11 Help screen. Type of chanda.

Figure 5.12 Output for Khafif Ghazal.

Figure 5.13 Website output for Khafif Ghazal.

Chapter 6

Figure 6.1 A framework for cancer image classification and detection.

Figure 6.2 LSTM network.

Figure 6.3 Convolution neural network.

Figure 6.4 Result comparison of classifiers.

Chapter 7

Figure 7.1 Steps in text mining.

Figure 7.2 Stages of preprocessing text.

Figure 7.3 Machine learning based framework for text mining.

Chapter 8

Figure 8.1 Schematic representation of android app.

Chapter 9

Figure 9.1 ACO-CNN deep learning model for sentiment classification and detect...

Chapter 13

Figure 13.1 Steps involved in MRI image processing.

Figure 13.2 Result comparison of classifiers.

Chapter 14

Figure 14.1 Machine learning for software fault prediction.

Figure 14.2 Result comparison.

Chapter 15

Figure 15.1 Pancreatic cancer tissue in CT scan image.

Figure 15.2 Pancreatic cancer detection process.

Figure 15.3 Result comparison of machine learning for pancreatic cancer detect...

Chapter 17

Figure 17.1 Natural language query processing.

Chapter 18

Figure 18.1 Steps involved in web mining.

Chapter 21

Figure 21.1 Web data mining process.

Figure 21.2 Taxonomy of web data mining.

Chapter 23

Figure 23.1 CC Actors.

Chapter 24

Figure 24.1 Hierarchical structure of proposed approach based on AHP.

Figure 24.2 Comparison of severity level of vulnerabilities according to the p...

Chapter 25

Figure 25.1 Phases of zero day exploits.

Figure 25.2 System flow for training and classification of cloud network traff...

Figure 25.3 Adaptive Predictive Ensemble Machine Learning (APEML) system.

Figure 25.4 Comparative analysis of accuracy for ML algorithms for input datas...

Figure 25.5 Area under ROC curve for classifier 1 to 6.

Figure 25.6 Principal component analysis for classifier 1 to 6.

Figure 25.7 Top 10 zero day mitigation strategies.

Chapter 26

Figure 26.1 The proposed TTRD framework.

Chapter 27

Figure 27.1 Blockchain AI market size.

Figure 27.2 NLP and blockchain [13].

Figure 27.3 Various blockchain solutions.

Figure 27.4 Implications of blockchain on NLP.

Chapter 28

Figure 28.1 Different NLP approaches.

Chapter 29

Figure 29.1 Framework to detect Alzheimer’s disease using machine learning tec...

Chapter 30

Figure 30.1 Crime increase rate at marine territory [23, 28].

Figure 30.2 Crime increase rate at marine territory (2022).

Figure 30.3 Crime increase rate at marine territory (2024).

Figure 30.4 Increased marine activities.

Scrivener Publishing, 100 Cummings Center, Suite 541J, Beverly, MA 01915-6106

Publishers at Scrivener: Martin Scrivener and Phillip Carmical

Natural Language Processing for Software Engineering

Edited by

Rajesh Kumar Chakrawarti

Ranjana Sikarwar

Sanjaya Kumar Sarangi

Samson Arun Raj Albert Raj

Shweta Gupta

Krishnan Sakthidasan Sankaran

and

Romil Rawat

This edition first published 2025 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA

© 2025 Scrivener Publishing LLC

For more information about Scrivener publications please visit www.scrivenerpublishing.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

Wiley Global Headquarters

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Library of Congress Cataloging-in-Publication Data

ISBN 9781394272433

Front cover images supplied by Adobe Firefly

Cover design by Russell Richardson

Preface

The book’s goal is to discuss the most current trends in applying natural language processing (NLP) approaches. It makes the case that these areas will continue to develop and merit contributions.

The book focusses on object-orientated software development based on visual modelling, one of the most significant development paradigms today. Issues still arise throughout the documentation process, and only a few aids have been developed to assist developers with their documentation tasks. Natural language processing can be used to build a variety of related tools, such as assistants, to support this process. The book covers software development and operation using data mining, informatics, big data analytics, artificial intelligence (AI), machine learning (ML), digital image processing, the Internet of Things (IoT), cloud computing, computer vision, cyber security, Industry 4.0, and health informatics.

1 Machine Learning and Artificial Intelligence for Detecting Cyber Security Threats in IoT Environment

Ravindra Bhardwaj1*, Sreenivasulu Gogula2, Bidisha Bhabani3, K. Kanagalakshmi4, Aparajita Mukherjee5 and D. Vetrithangam6

1Department of Physics and Computer Science, Dayalbagh Educational Institute (Deemed to be University), Agra, Uttar Pradesh, India

2Department of CSE (Data Science), Vardhaman College of Engineering, Shamshabad, Hyderabad, India

3Department of Computer Science and Engineering, University of Engineering and Management (UEM), New Town, West Bengal, India

4Department of Computer Applications, SRM Institute of Science and Technology (Deemed to be University), Trichy, India

5Department of Computer Science and Engineering, Institute of Engineering and Management, University of Engineering and Management (UEM), New Town, Kolkata, West Bengal, India

6Department of Computer Science & Engineering University, Institute of Engineering, Chandigarh University, Mohali, Punjab, India

Abstract

The Internet of Things (IoT) refers to the growing connectivity of human-made systems, such as healthcare systems, smart homes, and smart grids, through the internet. A vast amount of information is now shared across these networks, which gives rise to security threats and privacy concerns. Intrusions are malicious, unauthorized actions that harm the network. Because IoT networks are so widespread, they are susceptible to a diverse range of security issues. Cyber attacks on the IoT architecture can lead to the loss of data and can slow down IoT devices. Intrusion detection systems (IDS) have been used for the past twenty years to secure data and networks: an IDS safeguards a system or network against unauthorized access by actively monitoring for and identifying potentially malicious or suspicious activity. Conventional intrusion detection technologies, however, are ineffective at detecting security breaches in the IoT because of the distinct standards and protocol stacks used in its networks, and regularly analyzing the continuous stream of data generated by IoT devices is a difficult task. Machine learning provides robust and efficient approaches for mitigating these threats. Establishing a robust machine learning system is key to securing networks against them.

Keywords: Machine learning, Internet of Things, security, privacy, attacks, vulnerability, intrusions

1.1 Introduction

Connected devices make ordinary chores easier and more efficient, and they provide a great deal of useful information. Connected automobiles, for example, can make use of driver-assistance services, and medical devices provide detailed patient records. Unfortunately, any device that can connect to the internet can be attacked. In the worst case, many of these devices lack even the most basic security safeguards. According to the authors of the report, almost all of the data traffic associated with the Internet of Things (98%) is not secured, so this information can be obtained by anybody with little effort. Devices connected to the Internet of Things therefore give fraudsters an easy target: not only their own information but also other sensitive data may be stolen. Attackers frequently use one of these devices to gain access to a company’s internal network. The sheer number of these devices and the settings they control may be enough to attract a cyber-attacker [1], as illustrated in Figure 1.1: Increasing Number of DDOS Attacks [Source: Cisco Annual Internet Report 2018-2023] and in Figure 1.2: Threats to Internet of Things.

In a smart environment, any number of items might be the target of an attack, including databases of user credentials, electronic sensors, CCTV installations, access controls, personal electronic devices, and recorded biometrics. From a security point of view, it is essential to protect the confidentiality, integrity, availability, authentication, and authorization features of the IoT architecture [2]. DDoS attacks are becoming more common: Cisco’s Annual Internet Report (2018-2023) White Paper forecasts that the total number of DDoS attacks will roughly double, from the 7.9 million seen in 2018 to over 15 million by 2023, as shown in Figure 1.1.

Figure 1.1 Increasing number of DDOS attacks [Source: Cisco Annual Internet Report 2018-2023].

Figure 1.2 Threats to Internet of Things.

According to the survey, 57% of IoT devices that are connected via this insecure traffic are susceptible to medium- to high-severity attacks, making them an easy target for cybercriminals [3]. In addition, the survey found that 41% of attacks target IoT vulnerabilities by scanning them against publicly available databases of known security flaws. The analysis is shown in Figure 1.2.

According to the Internet of Things Threat Report published by Palo Alto Networks in March 2020, 98% of all traffic from IoT devices is unencrypted, giving attackers a chance to eavesdrop. This traffic carries sensitive and private information that is easily accessible to attackers, who may then sell it on the dark web for a profit.

1.2 Need of Vulnerability Identification

Vulnerabilities in IoT networks are increasing every year. As shown in Figure 1.3, the IoT environment experiences a large number of new vulnerabilities each year, and all Internet of Things applications (smart city, smart farming, smart healthcare, smart transportation, and smart traffic) face a growing number of attacks. The number of vulnerabilities has increased threefold in the last decade and twofold in the last five years, as represented in Figure 1.3: Number of New Vulnerabilities Identified in IoT [Source: IBM X-Force Threat Intelligence Index 2022].

Figure 1.3 Number of new vulnerabilities identified in IoT [Source- IBM X-Force Threat Intelligence Index 2022].

A vulnerability scan is the process of determining how vulnerable a system is to attack. It is carried out to identify potential entry points into a computer or network so that appropriate preventative measures can be taken. Automated scanning methods check applications for security problems to establish whether there are vulnerabilities in an organization’s internal network. Because vulnerability scanners automate the search for security issues, users are spared the time and effort of carrying out hundreds or even thousands of manual tests for each kind of vulnerability.
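The idea of automated scanning can be illustrated with a minimal sketch. It assumes a hypothetical in-memory inventory of installed components and a hypothetical list of known-vulnerable versions; real scanners query vulnerability databases and probe live services, which this example does not attempt. The names `KNOWN_VULNERABLE` and `scan_inventory`, and all identifiers in the data, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical advisory data: component name -> {vulnerable version: advisory id}.
# The component names and identifiers are made up for illustration only.
KNOWN_VULNERABLE: Dict[str, Dict[str, str]] = {
    "camera-firmware": {"1.0.2": "EXAMPLE-0001", "1.0.3": "EXAMPLE-0002"},
    "mqtt-broker": {"2.4.0": "EXAMPLE-0003"},
}


def scan_inventory(inventory: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (component, version, advisory_id) for every vulnerable match."""
    findings = []
    for component, version in sorted(inventory.items()):
        advisory = KNOWN_VULNERABLE.get(component, {}).get(version)
        if advisory is not None:
            findings.append((component, version, advisory))
    return findings


if __name__ == "__main__":
    devices = {"camera-firmware": "1.0.3", "mqtt-broker": "2.5.1", "thermostat": "0.9"}
    for component, version, advisory in scan_inventory(devices):
        print(f"{component} {version}: matches {advisory}")
```

A single call replaces one manual lookup per component and version, which is the saving the paragraph above describes.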

To maintain the integrity of the system’s protections, it is essential to assign each vulnerability a severity ranking before taking any remedial action. The Common Vulnerability Scoring System (CVSS) is a tool that administrators can use to prioritize security problems according to the severity level of each flaw. The CVSS score is a standard metric, however, and is not tailored to a particular network architecture, even though the frequency and impact of vulnerabilities affect the security risk level of a specific network. In addition to the severity score, several other factors affect the level of security risk posed to the organization’s underlying infrastructure, including the age and frequency of the vulnerabilities already present in the system and the impact that exploiting a vulnerability has on the system. For this reason, these factors should be used together with the CVSS severity score when calculating risk levels, which allows for effective network security risk management.
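As a rough illustration of combining the CVSS severity score with network-specific factors, the sketch below blends the CVSS base score with the age and frequency of a vulnerability. The weighting scheme, the one-year saturation for age, and the names `Vulnerability`, `risk_level`, and `prioritize` are assumptions made for this example; the chapter does not prescribe a formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vulnerability:
    name: str
    cvss: float          # CVSS base score, 0.0 to 10.0
    age_days: int        # how long the flaw has been present in the network
    affected_hosts: int  # how many hosts in the network expose it


def risk_level(v: Vulnerability, total_hosts: int,
               w_age: float = 0.25, w_freq: float = 0.25) -> float:
    """Illustrative score in [0, 10]; the weights are assumptions, not a standard."""
    age_factor = min(v.age_days / 365.0, 1.0)  # saturates after one year
    freq_factor = min(v.affected_hosts / max(total_hosts, 1), 1.0)
    w_cvss = 1.0 - w_age - w_freq
    return 10.0 * (w_cvss * v.cvss / 10.0 + w_age * age_factor + w_freq * freq_factor)


def prioritize(vulns: List[Vulnerability], total_hosts: int) -> List[Vulnerability]:
    """Sort vulnerabilities from highest to lowest illustrative risk."""
    return sorted(vulns, key=lambda v: risk_level(v, total_hosts), reverse=True)


if __name__ == "__main__":
    vulns = [
        Vulnerability("new-critical", cvss=9.0, age_days=10, affected_hosts=1),
        Vulnerability("old-widespread", cvss=7.0, age_days=400, affected_hosts=50),
    ]
    for v in prioritize(vulns, total_hosts=50):
        print(f"{v.name}: {risk_level(v, 50):.2f}")
```

Under these assumed weights, an older flaw present on every host can outrank a newer, isolated flaw with a higher CVSS score, which is the kind of network-specific reordering the paragraph argues for.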

1.3 Vulnerabilities in IoT Web Applications

The authors of [4] propose a strategy based on code inspection, which is used to uncover errors hidden in the code. The approach is reported to be able to locate every vulnerability listed in the National Vulnerability Database (NVD), and the resulting classifier may help identify potential security flaws more accurately.

In addition, Guojun and his colleagues [5] developed a web crawler that collects documents that are linked to one another; term frequency-inverse document frequency (TF-IDF) weighting is essential to their methodology. Medeiros et al. [6] proposed an approach for evaluating code quality that is built on data mining concepts. New techniques for identifying web server vulnerabilities were developed in [7].
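Since TF-IDF is named as central to the crawler-based method, a minimal sketch of one common TF-IDF variant may help. It is not the implementation used in [5]; the function name `tf_idf` and the choice of raw term frequency with an unsmoothed logarithmic inverse document frequency are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    """
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [
        ["sql", "injection", "attack"],
        ["xml", "injection"],
        ["sql", "query"],
    ]
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

Terms that appear in few documents receive higher weights than terms spread across the collection, so a crawler can use the weights to judge which collected documents are about the same topic.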

The authors of [8] developed a method for locating vulnerabilities in web applications that combines static analysis with data mining applied directly to the source code. The researchers in [9] concluded that XML injection is a critical issue that exists in all web applications, and the vast majority of recently published web apps continue to be affected by it.

According to [10], a large share of such standards concern web application security, and the studies focused primarily on security measures designed to prevent code injection attacks on web applications. Even though the relevant requirements are clearly defined and extensively covered in almost all international standards, the number of attacks is rising because of code injection flaws; this is the view of the developers. To reduce these risks, it is crucial to inform engineers and clients about the relevance of these measures and to urge them to meet the standards with meticulous care. Delays in providing this kind of instruction and support are not acceptable.

The authors of [11] discussed the significant factors involved in the product development life cycle. In addition, a number of software engineers have introduced security automation tools and processes that can be used at any stage of the software development life cycle (SDLC) to enhance the stability and quality of even the most fundamental digital systems. They also urged all organizations working to improve networks to place a higher priority on planning, education, risk assessment, threat modelling, audits of architecture configuration, secure coding, and assessment of data after it has been sent, received, and processed.

Wang and Reiter [12] developed a method for mitigating denial-of-service attacks that uses a website's graph structure to counter flooding attacks. A legitimate client can quickly obtain a privileged URL for the target website by clicking on a referral link provided by a trusted site. The proposed approach imposes no infrastructure requirements and calls for no changes to the software that clients use to access websites. The authors also presented the design of the WRAPS framework. Nearly all sophisticated attacks on websites reuse strategies and methods from earlier attacks. Such attacks can take many forms, and the same techniques also appear in settings unrelated to the web. Attacks on a website's business logic may harm the website itself, but attackers can also use websites as intermediaries to accomplish their goals.

SQLProb [13] extracts the user input from a query and checks whether it complies with the syntactic structure of the query. This is accomplished by applying and enhancing a genetic algorithm. SQLProb is a complete black-box approach that requires no modifications to either the application or the database, which allows it to avoid the complexity of tainting, learning, and code instrumentation. In addition, its input validation procedure requires neither learning nor metadata.

The authors of [14] presented a complete stream-based WS-Security processing architecture. This design makes service processing more efficient and increases robustness against different kinds of DoS attacks. Their engine can handle standard WS-Security application scenarios in a streaming manner.

The author of [15] examined the vast majority of the conventional criteria used to judge web service quality. According to that survey, most of the measures, including performance, reliability, scalability, capacity, robustness, exception handling, accuracy, integrity, accessibility, availability, interoperability, and security, fall below the average level.

Hoque et al. [16] considered the actions an attacker may take as well as the probable results or degrees of harm, and on that basis divided attacks into a number of distinct categories. They offered a taxonomy of attack tools to assist security researchers and to help prevent potential threats. They also delivered a detailed and well-organized examination of currently available tools and frameworks that may aid attackers as well as system defenders, including a description of the benefits and drawbacks of each.

Binbin Qu et al. [17] explained the method behind a model design. Constructing a taint dependency graph for a program requires several steps, one of which is static analysis of the program's source code. They use a finite state automaton to represent the attack model, to propagate approximations of tainted string values, and to verify the robustness of the program's protections for user input. They then implemented a prototype for automated detection based on this taint analysis.

1.4 Intrusion Detection System

An intrusion is any malicious or suspicious activity that jeopardizes the security of a computer or network. Intruders may be internal or external. Internal intruders reside within the targeted network and acquire elevated privileges to deliberately harm the network infrastructure. External intruders covertly extract data from the target network while remaining outside it. Internal attacks are initiated by malicious or compromised nodes, whereas external attacks are initiated by entities outside the system. An intrusion detection system (IDS) is any hardware or software that can identify and alert on potentially malicious activity on a network or computer system. It may also be used to detect suspicious activities or breaches within the system. Typically, when a network or system behaves abnormally, this suggests that something hostile, harmful, or illegal is taking place. Although most IDSs depend mainly on identifying and reporting anomalies, a handful excel at detecting intrusions that conventional firewalls overlook. In protecting the system from harm, an IDS functions similarly to a firewall in that it helps keep unauthorized individuals from gaining access.

Intrusion detection systems fall into three categories based on the source of data, four groups based on the technique of analysis, and a further three groups. The three source-based categories (host-based, network-based, and hybrid) are described below.

A host-based intrusion detection system (HIDS) is software installed on a computer to monitor, evaluate, and gather data on the traffic and suspicious activities of that specific system. It analyses not only traffic but also system calls, file system changes, inter-process communication, and the programs running on the computer (Zarpelão et al., 2017). A HIDS uses data collected from the operating system and application software to detect suspicious activities, and it can detect intrusions only on the host where it is installed. Installing a HIDS eliminates the need for extra software to identify threats on the system. It is designed to detect unauthorized access or attacks that originate from within the protected area. The installation cost is substantial because every device requires its own HIDS, as shown in Figure 1.4.
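One of the host-level signals mentioned above is a change to the file system. The following is a minimal sketch of that single idea, not a description of any particular HIDS product: it records a SHA-256 digest for every file under a directory and later reports which files were added, removed, or modified. The function names `snapshot`, `diff_snapshots`, and `demo`, and the file names used in the demonstration, are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map every file path under `root` to the SHA-256 digest of its contents."""
    digests = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digests[path] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def diff_snapshots(baseline, current):
    """Return (added, removed, modified) path lists between two snapshots."""
    added = sorted(p for p in current if p not in baseline)
    removed = sorted(p for p in baseline if p not in current)
    modified = sorted(p for p in current
                      if p in baseline and current[p] != baseline[p])
    return added, removed, modified


def demo():
    """Create a throwaway directory, tamper with it, and report the changes."""
    with tempfile.TemporaryDirectory() as root:
        config = os.path.join(root, "app.conf")        # hypothetical file name
        with open(config, "w") as handle:
            handle.write("debug = false\n")
        baseline = snapshot(root)

        with open(config, "w") as handle:              # simulated tampering
            handle.write("debug = true\n")
        with open(os.path.join(root, "dropper.bin"), "wb") as handle:
            handle.write(b"\x00\x01")                  # simulated new file

        added, removed, modified = diff_snapshots(baseline, snapshot(root))
        return ([os.path.basename(p) for p in added],
                [os.path.basename(p) for p in removed],
                [os.path.basename(p) for p in modified])


if __name__ == "__main__":
    print(demo())
```

A real deployment would also have to protect the baseline itself from tampering and would combine this signal with the system-call and process data described above.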

A network-based intrusion detection system (NIDS) safeguards network nodes by capturing and inspecting all network packets for malicious activity. Figure 1.5 shows the structure of a NIDS. The sensor is positioned at a vulnerable point in the network, between the server and the rest of the network, and the NIDS monitors both incoming and outgoing traffic. If the system identifies a threat, it must respond decisively to protect the network. One possible response is to block network access from the offending IP address; another is to notify the responsible party through alerts. It can be difficult for an attacker to determine whether the NIDS has noticed an intrusion attempt. Only a limited number of intrusion detection systems are capable of monitoring large networks. To mitigate potential security risks, it is important to deploy scanners, sniffers, and network intrusion detection tools. These measures guard against malicious activities such as IP spoofing, DoS attacks, DNS name corruption, man-in-the-middle attacks, and ARP cache poisoning, which are possible because of inherent weaknesses in the TCP/IP protocols.
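The two responses named above, blocking an IP address and raising an alert, can be illustrated with a deliberately simple rate-based detector. This is a sketch under stated assumptions rather than how any specific NIDS works: packets are reduced to `(timestamp, source IP)` records, and the class name `RateDetector` together with the `limit` and `window` parameters is hypothetical.

```python
from collections import defaultdict, deque


class RateDetector:
    """Flag a source IP that sends more than `limit` packets within `window` seconds."""

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self.recent = defaultdict(deque)   # source IP -> timestamps inside the window
        self.blocked = set()

    def observe(self, timestamp, src_ip):
        """Process one packet record and return 'drop', 'alert', or 'pass'."""
        if src_ip in self.blocked:
            return "drop"
        times = self.recent[src_ip]
        times.append(timestamp)
        # Discard timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) > self.limit:
            self.blocked.add(src_ip)       # response 1: block the address
            return "alert"                 # response 2: notify the operator
        return "pass"


if __name__ == "__main__":
    detector = RateDetector(limit=5, window=1.0)
    for i in range(7):
        print(i, detector.observe(i / 10, "10.0.0.9"))
```

Threshold rules of this kind only catch flooding behaviour; the machine learning techniques in Section 1.5 are aimed at patterns that a fixed threshold would miss.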

Figure 1.4 Host-based IDS.

Figure 1.5 Network-based intrusion detection system.

Hybrid intrusion detection systems integrate the functionality of several intrusion detection systems to identify and expose intrusions. A hybrid system combines data from both the network and the host agent or system to create a full overview of the networked system. The hybrid technique is the most effective strategy for intrusion detection. Prelude is an example of a hybrid intrusion detection system.

1.5 Machine Learning in Intrusion Detection System

Soft computing makes it possible to build intelligent machines that can solve challenging real-world problems that cannot be adequately modelled using traditional mathematical methods. It tolerates approximate information, ambiguity, imprecision, and a merely partial view of the environment [18], which enables it to emulate the way people form opinions and make decisions. This section briefly discusses the soft computing techniques that may be used to detect intrusions.

The genetic algorithm (GA) is a powerful and adaptable search technique that has been in use since it was conceived by Holland. It simulates the natural process of evolution. The GA can be seen as a global search procedure that depends on randomness. The algorithm applies the principle of “survival of the fittest” to a population of candidate solutions in order to produce ever more accurate approximations of a solution to the problem.

In each generation, the fittest individuals are selected to breed the next generation, which ultimately yields new candidate solutions. As a result, the offspring may be better suited to the problem than their parents [19]. The fitness function measures how well individuals perform in the problem domain [20].
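The loop described above can be sketched in a few lines. The example below is illustrative only and is not taken from the cited works: it evolves bit strings toward the all-ones string (the toy "OneMax" problem), and the choices of tournament selection, single-point crossover, bit-flip mutation, and elitism, along with every parameter value, are assumptions.

```python
import random


def fitness(bits):
    """Toy fitness function: the number of 1-bits in the string."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(mother, father, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(mother))
    return mother[:point] + father[point:]


def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(length=20, pop_size=30, generations=40, seed=0):
    """Return the best individual and the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        # Elitism: keep the current best unchanged, then breed the rest.
        children = [best]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga()
    print(history[0], "->", history[-1])
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next; for intrusion detection, the bit string would instead encode something like a feature subset or a rule, and the fitness function would score detection quality.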

Particle swarm optimization (PSO) was first developed in 1995 by [21], who drew their inspiration from the way birds flock and fish school. To search for a solution, a “population” of particles moves through the search space at specified velocities. The velocity of each particle is adjusted stochastically at every step, taking into account the best position the particle itself has visited and the best position found in its neighbourhood. A random number generator supplies the stochastic component.
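The velocity update described above can be sketched as follows. This is a minimal illustration that assumes the widely used inertia-weight form of the update and treats the whole swarm as each particle's neighbourhood; the objective function `sphere` and the parameter values `w`, `c1`, and `c2` are assumptions chosen for the example, not values from the source.

```python
import random


def sphere(x):
    """Toy objective to minimise: the sum of squares, with its minimum at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, particles=15, iterations=60,
        w=0.7, c1=1.5, c2=1.5, seed=1):
    """Return (best position, best value at the start, best value at the end)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]               # best position seen by each particle
    gbest = min(pbest, key=objective)[:]      # best position seen by the swarm
    start_value = objective(gbest)
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # stochastic weights
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
                if objective(pbest[i]) < objective(gbest):
                    gbest = pbest[i][:]
    return gbest, start_value, objective(gbest)


if __name__ == "__main__":
    position, before, after = pso(sphere)
    print(before, "->", after)
```

The swarm's best value can only improve or stay the same, since `gbest` is replaced only when a strictly better position is found.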

Fuzzy logic is a form of logic that supports approximate reasoning. The optimization and classification paradigms used in machine learning are both underpinned by evolutionary computing, which is based on the evolutionary processes of genetics and natural selection. Genetic algorithms are the variant most often used [22] in real-world business applications.

In contrast to the conventional naive Bayes classifier, the hidden naive Bayes (HNB) model adds a further layer in which each attribute is given a hidden parent. The structure of an HNB model can be derived from that of naive Bayes. The hidden parent of an attribute is created to combine the influences of all the other attributes on it. The hidden parents can be summarized as a weighted mean of one-dependence estimators [23, 24].
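The role of the hidden parent can be made concrete with the formulation that is commonly given for HNB. The notation below is an assumption introduced for illustration: $E = (a_1, \dots, a_n)$ is an instance, $c$ ranges over the classes $C$, $a_{hp_i}$ denotes the hidden parent of attribute $a_i$, and $W_{ij}$ are non-negative weights.

```latex
c(E) = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(a_i \mid a_{hp_i}, c),
\qquad
P(a_i \mid a_{hp_i}, c) = \sum_{j=1,\; j \neq i}^{n} W_{ij} \, P(a_i \mid a_j, c),
\qquad
\sum_{j \neq i} W_{ij} = 1 .
```

Each term $P(a_i \mid a_j, c)$ is a one-dependence estimator, so the hidden parent amounts to their weighted mean; how the weights $W_{ij}$ are chosen is left open here.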

The support vector machine (SVM) [25, 26] is an approach to classification grounded in statistical learning theory (SLT) [27–29]. It belongs to the family of hyperplane classifiers. In an SVM, a good hyperplane is one that separates the classes while keeping interclass overlap to a minimum.
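A small, dependency-free sketch may help to show what finding such a hyperplane involves. It assumes a linear soft-margin formulation trained by full-batch sub-gradient descent on the hinge loss, which is one common way to fit an SVM and is not claimed to be the method of the cited works; the function names, the toy data, and the values of `lam`, `lr`, and `epochs` are all illustrative.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by sub-gradient descent.

    `labels` must contain +1 or -1. Returns the weight vector w and bias b
    of the separating hyperplane w.x + b = 0.
    """
    dim = len(points[0])
    n = len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]      # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                   # inside the margin or misclassified
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x by the side of the hyperplane on which it falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
POINTS = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    w, b = train_linear_svm(POINTS, LABELS)
    print([predict(w, b, p) for p in POINTS])
```

Only the points whose margin is below 1 contribute to the update, which mirrors the fact that the final hyperplane is determined by the examples closest to the class boundary.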

Deep belief networks (DBNs) are generative graphical models used in machine learning. They are built from layers of latent variables, also referred to as hidden units. Connections exist between the layers, but not between the units within a layer.

The model published in [24] illustrates one method for determining the attributes used by an intrusion detection system.

1.6 Conclusion

Security is of utmost importance in the context of the Internet of Things (IoT) and other forms of pervasive connectivity. Attacks are increasingly likely to target companies and organizations that use the IoT. Traditional cybersecurity systems face multiple obstacles when attempting to detect zero-day threats. An intruder exploits the privileges offered by the IoT architecture to acquire valuable data. Few security risks are widely recognized, and even fewer of those involve slow, unnoticed attacks. An effective strategy for tackling these unexpected challenges is to build intrusion detection systems using machine learning techniques. Cyberattacks on the IoT architecture may result in the loss of data or information, as well as sluggish IoT devices. Intrusion detection systems have been used for the last 20 years to secure data and networks. Because the IoT uses its own standards and protocol stacks, traditional intrusion detection methods are not successful at identifying security breaches in IoT networks. Because the IoT generates data in enormous volumes, analysing it regularly is difficult. An intrusion detection system (IDS) protects a system or network from unauthorized access by actively monitoring for and detecting potentially harmful or suspicious activity. Machine learning technologies offer reliable and effective methods for reducing these risks.

References

1. Raghuvanshi, A., Singh, U.K. et al., Intrusion Detection Using Machine Learning for Risk Mitigation in IoT-Enabled Smart Irrigation in Smart Farming. J. Food Qual., 2022, 1, 1–8, 2022.

2. Abhishek, R., Singh, U.K., Phasinam, K., Kassanuk, T., Internet of Things-Security Vulnerabilities and Countermeasures. ECS Trans., 107, 1, 15043–15053, 2022.

3. Raghuvanshi, A., Singh, U.K., Joshi, C., A Review of Various Security and Privacy Innovations for IoT Applications in Healthcare. Adv. Healthcare Syst., 1, 43–58, 2022, doi: 10.1002/9781119769293.ch4.

4. Zhang, Q. and Wang, X., SQL injections through back-end of RFID system, in: 2009 International Symposium on Computer Network and Multimedia Technology. CNMT 2009, pp. 1–4, IEEE, 2009.

5. Li, Z. et al., VulPecker: an automated vulnerability detection system based on code similarity analysis. ACM, Proc. of the 32nd Annual Conference on Computer Security Applications, pp. 201–213, 2016.

6. Guojun, Z. et al., Design and application of intelligent dynamic crawler for web data mining, in: Automation (YAC), 2017 32nd Youth Academic Annual Conference of Chinese Association, pp. 1098–1105, IEEE, 2017.

7. Medeiros, I., Neves, N., Correia, M., Detecting and removing web application vulnerabilities with static analysis and data mining. IEEE Trans. Reliab., 1, 54–69, IEEE, 2016.

8. Masood, A. and Java, J., Static Analysis for Web Service Security – Tools & Techniques for a Secure Development Life Cycle. International Symposium on Technologies for Homeland Security, pp. 1–6, 2015.

9. Medeiros, I. and Neves, N., Detecting and Removing Web Application Vulnerabilities with Static Analysis and Data Mining. IEEE Trans. Reliab., 1, 1–16, 2015.

10. Salas, M.I., de Geus, P.L., Martins, E., Security Testing Methodology for Evaluation of Web Services Robustness - Case: XMLInjection. IEEE World Congress on Services, pp. 303–310, 2015.

11. Madan, S., Security Standards Perspective to Fortify Web Database Applications from Code Injection Attacks. International Conference on Intelligent Systems, Modelling and Simulation, pp. 226–233, 2010.

12. Teodoro, N. and Serrao, C., Web application security: Improving critical web-based applications quality through in-depth security analysis, in: International Conference on Information Society (i-Society), pp. 457–462, 2011.

13. Wang, X. and Reiter, M.K., Using Web-Referral Architectures to Mitigate Denial-of-Service Threats. J. IEEE Trans. Dependable Secure Comput., 7, 2, 203–216, 2010.

14. Liu, A., Yuan, Y., Wijesekera, D., Stavrou, A., SQLProb: a proxy-based architecture towards preventing SQL injection attacks, in: Proceedings ACM Symposium on Applied Computing (SAC’09), pp. 2054–2061, 2009.

15. Gruschka, N., Jensen, M., Lo Iacono, L., Luttenberger, Server-side Streaming Processing of WS-Security. IEEE Trans. Serv. Comput., 4, 4, 272–285, 2011.

16. Ladan, M.I., Web Services Metrics: A Survey and A Classification. J. Commun. Comput., 9, 7, 824–829, 2012.

17. Hoque, N., Bhuyan, M.H., Baishya, R.C., Bhattacharyya, D.K., Kalita, Network Attacks: Taxonomy, tools and systems. J. Comput. Netw. Appl., 1, 13–26, 4 October 2013, doi: doi.org/10.1016/j.jnca.2013.08.001.

18. Kulshestha, G., Agarwal, A., Mittal, A., Sahoo, A., Hybrid Cuckoo Search Algorithm for Simultaneous Feature and Classifier Selection. IEEE International Conference on Cognitive Computing and Information Processing (CCIP), pp. 1–6, 2015.

19. Visumathi, J. and Shunmuganathan, K.L., A computational intelligence for evaluation of intrusion detection system. Indian J. Sci. Technol., 4, 1, 28–34, Jan 2011.

20. Wang, B., Yao, X., Jiang, Y., Sun, C., Shabaz, M., Design of a Real-Time Monitoring System for Smoke and Dust in Thermal Power Plants Based on Improved Genetic Algorithm. J. Healthc. Eng., 2021, D. Singh (Ed.), pp. 1–10, Hindawi Limited, UAE, 2021, https://doi.org/10.1155/2021/7212567.

21. Mohanasundaram, S., Ramirez-Asis, E., Quispe-Talla, A., Bhatt, M.W., Shabaz, M., Experimental replacement of hops by mango in beer: production and comparison of total phenolics, flavonoids, minerals, carbohydrates, proteins and toxic substances, Int. J. Syst. Assur. Eng. Manage., Springer Science and Business Media LLC, UAE, 2021, https://doi.org/10.1007/s13198-021-01308-3.

22. Almahirah, M.S., S, V.N., Jahan, M., Sharma, S., Kumar, S., Role of Market Microstructure in Maintaining Economic Development. Empirical Econ. Lett., 20, 2, 01–14, 2021.

23. Chaudhary, A., Tiwari, V.N., Kumar, A., Analysis of Fuzzy Logic Based Intrusion Detection Systems in Mobile Ad Hoc Networks. Int. J. Inf. Technol., 6, 1, 183–198, June 2014.

24. Rathore, N. and Rajavat, A., Smart Farming Based on IOT-Edge Computing: Applying Machine Learning Models For Disease And Irrigation Water Requirement Prediction In Potato Crop Using Containerized Microservices, in: Precision Agriculture for Sustainability, pp. 399–424, Apple Academic Press, UAE, 2024.

25. Patsariya, M. and Rajavat, A., A Progressive Design of MANET Security Protocol for Reliable and Secure Communication. Int. J. Intell. Syst. Appl. Eng., 12, 9s, 190–204, 2024.

26. Rathi, M. and Rajavat, A., Investigations and Design of Privacy-Preserving Data Mining Technique for Secure Data Publishing. Int. J. Intell. Syst. Appl. Eng., 11, 9s, 351–367, 2023.

27. Dubey, P. and Rajavat, A., Effective K-means clustering algorithm for efficient data mining, in: 2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN), pp. 1–6, IEEE, 2023, May.

28. Nahar, S., Pithawa, D., Bhardwaj, V., Rawat, R., Rawat, A., Pachlasiya, K., Quantum Technology for Military Applications. Quantum Comput. Cybersecur., 1, 313–334, 2023.

29. Pithawa, D., Nahar, S., Bhardwaj, V., Rawat, R., Dronawat, R., Rawat, A., Quantum Computing Technological Design Along with Its Dark Side. Quantum Comput. Cybersecur., 1, 295–312, 2023.

Note

*Corresponding author: [email protected]

2 Frequent Pattern Mining Using Artificial Intelligence and Machine Learning

R. Deepika1*, Sreenivasulu Gogula2, K. Kanagalakshmi3, Anshu Mehta4, S. J. Vivekanandan5 and D. Vetrithangam6

1Department of AI&DS, B V Raju Institute of Technology, Narsapur, Telangana, India

2Department of CSE (Data Science), Vardhaman College of Engineering, Shamshabad, Hyderabad, India

3Department of Computer Applications, SRM Institute of Science and Technology (Deemed to be University), Trichy, India

4Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India

5Department of Computer Science and Engineering, Dhanalakshmi College of Engineering, Dr V P R Nagar, Manimangalam, Tambaram, Chennai, India

6Department of Computer Science & Engineering, University Institute of Engineering, Chandigarh University, Mohali, Punjab, India

Abstract

Frequent pattern mining is a very active topic in the field of data mining, and numerous researchers have studied it since its beginning. Data volumes in every domain grow exponentially as data accumulates. The ability to assess and extract time-sensitive information from large datasets efficiently and flexibly is essential for making informed decisions and uncovering new knowledge. Data mining is the systematic analysis of vast quantities of data, using sophisticated analytics, to uncover previously unidentified links, correlations, patterns, and trends. Since the inception of the World Wide Web, there has been a rapid and significant increase in the quantity of data that is stored and can be accessed electronically. Consequently, methods for extracting valuable information from this extensive collection of data have become crucial in several domains, including business and academia. Frequent itemset mining is a very popular technique for gaining significant insights from datasets.

Keywords: Frequent pattern mining, decision tree, KNN, accuracy, machine learning

2.1 Introduction

Data mining is the systematic exploration of extensive databases to discover noteworthy and previously unidentified patterns [1]. It is one step of the Knowledge Discovery in Databases (KDD) process described by Fayyad et al. KDD is an ongoing process that comprises data cleaning, integration, selection, transformation, mining, pattern assessment, and knowledge representation. Data mining may be applied to several types of data, and the approaches and procedures may vary with the type of data. The patterns extracted also vary in nature, depending on the specific data mining task. Data mining tasks may be broadly classified into two categories: descriptive and predictive. Predictive data mining uses existing data to generate predictions, whereas descriptive data mining aims to describe the overall characteristics of the data.

Early methods for identifying patterns in data include Bayes’ theorem (1700s) and regression analysis (1800s). Advances in computer technology have increased the ability to collect, store, and process data. As data sets have grown in size and complexity, hands-on data analysis has increasingly been augmented by automated data processing. This progress was aided by several breakthroughs in computer science, such as neural networks, cluster analysis, and genetic algorithms (1950s), decision trees (1960s), and support vector machines (1990s) [2].

In a decision tree, each internal node represents a test on an attribute's value, and each branch represents an outcome of that test. The tree's leaves represent classes or class distributions. Decision trees are easy to convert into classification rules. They can handle large amounts of data, and they represent knowledge in a tree structure that is intuitive and simple to learn. Building and using a decision tree is a straightforward process that involves only a few easy steps. Decision tree induction algorithms have been used in a wide range of fields, including medicine, manufacturing, finance, astronomy, and molecular biology [3].

Both AI and data mining rely heavily on tree-based learning approaches. These methods have been in use for a long time, and their appeal lies in their simplicity. Decision trees are usually built top-down: at each step, the algorithm chooses the univariate split that maximizes some local criterion (for example, gain ratio), and it continues until the leaf partitions of the tree are sufficiently pure. The grown tree is then pruned. Pessimistic Error Pruning uses statistically motivated heuristics to estimate the utility of a subtree, while Reduced Error Pruning uses a separate pruning set for this purpose.
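The top-down procedure described above can be sketched compactly. The example below is an ID3-style illustration that uses plain information gain as the local criterion rather than gain ratio, omits pruning, and runs on a made-up six-row dataset; the names `ROWS`, `build_tree`, `classify`, and `to_rules` are assumptions introduced for the example. The last function also shows how a tree converts into IF-THEN classification rules, one per root-to-leaf path.

```python
import math
from collections import Counter

# Hypothetical toy dataset: should we play outside?
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]


def entropy(rows, target):
    """Shannon entropy of the class labels in `rows`."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Reduction in entropy obtained by splitting `rows` on `attribute`."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def build_tree(rows, attributes, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {best: branches}


def classify(tree, row):
    """Follow the branches that match `row` until a leaf is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


def to_rules(tree, conditions=()):
    """Flatten the tree into (conditions, label) rules, one per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(list(conditions), tree)]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules.extend(to_rules(subtree, conditions + ((attribute, value),)))
    return rules


if __name__ == "__main__":
    tree = build_tree(ROWS, ["outlook", "windy"], "play")
    for conditions, label in to_rules(tree):
        print("IF", conditions, "THEN play =", label)
```

On this data the first split falls on `outlook`, because it leaves two of the three partitions pure, and only the `rainy` partition needs a second split on `windy`.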

NBTree builds a decision tree whose leaf nodes are naive Bayes classifiers. Doing so requires evaluating naive Bayes classifiers in all first-level child nodes by cross-validation, which makes it a very costly strategy. At each node, the learner constructs additional attributes as linear, quadratic, or logistic functions of the existing attributes, and these are passed down the tree in the same way. Although probability distributions are propagated from the root to the leaves, the leaf nodes remain the primary classifiers [4].

This study introduces a recursive Bayesian classifier. A variety of methods have previously been proposed to improve the accuracy of decision tree induction, and many of them have been successful. The major issue was that these approaches were time-consuming and difficult to learn. The most significant aspect of the recursive approach is that it recursively divides the data into regions in which conditional independence is expected to hold. A decision tree maps observations about an item to conclusions about the item's target value [5].

The most prominent use of decision trees in operations research is to help identify the strategy most likely to reach a goal. Decision trees can also be used to calculate conditional probabilities. A decision tree (also known as a tree diagram) is a decision support tool that uses a tree-like graph or model to describe alternatives and their possible consequences, such as chance event outcomes, resource costs, and utility. Decision tree induction has been used effectively in expert systems for knowledge acquisition. Decision trees can be induced from a variety of data sources [6].
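The decision-support use of a tree can be illustrated by "rolling back" a small tree: chance nodes are averaged by probability and decision nodes take the best available option. The sketch below is illustrative only; the node layout (dictionaries with a `type` key), the option labels, and all payoffs and probabilities are made-up assumptions.

```python
def expected_value(node):
    """Roll back a tree: average over chance nodes, maximise over decision nodes."""
    kind = node["type"]
    if kind == "outcome":
        return node["payoff"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) for _label, child in node["options"])
    raise ValueError(f"unknown node type: {kind}")


def best_option(node):
    """Return the label of the option with the highest expected value."""
    return max(node["options"], key=lambda option: expected_value(option[1]))[0]


# Hypothetical example: launch a product now, or hold for a fixed return.
TREE = {
    "type": "decision",
    "options": [
        ("launch", {
            "type": "chance",
            "branches": [
                (0.5, {"type": "outcome", "payoff": 100}),   # launch succeeds
                (0.5, {"type": "outcome", "payoff": -40}),   # launch fails
            ],
        }),
        ("hold", {"type": "outcome", "payoff": 20}),
    ],
}

if __name__ == "__main__":
    print(best_option(TREE), expected_value(TREE))
```

Here launching has an expected value of 0.5 × 100 + 0.5 × (−40) = 30, which exceeds the fixed payoff of 20 for holding, so the rollback selects the launch option.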

2.2 Data Mining Functions

Data mining is a collection of techniques for the efficient automated discovery of previously unknown, valid, novel, useful, and understandable patterns in large databases. The patterns must be actionable so that they can be used in an enterprise's decision-making process [7]. Data mining techniques can be grouped as follows, as shown in Figure 2.1:

Classification: The given data instance must be assigned to one of the target classes that are already known or defined. One example is deciding whether a customer in a credit card transaction database is a trustworthy client or a defaulter, based on the customer's demographic and previous purchase characteristics [8].

Estimation: Like classification, the purpose of an estimation model is to determine a value for an unknown output attribute. Unlike classification, however, the output attribute for an estimation problem is numeric rather than categorical.

Prediction: It is not easy to distinguish prediction from classification or estimation.

Figure 2.1 Data mining methods.

The primary distinction is that a predictive model extrapolates outcomes into the future rather than describing current behaviour. The output attribute can be categorical or numeric. One example is a model that forecasts the value of the Dow Jones Industrial Average at the end of the following week.

Association rule mining: Here, interesting hidden rules, called association rules, are mined from a large transactional database. For example, the rule {milk, butter} → {biscuit} states that whenever milk and butter are purchased together, biscuits are also purchased, so these items can be placed together to increase the overall sales of each of them [9].
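How strongly a market-basket rule of this kind holds is usually quantified by its support and confidence. The following minimal sketch computes both for a hand-made list of transactions; the transactions, the item names, and the function names `support` and `confidence` are illustrative assumptions, not data from the source.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


# Hypothetical shopping baskets.
TRANSACTIONS = [
    {"milk", "butter", "biscuit"},
    {"milk", "butter", "biscuit"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"bread", "biscuit"},
]

if __name__ == "__main__":
    print("support:", support(TRANSACTIONS, {"milk", "butter", "biscuit"}))
    print("confidence:", confidence(TRANSACTIONS, {"milk", "butter"}, {"biscuit"}))
```

In this toy data the three items appear together in two of five baskets, and two of the three baskets that contain milk and butter also contain biscuits, so the rule has a confidence of two thirds. Frequent itemset mining algorithms search for all itemsets whose support exceeds a chosen threshold before rules are derived from them.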