DATA MINING AND MACHINE LEARNING APPLICATIONS
The book elaborates in detail on the current needs of data mining and machine learning and promotes mutual understanding among researchers in different disciplines, thus facilitating research development and collaboration.
Data, the latest currency of today’s world, is the new gold. In this new form of gold, the most beautiful jewels are data analytics and machine learning. Data mining and machine learning are considered interdisciplinary fields. Data mining is a subset of data analytics, and machine learning involves the use of algorithms that automatically improve through experience based on data.
Massive datasets can be classified and clustered to obtain accurate results. The most common techniques include classification and clustering methods. Accuracy and error rates are calculated for regression, classification, and clustering to evaluate the results of algorithms such as support vector machines and neural networks with forward and backward propagation. Applications include fraud detection, image processing, medical diagnosis, weather prediction, e-commerce, and so forth.
The book features:
Audience
Industry and academic researchers, scientists, and engineers in information technology, data science and machine and deep learning, as well as artificial intelligence more broadly.
Page count: 653
Year of publication: 2022
Cover
Title Page
Copyright
Preface
1 Introduction to Data Mining
1.1 Introduction
1.2 Knowledge Discovery in Database (KDD)
1.3 Issues in Data Mining
1.4 Data Mining Algorithms
1.5 Data Warehouse
1.6 Data Mining Techniques
1.7 Data Mining Tools
References
2 Classification and Mining Behavior of Data
2.1 Introduction
2.2 Main Characteristics of Mining Behavioral Data
2.3 Research Method
2.4 Results
2.5 Discussion
2.6 Conclusion
References
3 A Comparative Overview of Hybrid Recommender Systems: Review, Challenges, and Prospects
3.1 Introduction
3.2 Related Work on Different Recommender System
References
4 Stream Mining: Introduction, Tools & Techniques and Applications
4.1 Introduction
4.2 Data Reduction: Sampling and Sketching
4.3 Concept Drift
4.4 Stream Mining Operations
4.5 Tools & Techniques
4.6 Applications
4.7 Conclusion
References
5 Data Mining Tools and Techniques: Clustering Analysis
5.1 Introduction
5.2 Data Mining Task
5.3 Data Mining Algorithms and Methodologies
5.4 Clustering the Nearest Neighbor
5.5 Data Mining Applications
5.6 Materials and Strategies for Document Clustering
5.7 Discussion and Results
References
6 Data Mining Implementation Process
6.1 Introduction
6.2 Data Mining Historical Trends
6.3 Processes of Data Analysis
References
7 Predictive Analytics in IT Service Management (ITSM)
7.1 Introduction
7.2 Analytics: An Overview
7.3 Significance of Predictive Analytics in ITSM
7.4 Ticket Analytics: A Case Study
7.5 Conclusion
References
8 Modified Cross-Sell Model for Telecom Service Providers Using Data Mining Techniques
8.1 Introduction
8.2 Literature Review
8.3 Methodology and Implementation
8.4 Data Partitioning
8.5 Conclusions
References
9 Inductive Learning Including Decision Tree and Rule Induction Learning
9.1 Introduction
9.2 The Inductive Learning Algorithm (ILA)
9.3 Proposed Algorithms
9.4 Divide & Conquer Algorithm
9.5 Decision Tree Algorithms
9.6 Conclusion and Future Work
References
10 Data Mining for Cyber-Physical Systems
10.1 Introduction
10.2 Feature Recovering Methodologies
10.3 CPS vs. IT Systems
10.4 Collections, Sources, and Generations of Big Data for CPS
10.5 Spatial Prediction
10.6 Clustering of Big Data
10.7 NoSQL
10.8 Cyber Security and Privacy Big Data
10.9 Smart Grids
10.10 Military Applications
10.11 City Management
10.12 Clinical Applications
10.13 Calamity Events
10.14 Data Streams Clustering by Sensors
10.15 The Flocking Model
10.16 Calculation Depiction
10.17 Initialization
10.18 Representative Maintenance and Clustering
10.19 Results
10.20 Conclusion
References
11 Developing Decision Making and Risk Mitigation: Using CRISP-Data Mining
11.1 Introduction
11.2 Background
11.3 Methodology of CRISP-DM
11.4 Stage One—Determine Business Objectives
11.5 Stage Two—Data Sympathetic
11.6 Stage Three—Data Preparation
11.7 Stage Four—Modeling
11.8 Stage Five—Evaluation
11.9 Stage Six—Deployment
11.10 Data on ERP Systems
11.11 Usage of CRISP-DM Methodology
11.12 Modeling
11.13 Assessment
11.14 Distribution
11.15 Results and Discussion
11.16 Conclusion
References
12 Human–Machine Interaction and Visual Data Mining
12.1 Introduction
12.2 Related Researches
12.3 Visual Genes
12.4 Visual Hypotheses
12.5 Visual Strength and Conditioning
12.6 Visual Optimization
12.7 The Vis 09 Model
12.8 Graphic Monitoring and Contact With Human–Computer
12.9 Mining HCI Information Using Inductive Deduction Viewpoint
12.10 Visual Data Mining Methodology
12.11 Machine Learning Algorithms for Hand Gesture Recognition
12.12 Learning
12.13 Detection
12.14 Recognition
12.15 Proposed Methodology for Hand Gesture Recognition
12.16 Result
12.17 Conclusion
References
13 MSDTrA: A Boosting Based-Transfer Learning Approach for Class Imbalanced Skin Lesion Dataset for Melanoma Detection
13.1 Introduction
13.2 Literature Survey
13.3 Methods and Material
13.4 Experimental Results
13.5 Libraries Used
13.6 Comparing Algorithms Based on Decision Boundaries
13.7 Evaluating Results
13.8 Conclusion
References
14 New Algorithms and Technologies for Data Mining
14.1 Introduction
14.2 Machine Learning Algorithms
14.3 Supervised Learning
14.4 Unsupervised Learning
14.5 Semi-Supervised Learning
14.6 Regression Algorithms
14.7 Case-Based Algorithms
14.8 Regularization Algorithms
14.9 Decision Tree Algorithms
14.10 Bayesian Algorithms
14.11 Clustering Algorithms
14.12 Association Rule Learning Algorithms
14.13 Artificial Neural Network Algorithms
14.14 Deep Learning Algorithms
14.15 Dimensionality Reduction Algorithms
14.16 Ensemble Algorithms
14.17 Other Machine Learning Algorithms
14.18 Data Mining Assignments
14.19 Data Mining Models
14.20 Non-Parametric & Parametric Models
14.21 Flexible vs. Restrictive Methods
14.22 Unsupervised vs. Supervised Learning
14.23 Data Mining Methods
14.24 Proposed Algorithm
14.25 The Regret of Learning Phase
14.26 Conclusion
References
15 Classification of EEG Signals for Detection of Epileptic Seizure Using Restricted Boltzmann Machine Classifier
15.1 Introduction
15.2 Related Work
15.3 Material and Methods
15.4 Experimental Framework
15.5 Experimental Results and Discussion
15.6 Discussion
15.7 Conclusion
References
16 An Enhanced Security of Women and Children Using Machine Learning and Data Mining Techniques
16.1 Introduction
16.2 Related Work
16.3 Issue and Solution
16.4 Selection of Data
16.5 Pre-Preparation Data
16.6 Application Development
16.7 Use Case For The Application
16.8 Conclusion
References
17 Conclusion and Future Direction in Data Mining and Machine Learning
17.1 Introduction
17.2 Machine Learning
17.3 Conclusion
References
Index
End User License Agreement
Chapter 1
Figure 1.1 Knowledge discovery in Database—KDD.
Figure 1.2 Time series database.
Figure 1.3 Data warehouse.
Figure 1.4 Decision Tree.
Figure 1.5 Installation of KNIME.
Figure 1.6 Installation of KNIME (2).
Figure 1.7 Setting path for installing KNIME.
Figure 1.8 Starting installation of KNIME.
Figure 1.9 Selecting directory as a workspace.
Figure 1.10 Starting KNIME.
Figure 1.11 Completing setup wizard.
Figure 1.12 Installing Workspace in KNIME.
Figure 1.13 Installing KNIME (2).
Figure 1.14 Specifying memory for KNIME.
Figure 1.15 Finalizing the installation of KNIME.
Figure 1.16 Initial screen of KNIME.
Chapter 2
Figure 2.1 Process of mining data stream.
Figure 2.2 Sample of graph data set.
Figure 2.3 Multi-source & multidimensional information.
Figure 2.4 Shows the data mining process of multimedia data.
Figure 2.5 Mining of multimedia data.
Figure 2.6 Shows social media mining.
Figure 2.7 Shows total facebook users per year.
Figure 2.8 Shows the spatiotemporal data mining process.
Figure 2.9 Shows the schematic outline of the information mining-based strategy.
Figure 2.10 Calculated regression output.
Figure 2.11 Highlight significance yield.
Figure 2.12 Cause patterns of ventilation system operations.
Chapter 3
Figure 3.1 Evolution process of RS.
Figure 3.2 Recommender systems.
Figure 3.3 Demographic recommender systems.
Figure 3.4 Utility recommender system.
Figure 3.5 Knowledge-Based RS.
Figure 3.6 Hybrid recommender system.
Figure 3.7 Architecture of this paper.
Chapter 4
Figure 4.1 Overview of data stream processing system.
Figure 4.2 Configuration settings for stream generators in MOA.
Figure 4.3 Learning models for various classifiers in MOA.
Figure 4.4 Performance Parameters for Evaluating Classifiers in MOA.
Figure 4.5 Clustering window of MOA for data stream clustering.
Figure 4.6 Clustering algorithms available in MOA.
Figure 4.7 Performance parameters for evaluating clustering in MOA.
Figure 4.8 Clustering visualization in MOA.
Figure 4.9 Concept drift detection techniques in MOA.
Figure 4.10 Active learning algorithms in MOA.
Figure 4.11 Outlier detection algorithms available in MOA.
Figure 4.12 Outlier visualization in MOA.
Figure 4.13 Code snippet for demonstrating stream package from R for stream clus...
Figure 4.14 Output of stream package from R for stream clustering.
Figure 4.15 Code snippet for clustering animation from stream package in R.
Figure 4.16 Clustering animation from stream package in R.
Chapter 5
Figure 5.1 Shows different data clustering stages.
Figure 5.2 Shows clustering techniques classifications.
Figure 5.3 Shows centroid linkage clustering.
Figure 5.4 Show fuzzy clustering.
Figure 5.5 Shows silhouette’s graphical representation of clusters. (a) Represen...
Figure 5.6 Shows Output vectors of Algorithm.
Figure 5.7 Shows realistic of the three Silhouettes with a various number of gro...
Figure 5.8 Shows Silhouette estimation using K-Means Algorithm archives by the f...
Chapter 6
Figure 6.1 Data mining implementation process.
Figure 6.2 Shows flowchart of the research.
Figure 6.3 Exactness classifier’s comparison.
Figure 6.4 Datastream model.
Figure 6.5 Model execution graph.
Chapter 7
Figure 7.1 Process flow of incident and ticket.
Figure 7.2 Process flow of incident and ticket.
Figure 7.3 Service request effort estimation and incident resolution workflow.
Figure 7.4 Ticket count vs. day of week based on priority of the tickets.
Figure 7.5 Ticket count vs time of the day in which it is logged based on priori...
Figure 7.6 Industry domain by ticket volume.
Figure 7.7 Sample ticket format.
Figure 7.8 Raw data collected from organization.
Figure 7.9 Research methodology to develop effort prediction model.
Figure 7.10 Effort values predicted vs. observed using training and test dataset...
Figure 7.11 Effort values predicted vs. observed using training and test dataset...
Chapter 8
Figure 8.1 Process life cycle.
Figure 8.2 Confusion matrices.
Chapter 9
Figure 9.1 Decision tree example.
Figure 9.2 Shows the ID3 algorithm.
Figure 9.3 Shows RULES flow chart.
Figure 9.4 RULES-3 calculation.
Figure 9.5 Procedure of RULE-3 plus rule forming.
Figure 9.6 RULE-4 incremental induction procedure.
Figure 9.7 Pseudocode portrayal of RULES-6.
Figure 9.8 RULES3-EXT calculation.
Figure 9.9 A disentangled depiction of RULES-7.
Figure 9.10 REX-1 algorithm.
Figure 9.11 Construction procedure of fuzzy decision tree.
Figure 9.12 Offered multidimensional databases architecture from fuzzy data mini...
Chapter 10
Figure 10.1 Shows different sources of big data.
Figure 10.2 Show automated CPS cycle.
Figure 10.3 Show clustering of big data.
Figure 10.4 Shows CPS smart grid.
Figure 10.5 Shows CPS military application.
Figure 10.6 Shows CPS smart city application.
Figure 10.7 Shows CPS environmental application.
Figure 10.8 FlockStream Algorithm’s pseudo-code.
Figure 10.9 (a) Synthetic information sets. (b) Clustering was performed by Floc...
Chapter 11
Figure 11.1 Shows CRISP-DM methodology.
Figure 11.2 Shows all six stages of the CRISP-DM model.
Figure 11.3 Shows business understanding phase of CRISP-DM model.
Figure 11.4 Shows data understanding phase of CRISP-DM model.
Figure 11.5 Shows data preparation phase of CRISP-DM model.
Figure 11.6 Shows modeling phase of CRISP-DM model.
Figure 11.7 Shows evaluation phase of the CRISP-DM model.
Figure 11.8 Shows deployment phase of CRISP-DM model.
Figure 11.9 Shows coordination of various modules in ERP frameworks.
Figure 11.10 Cash flow data mining CRISP-DM methodology.
Figure 11.11 InfoCube loading in SAP BIW’s ETL mapping.
Figure 11.12 DM modeling & visualization of SAP ADP.
Figure 11.13 Shows overall influence chart of relative dominance in clustering m...
Figure 11.14 Overall influence chart of clustering model.
Chapter 12
Figure 12.1 Shows the CRISP and KDD process flowchart.
Figure 12.2 Shows data mining process.
Figure 12.3 Shows visual analytics flowchart.
Figure 12.4 Shows visual analytics and human–interaction.
Figure 12.5 HCI information mining approach with an accentuation on parts of ind...
Figure 12.6 Reflections of basic inductive learning ideas thought about and rela...
Figure 12.7 Shows human involvement in different data mining approaches.
Figure 12.8 Flowchart of gesture recognition.
Figure 12.9 Block diagram of Gesture Recognition framework.
Figure 12.10 (a, b, c, d): Shows hand gestures.
Figure 12.11 Zoom-in gesture recognized.
Figure 12.12 Zoom-out gesture recognized.
Figure 12.13 Towards right movement gesture recognized.
Chapter 13
Figure 13.1 Different types of datasets: (a) standard (balanced), (b) unbalanced...
Figure 13.2 Sample skin lesion images from public dataset (a) PH2, (b) ISIC2016,...
Figure 13.3 Illustration of proposed framework.
Figure 13.4 Representing decision boundary amid negative and positive instances ...
Figure 13.5 Performance comparison analysis. (a) AUCROC respective to standard d...
Chapter 14
Figure 14.1 Shows supervised learning algorithm.
Figure 14.2 Shows unsupervised learning algorithm.
Figure 14.3 Shows semi-supervised learning algorithm.
Figure 14.4 Shows regression learning algorithm.
Figure 14.5 Shows instance-bases learning algorithm.
Figure 14.6 Shows regularization algorithm.
Figure 14.7 Shows decision-tree algorithm.
Figure 14.8 Shows bayesian algorithm.
Figure 14.9 Shows clustering algorithm.
Figure 14.10 Shows association rule learning algorithm.
Figure 14.11 Shows artificial neural network algorithm.
Figure 14.12 Shows deep learning algorithm.
Figure 14.13 Shows dimensionality reduction algorithm.
Figure 14.14 Shows ensemble algorithm.
Figure 14.15 Information mining assignments and models.
Figure 14.16 Descriptions of under-fit, standard, and over-fit versions.
Figure 14.17a Shows supervised learning.
Figure 14.17b Shows sigmoid function.
Figure 14.18 Shows one-versus all or multi-class classification.
Figure 14.19 Shows illustration of clustering process.
Figure 14.20 The description of the learning phase.
Figure 14.21 Harvard database result.
Figure 14.22 Sub-part result of Iris setosa.
Figure 14.23 Sub-part result of Iris versicolour.
Figure 14.24 Sub-part result of Iris virginica.
Chapter 15
Figure 15.1 Represents the flow chart of proposed methodology using three layers...
Figure 15.2 Flow diagram of PCA algorithm used in our proposed methodology.
Figure 15.3 Simple architecture of 3-Layer RBM.
Figure 15.4 Performance Metric Comparison (Accuracy) by using 80–20 as size rati...
Figure 15.5 Performance Metric Comparison (Sensitivity) by using 80–20 as size r...
Figure 15.6 Performance Metric Comparison (Specificity) by using 80–20 as size r...
Figure 15.7 The seizure occurs during each hour for all the pediatric patients o...
Chapter 16
Figure 16.1 Option of boundary space measurements for the Delhi database. (a) wi...
Figure 16.2 Selection of boundary for worldwide calculation of the Delhi dataset...
Figure 16.3 Shows heatmap.
Figure 16.4 Example of transformation from a heatmap into a double guide utilizi...
Figure 16.5 Binary anticipated guides utilizing various percentiles to character...
Figure 16.6 Label forecast by characterized percentile edge.
Figure 16.7 Model evaluation against different category levels. It is feasible t...
Figure 16.8 Heatmaps assumptions using several edges. (a) uses the 96th percenti...
Figure 16.9 Large-goal heatmap from over Delhi region. The rose-colored region o...
Figure 16.10 Shows flow chart of proposed application system.
Figure 16.11 Shows icon of application.
Figure 16.12 Shows form of registration.
Figure 16.13 Shows login page.
Figure 16.14 Misconduct place finder.
Figure 16.15 Location recognized on map.
Figure 16.16 Show message sent by user.
Figure 16.17 Shows received message to enrolled contact.
Figure 16.18 Location of client.
Chapter 17
Figure 17.1 Object instance segmentation.
Chapter 1
Table 1.1 Comparison in a data warehouse—OLTP.
Chapter 2
Table 2.1 Shows social media network applies to various data services.
Chapter 3
Table 3.1 Show the advantage and disadvantage of different types of RS.
Table 3.2 Shows which technique, evaluation criteria are used in different RS.
Table 3.3 Base of usage predication.
Table 3.4 Comparative study of hybrid approach with traditional approach.
Chapter 5
Table 5.1 The transformation from word to lexeme.
Table 5.2 Transformation to an extraordinary word list.
Table 5.3 Vector portrayal of a report corpus.
Table 5.4 The portrayal of the corpus utilized in this examination.
Table 5.5 Groups with various Silhouette an incentive for every calculation.
Chapter 6
Table 6.1 Data mining developments qualified statement.
Table 6.2 Shows application and usage of data mining.
Table 6.3 Understudy related factors.
Table 6.4 High potential variables.
Table 6.5 Analysis of various classifiers.
Table 6.6 Comparative analysis of various classifiers with their precision rates...
Chapter 7
Table 7.1 Number of tickets raised by 24 accounts/day and closed tickets/day: da...
Table 7.2 Overfitness, testing error and accuracy of the random forest model.
Chapter 8
Table 8.1 Buyer counts.
Table 8.2 Analysis of maximum likelihood estimates.
Chapter 11
Table 11.1 Shows cash flow statement source field in SAP System.
Table 11.2 Shows key figures, dimensions, and characteristics of infocube.
Chapter 13
Table 13.1 Summarized detail of source and target datasets.
Chapter 15
Table 15.1 Different waveforms present in the brain.
Table 15.2 CHB-MIT patient wise description.
Table 15.3 Ictal (Seizure), Inter-ictal (Normal), and Pre-ictal (Partial Seizure...
Table 15.4 Performance evaluation measures.
Table 15.5 Performance metric when learning rate is set to 0.001.
Table 15.6 Performance metric when learning rate is set to 0.01.
Table 15.7 Performance metrics when learning rate is set to 0.1.
Table 15.8 Comparative analysis of already proposed methodologies.
Chapter 16
Table 16.1 Labels for every forecast.
Table 16.2 Predictions mean various percentiles limits.
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])
Edited by
Rohit Raja
Kapil Kumar Nagwanshi
Sandeep Kumar
and
K. Ramya Laxmi
This edition first published 2022 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2022 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-79178-2
Cover image: Pixabay.Com
Cover design by Russell Richardson
Set in size of 11pt and Minion Pro by Manila Typesetting Company, Makati, Philippines
Printed in the USA
10 9 8 7 6 5 4 3 2 1
Data, the latest currency of today’s world, is the new gold. In this new form of gold, the most beautiful jewels are data analytics and machine learning. Data mining and machine learning are considered interdisciplinary fields. Data mining is a subset of data analytics, and machine learning involves the use of algorithms that automatically improve through experience based on data. However, the term data mining is a misnomer, because the goal is to extract knowledge from data rather than to mine the data itself. A more apt term would be “knowledge discovery from data,” since it is the practice of examining large pre-existing databases to generate information. Data mining algorithms are currently being investigated and applied worldwide.
Massive datasets can be classified and clustered to obtain accurate results. The most common techniques include classification and clustering methods. Accuracy and error rates are calculated for regression, classification, and clustering to evaluate the results of algorithms such as support vector machines and neural networks with forward and backward propagation. Applications include fraud detection, image processing, medical diagnosis, weather prediction, e-commerce, and so forth. Data mining algorithms are also used for sentiment analysis. These applications have been increasing across different areas and fields. Web mining and text mining have also established themselves as concrete fields within data mining.
This book is intended for industrial and academic researchers, and scientists and engineers in the information technology, data science and machine and deep learning domains. Featured in the book are:
A review of the state-of-the-art in data mining and machine learning,
A review and description of the learning methods in human-computer interaction,
Implementation strategies and future research directions for meeting the design and application requirements of several modern, real-time applications,
The scope and implementation of a majority of data mining and machine learning strategies, and
A discussion of real-time problems.
Most other books available on the market were published a long time ago and hence seldom elaborate on the current needs of data mining and machine learning, which this book sets out to address. It is our hope that this book will promote mutual understanding among researchers in different disciplines and facilitate future research development and collaboration.
We want to express our appreciation to all of the contributing authors who helped us tremendously with their contributions, time, critical thoughts, and suggestions to put together this peer-reviewed edited volume. The editors are also thankful to Scrivener Publishing and its team members for the opportunity to publish this volume. Lastly, we thank our family members for their love, support, encouragement, and patience during the entire period of this work.
Rohit Raja
Kapil Kumar Nagwanshi
Sandeep Kumar
K. Ramya Laxmi
November 2021
Srinivas Konda1*, Kavitarani Balmuri1 and Kishore Kumar Mamidala2
1Department of Computer Science and Engineering, CMR Technical Campus, Kandlakoya, Hyderabad, India
2Department of Computer Science and Engineering, Vivekananda Institute of Technology and Science, Karimnagar, India
Abstract
Behavioral data is data generated by, or as a result of, a customer’s engagement with a business. This can include things like page views, e-mail sign-ups, or other important customer actions. Common sources of behavioral data include websites, mobile applications, CRM systems, marketing automation systems, call centers, help desks, and billing systems. Customers can be consumers, businesses, or individuals within a business. However, behavioral data can always be tied back to a single end user. Note that this user can be a known individual (logged in) or anonymous (not logged in). Complex behaviors are widely seen in artificial and natural intelligent systems, on the web, in social and online networks, in multi-agent systems, and in cognitive systems. An in-depth understanding of complex behaviors has been increasingly recognized as a crucial means of uncovering the internal driving forces, causes, and impacts on businesses when handling many challenging issues. However, traditional behavior modeling relies mainly on qualitative methods from the perspectives of behavioral science and social science. So-called behavior analysis in data analytics and learning often focuses on human demographic and business usage data, in which behavior-oriented elements are hidden in routinely collected transactional data. As a result, it is ineffective or even impossible to deeply scrutinize native behavior intentions, lifecycles, dynamics, and impacts on complex problems and business issues.
Keywords: Data mining, knowledge discovery, web indexes, complex datasets, high-dimensional information, data organizations, data filtering, fleeting information
In simple words, data mining is a process used to extract valuable data from a large set of raw data. It involves analyzing data patterns in large batches of data using one or more software tools. Data mining has applications in many fields, including science and research. With mining techniques, organizations can learn more about their customers and develop more effective strategies for various business functions, and thus use their resources in a more optimal and insightful way. This brings organizations closer to their goals and helps them make better decisions. Data mining involves effective data collection and warehousing as well as computer processing. To segment data and estimate the probability of future events, data mining uses sophisticated mathematical algorithms. Data mining is also known as Knowledge Discovery in Data (KDD).
With big data now available and being collected, gaining access to data is seldom the concern. Data is being generated and stored at an unprecedented rate, and increasingly, much of the big data being collected is about human behavior. This kind of data is typically created and stored as an “event,” meaning an action that was taken, with “properties,” meaning metadata used to describe the event. For example, an event could be “website visit,” and a property of that event could be “device type.” It may help to think of events as the “what” and properties as the “who, when, and where.”
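The event-and-properties idea can be made concrete with a small data structure. The following is a minimal sketch only: the class name `BehaviorEvent`, its field names, and the helper `count_by_property` are hypothetical choices for illustration and are not taken from the chapter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class BehaviorEvent:
    """One behavioral event: the 'what' plus the metadata that describes it."""
    name: str            # the "what", e.g. "website visit"
    user_id: str         # the "who" (hypothetical field name)
    timestamp: datetime  # the "when"
    properties: Dict[str, Any] = field(default_factory=dict)  # e.g. device type


def count_by_property(events: Iterable[BehaviorEvent], key: str) -> Dict[Any, int]:
    """Count events per value of one property, skipping events that lack it."""
    counts: Dict[Any, int] = {}
    for event in events:
        if key in event.properties:
            value = event.properties[key]
            counts[value] = counts.get(value, 0) + 1
    return counts


# Made-up sample events, for illustration only.
events = [
    BehaviorEvent("website visit", "u1",
                  datetime(2021, 11, 1, 9, 0, tzinfo=timezone.utc),
                  {"device type": "mobile"}),
    BehaviorEvent("website visit", "u2",
                  datetime(2021, 11, 1, 9, 5, tzinfo=timezone.utc),
                  {"device type": "desktop"}),
    BehaviorEvent("e-mail sign-up", "u1",
                  datetime(2021, 11, 1, 9, 7, tzinfo=timezone.utc),
                  {"device type": "mobile"}),
]
device_counts = count_by_property(events, "device type")
```

Keeping properties in a free-form dictionary reflects the fact that different event types usually carry different metadata; a stricter schema per event type would be an equally reasonable design.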
Our behavior is captured in the data that we provide when using search engines, e-commerce platforms, social network services, or online education. Sifting through this data and deriving insights into human behavior enables these platforms to make more effective decisions and offer better service. However, traditional behavior modeling relies on qualitative methods from the perspectives of behavioral science and social science. There is a great need for computational models for tasks such as pattern analysis, prediction, recommendation, and anomaly detection on large-scale datasets.
The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to meet this challenge, because recent developments in the field show an increasing interest in mining complex data (as exemplified by graph mining, text mining, and so on). By incorporating the relationships among the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thereby increasing its potential contribution to a knowledge economy. Since the relationships among the data reveal certain behavioral aspects underlying the plain data, this shift from mining simple data to mining complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential to unify several other recent activities in data mining. We discuss important aspects of behavior mining and its implications for the future of data mining.
This research topic reports innovative solutions to problems of user behavior data at scale in a wide range of applications, such as recommender systems and suspicious behavior detection. It covers data science and statistical approaches to knowledge discovery and modeling, decision support, and prediction, including machine learning and AI, on user behavior data. Potential settings include mining dynamic/streaming data, mining graph and network data, mining heterogeneous/multi-source data, mining high-dimensional data, mining imbalanced data, mining multimedia data, mining scientific data, mining sequential data, mining social networks, and mining spatial and temporal data.
A data stream is a sequence of unbounded, real-time data items with a very high data rate that can be read only once by an application [1, 2]. Data stream analysis has recently attracted attention in the research community. Algorithms for mining data streams, and ongoing projects in business and scientific applications, have been developed and discussed in [3, 4]. Most of these algorithms focus on developing approximate one-pass techniques. The process of mining a data stream is shown in Figure 2.1.
Figure 2.1 Process of mining data stream.
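The read-once constraint can be illustrated with a running summary that is updated as each item arrives and never revisits earlier items. The sketch below uses Welford's online algorithm for the mean and sample variance; it is an illustrative choice rather than a method named in the chapter, and the class name `RunningStats` is hypothetical. Unlike most stream-mining summaries, this one happens to be exact up to floating-point error, so it demonstrates only the one-pass, constant-memory aspect.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one stream item into the summary; the item is then discarded."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the items seen so far (0.0 for fewer than two)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(reading)
```

Memory use stays constant regardless of how many items pass through, which is the property that makes such summaries usable on unbounded streams.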
Two recent developments motivate the need for data stream processing systems [5, 6]:
I. The automatic generation of highly detailed, high data rate sequences of data items in various scientific and business applications. Examples include satellite, radar, and astronomical data streams for scientific applications, and stock market and transaction web log data streams for business applications.
II. The need for complex analyses of these high-speed data streams, such as clustering and outlier detection, classification, frequent itemset mining, and counting frequent items.
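As one example of the analyses listed above, counting frequent items can be approximated in a single pass with bounded memory. The sketch below uses the Misra-Gries summary, a standard technique chosen here for illustration; the chapter does not state which algorithm it has in mind, and the function name and sample stream are hypothetical.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of n items is
    guaranteed to be among the returned keys. The stored counts are lower
    bounds on the true counts, not exact frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Made-up stream of 9 items in which "a" occurs 5 times (more than 9 / 3).
clicks = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(clicks, k=3)
```

Because the summary can also retain items that are not truly frequent, a second pass (when one is possible) or a known error bound is normally used to interpret the candidates.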
There are two strategies for addressing the high-speed nature of data streams. The first is input and output rate adaptation of the mining algorithm. Rate adaptation means controlling the input and output rates of the mining algorithm according to the available resources. The second is algorithm approximation, which develops new lightweight techniques that look at each data item only once. The main focus of the data stream mining techniques proposed so far is the design of approximate mining algorithms that make one pass or less over the data stream [7].
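A simple way to see both ideas at work is reservoir sampling, which inspects each item exactly once and keeps a fixed-size uniform sample, so the memory given to downstream mining is capped regardless of the stream's length. This is a minimal sketch of the classic Algorithm R, offered as an illustration rather than as the technique the chapter refers to; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, size, rng=None):
    """Keep a uniform random sample of `size` items from a single pass."""
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < size:
            # Fill the reservoir with the first `size` items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability size / (index + 1).
            j = rng.randint(0, index)  # both endpoints are inclusive
            if j < size:
                reservoir[j] = item
    return reservoir


# A fixed seed is used only to make the example repeatable.
sample = reservoir_sample(range(10_000), size=100, rng=random.Random(0))
```

The sample size plays the role of the resource budget: choosing a smaller `size` lowers the cost of any analysis run on the sample, at the price of a coarser approximation.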
