An authoritative guide to an in-depth analysis of various state-of-the-art data clustering approaches using a range of computational intelligence techniques.

Recent Advances in Hybrid Metaheuristics for Data Clustering offers a guide to the fundamentals of various metaheuristics and their application to data clustering. Metaheuristics are designed to tackle complex clustering problems where classical clustering algorithms fail to be either effective or efficient. The authors, noted experts on the topic, provide a text that can aid in the design and development of hybrid metaheuristics to be applied to data clustering, and they include performance analyses of the hybrid metaheuristics in relation to their conventional counterparts. In addition to a review of data clustering, the authors include in-depth analyses of different optimization algorithms. The text offers a step-by-step guide to the construction of hybrid metaheuristics and, to enhance comprehension, a range of real-life case studies and their applications. This important text:

* Includes performance analyses of the hybrid metaheuristics as related to their conventional counterparts
* Offers an in-depth analysis of a range of optimization algorithms
* Provides a review of data clustering
* Contains a detailed overview of different standard metaheuristics in current use
* Presents a step-by-step guide to the construction of hybrid metaheuristics
* Offers real-life case studies and applications

Written for researchers, students, and academics in computer science, mathematics, and engineering, Recent Advances in Hybrid Metaheuristics for Data Clustering explores current data clustering approaches using a range of computational intelligence techniques.
Page count: 337
Year of publication: 2020
Cover
List of Contributors
Series Preface: Dr Siddhartha Bhattacharyya, Christ (Deemed To Be University), Bangalore, India (Series Editor)
Preface
1 Metaheuristic Algorithms in Fuzzy Clustering
1.1 Introduction
1.2 Fuzzy Clustering
1.3 Algorithm
1.4 Genetic Algorithm
1.5 Particle Swarm Optimization
1.6 Ant Colony Optimization
1.7 Artificial Bee Colony Algorithm
1.8 Local Search‐Based Metaheuristic Clustering Algorithms
1.9 Population‐Based Metaheuristic Clustering Algorithms
1.10 Conclusion
References
2 Hybrid Harmony Search Algorithm to Solve the Feature Selection for Data Mining Applications
2.1 Introduction
2.2 Research Framework
2.3 Text Preprocessing
2.4 Text Feature Selection
2.5 Harmony Search Algorithm
2.6 Text Clustering
2.7 k‐Means Text Clustering Algorithm
2.8 Experimental Results
2.9 Conclusion
References
3 Adaptive Position–Based Crossover in the Genetic Algorithm for Data Clustering
3.1 Introduction
3.2 Preliminaries
3.3 Related Works
3.4 Proposed Model
3.5 Experimentation
3.6 Conclusion
References
4 Application of Machine Learning in the Social Network
4.1 Introduction
4.2 Application of Classification Models in Social Networks
4.3 Application of Clustering Models in Social Networks
4.4 Application of Regression Models in Social Networks
4.5 Application of Evolutionary Computing and Deep Learning in Social Networks
4.6 Summary
Acknowledgments
References
5 Predicting Students' Grades Using CART, ID3, and Multiclass SVM Optimized by the Genetic Algorithm (GA): A Case Study
5.1 Introduction
5.2 Literature Review
5.3 Decision Tree Algorithms: ID3 and CART
5.4 Multiclass Support Vector Machines (SVMs) Optimized by the Genetic Algorithm (GA)
5.5 Preparation of Datasets
5.6 Experimental Results and Discussions
5.7 Conclusion
References
6 Cluster Analysis of Health Care Data Using Hybrid Nature‐Inspired Algorithms
6.1 Introduction
6.2 Related Work
6.3 Proposed Methodology
6.4 Results and Discussion
6.5 Conclusion
References
7 Performance Analysis Through a Metaheuristic Knowledge Engine
7.1 Introduction
7.2 Data Mining and Metaheuristics
7.3 Problem Description
7.4 Association Rule Learning
7.5 Literature Review
7.6 Methodology
7.7 Implementation
7.8 Performance Analysis
7.9 Research Contributions and Future Work
7.10 Conclusion
References
8 Magnetic Resonance Image Segmentation Using a Quantum‐Inspired Modified Genetic Algorithm (QIANA) Based on FRCM
8.1 Introduction
8.2 Literature Survey
8.3 Quantum Computing
8.4 Some Quality Evaluation Indices for Image Segmentation
8.5 Quantum‐Inspired Modified Genetic Algorithm (QIANA)–Based FRCM
8.6 Experimental Results and Discussion
8.7 Conclusion
References
9 A Hybrid Approach Using the k‐Means and Genetic Algorithms for Image Color Quantization
9.1 Introduction
9.2 Background
9.3 Color Quantization Methodology
9.4 Results and Discussions
9.5 Conclusions and Future Work
Acknowledgments
References
Index
End User License Agreement
Chapter 2
Table 2.1 Feature selection solution representation
Table 2.2 Text Datasets Characteristics
Table 2.3 The Algorithm Efficacy Based on Clusters' Quality Results
Chapter 3
Table 3.1 Tabular representation for values of data1
Table 3.2 Comparison of one‐point and arithmetic crossover with proposed work...
Table 3.3 Comparison of one‐point and arithmetic crossover with proposed work...
Table 3.4 Comparison of one‐point and arithmetic crossover with proposed work...
Chapter 4
Table 4.1 Summary Classification Applications
Table 4.2 Summary of Clustering Applications
Table 4.3 Summary Regression Application
Chapter 5
Table 5.1 Classification of Student Grades
Table 5.2 Binary Dataset
Table 5.3 Multiclass Dataset
Chapter 6
Table 6.1 Summary/Gaps Identified in the Survey
Chapter 7
Table 7.1 Technical Scenario
Table 7.2 Indicator Matrix
Table 7.3 Association Rules (By Market Basket Analysis)
Chapter 8
Table 8.1 Class Boundaries and Evaluated Segmentation Quality Measures, F(I) b...
Table 8.2 Class Boundaries and Evaluated Segmentation Quality Measures, F'(I) ...
Table 8.3 Class Boundaries and Evaluated Segmentation Quality Measures, Q(I) b...
Table 8.4 Different Algorithm‐Based Mean and Standard Deviation Using Differe...
Table 8.5 Class Boundaries and Evaluated Segmentation Quality Measures, F(I) b...
Table 8.6 Class Boundaries and Evaluated Segmentation Quality Measures, F'(I) ...
Table 8.7 Class Boundaries and Evaluated Segmentation Quality Measures, Q(I) b...
Table 8.8 Different Algorithm‐Based Mean and Standard Deviation Using Differe...
Table 8.9 Single ANOVA Analysis Based on Q(I) for MR image1
Table 8.10 Single ANOVA Analysis Based on Q(I) for MR image2
Chapter 9
Table 9.1 Results of SSIM for Three Executions (Mean and Standard Deviations ...
Table 9.2 Results of SSIM for 3 Executions (Mean and Standard Deviations Are ...
Table 9.3 Results of SSIM for 3 Executions (Mean and Standard Deviations Are ...
Chapter 2
Figure 2.1 Research framework of the proposed hybrid method
Figure 2.2 The accuracy of the k‐means text clustering methods
Figure 2.3 The F‐measure score of the k‐means technique
Chapter 3
Figure 3.1 Flowchart for performing crossover for parent 1.
Figure 3.2 Flowchart for performing crossover for parent 2.
Figure 3.3 Flowchart for selecting better offspring.
Figure 3.4 Bar chart for DB Index for Table 3.2, where the number of cluster...
Figure 3.5 Bar chart for intra‐cluster distance for Table 3.2, where the num...
Figure 3.6 Bar chart for inter‐cluster distance for Table 3.2, where the num...
Figure 3.7 Bar chart for DB Index for Table 3.3 where number of clusters=4, ...
Figure 3.8 Bar chart for intra‐cluster distance for Table 3.3 where number o...
Figure 3.9 Bar chart for inter‐cluster distance for Table 3.3 where number o...
Figure 3.10 Bar chart for DB Index for Table 3.4 where number of clusters=20...
Figure 3.11 Bar chart for intra‐cluster distance for Table 3.4 where number ...
Figure 3.12 Bar chart for inter‐cluster distance for Table 3.4 where number ...
Chapter 4
Figure 4.1 Classification of machine learning algorithms
Figure 4.2 Workflow of big data, machine learning, and social media
Figure 4.3 Chatbot schematic diagram
Figure 4.4 Clustering in the network data using a word adjacency dataset
Chapter 5
Figure 5.1 Linear separation of two classes
and
in two‐dimensional space...
Figure 5.2 Multiclass support vector machine
Figure 5.3 SVM optimized by genetic algorithms
Figure 5.4 Bar graph showing accuracy of CART and ID3 on binary dataset
Figure 5.5 Bar graph showing accuracy of CART, ID3, and SVM on multiclass da...
Figure 5.6 Bar graph showing accuracy of different SVM kernels on multiclass...
Chapter 6
Figure 6.1 Flow diagram of the firefly algorithm
Figure 6.2 Flow diagram of the k‐means algorithm
Figure 6.3 Proposed methodology
Figure 6.4 k‐means firefly algorithm pseudocode
Figure 6.5 Circles cluster after k‐means firefly algorithm
Figure 6.6 Diabetes Davies‐Bouldin graph before versus after
Figure 6.7 Iris Davies‐Bouldin graph before versus after
Figure 6.8 Diabetes dataset Davies‐Bouldin index
Figure 6.9 Iris dataset Davies‐Bouldin index
Chapter 7
Figure 7.1 Knowledge discovery paradigm
Chapter 8
Figure 8.1 Flowchart of QIANA‐based FRCM
Figure 8.2 (a) MR image 1; (b) MR image 2.
Figure 8.3 Six‐class segmented grayscale MR image1 with the class levels o...
Figure 8.4 Six‐class segmented grayscale MR image2 with the class levels o...
Chapter 9
Figure 9.1 Main steps of the hybrid method of image color quantization based...
Figure 9.2 Graphical representation of the crossover operator.
Figure 9.3 Images and their sizes used in the experiments to evaluate our co...
Figure 9.4 Results for k‐means and genetic algorithms on “lena” and “peppers...
Figure 9.5 Results for the “fruits” image with .
Figure 9.6 Zooming of the results for the “fruits” image with .
Figure 9.7 Results for the “lena” image with .
Figure 9.8 Results for the “rgb” image with .
Figure 9.9 Results for the “girl” image with .
Figure 9.10 Results with zooming for the “tulips” image with .
Figure 9.11 Comparative graphics of the results obtained for each of the ima...
Figure 9.12 Comparative graphics of the results obtained for each of the ima...
Figure 9.13 Comparative graphics of the results obtained for each of the ima...
Cover
Table of Contents
Begin Reading
Edited by
Sourav De, Cooch Behar Government Engineering College, West Bengal, India
Sandip Dey, Sukanta Mahavidyalaya, West Bengal, India
Siddhartha Bhattacharyya, CHRIST (Deemed to be University), Bangalore, India
This edition first published 2020
© 2020 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Sourav De, Sandip Dey, and Siddhartha Bhattacharyya to be identified as the authors of the editorial material in this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions.
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: De, Sourav, 1979‐ editor. | Dey, Sandip, 1977‐ editor. |
Bhattacharyya, Siddhartha, 1975‐ editor.
Title: Recent advances in hybrid metaheuristics for data clustering / edited
by Dr. Sourav De, Dr. Sandip Dey, Dr. Siddhartha Bhattacharyya.
Description: First edition. | Hoboken, NJ : John Wiley & Sons, Inc., [2020]
| Includes bibliographical references and index.
Identifiers: LCCN 2020010571 (print) | LCCN 2020010572 (ebook) | ISBN
9781119551591 (cloth) | ISBN 9781119551614 (adobe pdf) | ISBN
9781119551607 (epub)
Subjects: LCSH: Cluster analysis–Data processing. | Metaheuristics.
Classification: LCC QA278.55 .R43 2020 (print) | LCC QA278.55 (ebook) |
DDC 519.5/3–dc23
LC record available at https://lccn.loc.gov/2020010571
LC ebook record available at https://lccn.loc.gov/2020010572
Cover Design: Wiley
Cover Image: © Nobi_Prizue/Getty Images
Dr. Sourav De dedicates this book to his respected parents, Satya Narayan De and Tapasi De; his loving wife, Debolina Ghosh; his beloved son, Aishik De; his sister, Soumi De, and his in‐laws.
Dr. Sandip Dey dedicates this book to the loving memory of his father, the late Dhananjoy Dey; his beloved mother, Smt. Gita Dey; his wife, Swagata Dey Sarkar; his children, Sunishka and Shriaan; his siblings, Kakali, Tanusree, and Sanjoy; and his nephews, Shreyash and Adrishaan.
Dr. Siddhartha Bhattacharyya dedicates this book to his late father, Ajit Kumar Bhattacharyya; his late mother, Hashi Bhattacharyya; his beloved wife, Rashni, and his in‐laws, Asis Mukherjee and Poly Mukherjee.
Laith Mohammad Abualigah
Amman Arab University
Jordan
Rishabh Agrawal
VIT
India
Kauser Ahmed
VIT
India
Mofleh Al‐diabat
Al Albayt University
Jordan
Bisan Alsalibi
Universiti Sains Malaysia
Malaysia
Mohammad Al Shinwan
Amman Arab University
Jordan
Belfin R V
Karunya Institute of Technology and Sciences
India
Siddhartha Bhattacharyya
CHRIST (Deemed to be University)
India
Indu Chhabra
Panjab University
Chandigarh
India
Sunanda Das
National Institute of Technology
Durgapur
India
Sourav De
Cooch Behar Government Engineering College
India
Prasenjit Dey
Cooch Behar Government Engineering College
India
Sandip Dey
Sukanta Mahavidyalaya
India
Tania Dey
Sikkim Manipal Institute of Technology
India
Khaldoon Dhou
Drury University
USA
Arnab Gain
Cooch Behar Government Engineering College
India
Essam Hanandeh
Zarqa University
Jordan
Grace Mary Kanaga
Karunya Institute of Technology and Sciences
India
Ahamad Khader
Universiti Sains Malaysia
Malaysia
Debanjan Konar
Sikkim Manipal Institute of Technology
India
Suman Kundu
Wroclaw University of Science and Technology
Poland
Ruchita Pradhan
Sikkim Manipal Institute of Technology
India
Helio Pedrini
Institute of Computing
University of Campinas
Brazil
Prativa Rai
Sikkim Manipal Institute of Technology
India
Marcos Roberto e Souza
Institute of Computing
University of Campinas
Campinas
Brazil
Essam Said Hanandeh
Zarqa University
Jordan
Anderson Santos
Institute of Computing
University of Campinas
Brazil
Tejaswini Sapkota
Sikkim Manipal Institute of Technology
India
Mohammad Shehab
Aqaba University of Technology
Jordan
Gunmala Suri
University Business School
Panjab University
Chandigarh
India
The Intelligent Signal and Data Processing (ISDP) book series focuses on the field of signal and data processing encompassing the theory and practice of algorithms and hardware that convert signals produced by artificial or natural means into a form useful for a specific purpose. The signals might be speech, audio, images, video, sensor data, telemetry, electrocardiograms, or seismic data, among others. The possible application areas include transmission, display, storage, interpretation, classification, segmentation, and diagnosis. The primary objective of the ISDP book series is to evolve future‐generation, scalable, intelligent systems for faithful analysis of signals and data. The ISDP series is intended mainly to enrich the scholarly discourse on intelligent signal and image processing in different incarnations. The series will benefit a wide audience that includes students, researchers, and practitioners. The student community can use the books in the series as reference texts to advance their knowledge base. In addition, the constituent monographs will be handy to aspiring researchers due to recent and valuable contributions in this field. Moreover, faculty members and data practitioners are likely to gain relevant knowledge from the books in the series.
The series coverage will contain, but not be exclusive to, the following:
Intelligent signal processing
Adaptive filtering
Learning algorithms for neural networks
Hybrid soft computing techniques
Spectrum estimation and modeling
Image processing
Image thresholding
Image restoration
Image compression
Image segmentation
Image quality evaluation
Computer vision and medical imaging
Image mining
Pattern recognition
Remote sensing imagery
Underwater image analysis
Gesture analysis
Human mind analysis
Multidimensional image analysis
Speech processing
Modeling
Compression
Speech recognition and analysis
Video processing
Video compression
Analysis and processing
3D video compression
Target tracking
Video surveillance
Automated and distributed crowd analytics
Stereo‐to‐auto stereoscopic 3D video conversion
Virtual and augmented reality
Data analysis
Intelligent data acquisition
Data mining
Exploratory data analysis
Modeling and algorithms
Big data analytics
Business intelligence
Smart cities and smart buildings
Multiway data analysis
Predictive analytics
Intelligent systems
Grouping or classifying real‐life data into a set of clusters or categories for further processing and classification is known as clustering. The groups are organized on the basis of the built‐in properties or characteristics of the data in the dataset. The features of the groups are important for representing a new object or understanding a new phenomenon. Homogeneous data should be in the same cluster, whereas dissimilar or heterogeneous data is grouped into different clusters. Data clustering is applied in many different fields, such as document retrieval, data mining, pattern classification, image segmentation, artificial intelligence, machine learning, biology, microbiology, etc.
Broadly, there are two types of data clustering algorithms: supervised and unsupervised. In supervised data clustering algorithms, the number of desired partitions and a labeled dataset are supplied as basic inputs at the beginning of the algorithm. Supervised clustering algorithms also attempt to keep the number of segments small, and data points are assigned to clusters based on closeness under a given distance function. By contrast, unsupervised algorithms require no prior information beyond the raw data and its grouping principles: no labeled classes, no decision‐making criterion for optimization, and no predetermined number of desired segments.
Metaheuristic algorithms have proved efficient in handling and solving different types of data clustering problems. Metaheuristics are designed to tackle complex clustering problems where classical clustering algorithms fail to be either effective or efficient. Basically, a metaheuristic is an iterative generation procedure that guides a subordinate heuristic by intelligently combining different concepts for exploring and exploiting the search space; learning strategies applied to the structural information of the problem are used to derive near‐optimal solutions efficiently. The main objective of a metaheuristic is to efficiently explore a search space that is too large to be completely sampled. Metaheuristic techniques can handle many types of real‐world problems that conventional algorithms cannot manage, in spite of increasing computational power, simply because of unrealistically long running times. These algorithms make few assumptions about the optimization problem to be solved. Metaheuristic algorithms are not guaranteed to generate globally optimal solutions for all types of problems, since most implementations are some form of stochastic optimization and the resulting solutions may depend on the set of generated random variables. Compared to exact optimization algorithms and iterative methods, however, metaheuristics are often the better option, as they can determine good solutions with less computational effort by searching over a large set of feasible solutions. Some well‐known metaheuristic algorithms include the genetic algorithm (GA), simulated annealing (SA), tabu search (TS), and different types of swarm intelligence algorithms.
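The iterative generate-and-accept loop described above can be made concrete with simulated annealing, one of the metaheuristics named here. The following is a minimal illustrative sketch, not code from the book; the objective function, cooling schedule, and step size are arbitrary choices:

```python
import math
import random

def simulated_annealing(f, x0, iters=2000, temp0=1.0, seed=1):
    """Minimal simulated annealing: repeatedly perturb a candidate and
    accept worse moves with a temperature-controlled probability,
    shifting from exploration toward exploitation as the system cools."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for i in range(1, iters + 1):
        t = temp0 / i                    # simple cooling schedule
        cand = x + rng.gauss(0.0, 0.5)   # explore a random neighbor
        fc = f(cand)
        # always accept improvements; accept worse moves with prob exp(-delta/t)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / max(t, 1e-12)):
            x, fx = cand, fc
        if fx < fbest:
            best, fbest = x, fx
    return best, fbest

# a bumpy one-dimensional objective whose global minimum lies near x = 2.16
best, fbest = simulated_annealing(lambda x: (x - 2.0) ** 2 + 0.3 * math.sin(8.0 * x), 0.0)
```

Because acceptance of worse moves decays with the temperature, early iterations roam widely across local minima while later ones behave almost greedily, which is exactly the exploration/exploitation trade-off the definition above describes.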
Some recognized swarm intelligence algorithms are particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony optimization (ABC), differential evolution (DE), the cuckoo search algorithm, etc. More recently, modern swarm intelligence–based optimization algorithms such as the Egyptian vulture optimization algorithm, the rats herd algorithm (RATHA), the bat algorithm, the crow search algorithm, glowworm swarm optimization (GSO), etc., have been found to perform well when solving some real‐life problems. These algorithms also work efficiently to cluster different types of real‐life datasets.
During the clustering of data, it has been observed that metaheuristic algorithms can afford optimal solutions but suffer from high time complexity. To overcome these problems, and to avoid depending on a particular type of metaheuristic algorithm for solving complex problems, researchers and scientists have not only blended different metaheuristic approaches but also hybridized metaheuristic algorithms with other soft computing tools and techniques, such as neural networks, fuzzy sets, rough sets, etc. These hybrid metaheuristic algorithms, combinations of metaheuristic algorithms and other techniques, are more effective at handling real‐life data clustering problems. Recently, quantum mechanical principles have also been applied to cut down the time complexity of metaheuristic approaches to a great extent.
This book aims to entice readers to design efficient metaheuristics for data clustering in different domains. It elaborates on the fundamentals of different metaheuristics and their application to data clustering and, as a sequel to this, paves the way for designing and developing hybrid metaheuristics to be applied to data clustering. Few existing books on hybrid metaheuristic algorithms cover this topic.
The book contains nine chapters written by the leading practitioners in the field.
A brief overview of the advantages and limitations of the fuzzy clustering algorithm is presented in Chapter 1. The principle of operation and the structure of fuzzy algorithms are also elucidated with reference to the inherent limitations of cluster centroid selection. Several local‐search‐based and population‐based metaheuristic algorithms are discussed with reference to their operating principles. Finally, different avenues for addressing the cluster centroid selection problem with recourse to the different metaheuristic algorithms are presented.
The increasing size of the data and text on electronic sites has necessitated the use of different clustering methods, including text clustering. Text clustering is a useful unsupervised analysis method for partitioning an immense collection of text documents into a set of groups. Feature selection is a well‐known unsupervised method used to eliminate uninformative features and thereby enhance the performance of text clustering. In Chapter 2, the authors propose an algorithm, called H‐HSA, that solves the feature selection problem before the k‐means text clustering technique is applied, by improving the exploitation capability of the basic harmony search algorithm. The proposed feature selection method is used in this chapter to reinforce the text clustering technique by offering a new set of informative features.
In the advancement of data analytics, data clustering has become one of the most important areas in modern data science, and several works have proposed various algorithms to deal with it. The objective of Chapter 3 is to improve data clustering by using metaheuristic‐based algorithms. For this purpose, the authors propose a genetic algorithm–based data clustering approach with a new adaptive position–based crossover technique, in which the new concept of a vital gene is introduced during crossover. The simulation results demonstrate that the proposed method performs better than two other genetic algorithm–based data clustering methods. Furthermore, the proposed approach has been observed to be more time efficient than its counterparts.
A social network, used by the human population as a platform of interaction, generates a large volume of diverse data every day. These data and the attributes of the interactions are becoming ever more critical for researchers and businesses seeking to identify societal and economic values. However, the generated data is vast, highly complex, and dynamic, which necessitates real‐time solutions. Machine learning is a useful tool for summarizing meaningful information from large, diverse datasets. Chapter 4 provides a survey of several applications of social network analysis where machine learning plays a critical role. These applications range from spam content detection to human behavior analysis, from topic modeling to recommender systems, and from sentiment analysis to emotion contagion in social networks.
Predicting students' performance at an early stage is important for improving their performance in higher education and their placement opportunities. Early prediction of student grades allows an instructor to automatically detect students performing poorly in a course and also provides enormous opportunities for decision‐makers to take remedial measures that help those students succeed in their future education. A model predicting students' grades using CART, ID3, and an improved multiclass SVM optimized by the genetic algorithm (GA) is investigated in Chapter 5. The model follows supervised learning classification by means of CART, ID3, and SVM optimized by the GA. In this study, the model is tested on a dataset of undergraduate student information, i.e., total marks obtained in the courses taken over four years, with the respective labeled subject names and codes, at Sikkim Manipal Institute of Technology, Sikkim, India. A comparative analysis among CART, ID3, and multiclass SVM optimized by the GA indicates that the multiclass SVM optimized by the GA outperforms the ID3 and CART decision tree algorithms in the case of multiclass classification.
Significant advances in information technology have resulted in excessive growth of data in health care informatics. In today's world, technologies are being developed to treat new types of diseases and illnesses, but few steps are taken to stop a disease in its tracks at the early stages. The motivation of Chapter 6 is to help people diagnose a disease at an early stage based on its symptoms. In this chapter, the authors use various nature‐inspired clustering algorithms in collaboration with the k‐means algorithm to actively cluster a person's health data against already available data and label it accordingly. Experimental results show that nature‐inspired algorithms, such as the firefly algorithm combined with k‐means, give efficient results on the existing problems.
With the fast development of pattern discovery–oriented systems, data mining is rapidly intensifying in other disciplines, such as management and the biomedical and physical sciences, to tackle the issues of data collection and data storage. With the advancement of data science, numerous knowledge‐oriented paradigms have been evaluated for automatic rule mining. Association rule mining is an active research area, with numerous algorithms used for knowledge accumulation. Chapter 7 focuses on several challenging issues: demand‐driven aggregation of information sources, mining and analyzing relevant patterns while preserving user concerns, and formulating the association rule mining problem as a multi‐objective rather than a single‐objective solution for post‐purchase customer analysis.
The GA and the fuzzy c‐means (FRCM) algorithm are widely used in magnetic resonance (MR) image segmentation. In Chapter 8, a hybrid concept, a quantum‐inspired modified GA combined with FRCM, is used to segment MR images. The modified GA (MEGA) enhances the performance of the GA by modifying population initialization and the crossover probability. To speed up the classical MEGA and to derive more optimized class levels, quantum computing characteristics such as the qubit, entanglement, orthogonality, the rotational gate, etc., are incorporated into it. The class levels created by the quantum‐inspired MEGA are supplied to FRCM as the initial input to overcome FRCM's convergence problem. A performance comparison using standard evaluation metrics is presented among quantum‐inspired MEGA‐based FRCM, classical MEGA‐based FRCM, and conventional FRCM on two grayscale MR images, which shows the superiority of the proposed quantum‐inspired MEGA‐based FRCM over both the classical MEGA‐based FRCM and the conventional FRCM methods.
Large volumes of data have been rapidly collected due to increasing advances in equipment and techniques for content acquisition. However, the efficient storage, indexing, retrieval, representation, and recognition of multimedia data, such as text, audio, images, and videos, are challenging tasks. To summarize the main characteristics of datasets and simplify their interpretation, exploratory data analysis is often applied to numerous problems in several fields, such as pattern recognition, computer vision, machine learning, and data mining. A common data analysis technique, often associated with descriptive statistics and visual methods, is cluster analysis, or clustering. In Chapter 9, the authors propose a hybrid method based on k‐means and the genetic algorithm, guided by a qualitative objective function. Experiments demonstrate the good results of the proposed method.
The editors hope that this book will be helpful for students and researchers who are interested in this area. It can also prove to be a novel addition to the curriculum for undergraduate students of computer science, information science, and electronics engineering.
October, 2019
Sourav De, Cooch Behar, India
Sandip Dey, Jalpaiguri, India
Siddhartha Bhattacharyya, Bangalore, India
Sourav De1, Sandip Dey2, and Siddhartha Bhattacharyya3
1Department of Computer Science and Engineering, Cooch Behar Government Engineering College, India
2Department of Computer Science, Sukanta Mahavidyalaya, Jalpaiguri, India
3Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, India
Fuzzy clustering refers to the process of assigning data points to different clusters based on the similarity/dissimilarity of features. This process ensures that items in the same cluster are as similar as possible, while dissimilar items belong to different clusters. The identification of the clusters and the assignment of items to clusters are decided with the help of several similarity measures, which include measures of distance, connectivity, and intensity. The choice of the similarity measures depends on the type of data or the application [1].
Both classical and new algorithms have evolved over the years to address the clustering problem. Notable among them are k‐means [2] and fuzzy clustering [3,4]. The classical algorithms primarily segregate the data points into completely different clusters while ensuring that the dissimilarity between the different clusters and the similarity of the constituent data points within any cluster are maximized in the process. Thus, these algorithms ensure that there is no overlap between the clusters. Fuzzy clustering, however, relies on soft partitioning, thereby enabling overlap between clusters, with the constituent data points belonging to more than one cluster depending on a degree of belongingness.
The main limitation of any clustering algorithm lies in the initialization process, which entails an initial selection of cluster center points, chosen randomly in most cases. Hence, an improper initialization of the cluster centers may lead to an unacceptable result, since the positions of the cluster centers with respect to the constituent data points are a major concern in the assignment of data points to cluster centers.
Fuzzy clustering, often referred to as soft clustering or soft k‐means, is a method that entails a soft distinction of the constituent data points. By contrast, in any non‐fuzzy/crisp clustering, each data point is designated to belong to exactly one and only one cluster, with no overlapping of clusters. In fuzzy clustering, however, the data points can belong to more than one cluster, implying that certain overlaps exist between the resultant clusters. The underlying principle behind this partitional clustering technique is the concept of fuzzy set theory, which holds that, for a given universe of discourse, every constituent element belongs to all the sets defined in the universe with a certain degree of belongingness (also referred to as membership) [3–5]. Fuzzy clustering is often treated as preferable due to its inherent advantages: a natural affinity for incorporating larger datasets, a simple and straightforward implementation, the ability to handle large datasets as the time complexity is O(n), the ability to produce very good results for hyperspherically shaped, well‐separated clusters, a robust design, and the ability to converge to a local optimal solution [1].
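The contrast between crisp and fuzzy assignment can be illustrated with a small, hypothetical membership matrix (the values below are invented for illustration; they are not from the text):

```python
import numpy as np

# Hypothetical membership matrix W for three data points and two clusters.
# In crisp clustering, each row would be one-hot (0/1); in fuzzy clustering,
# each row is a degree-of-belongingness distribution that sums to 1.
W = np.array([
    [0.9, 0.1],   # point strongly in cluster 0
    [0.5, 0.5],   # boundary point, shared equally between clusters
    [0.2, 0.8],   # point mostly in cluster 1
])

assert np.allclose(W.sum(axis=1), 1.0)  # memberships sum to 1 per point

# A crisp partition can always be recovered by taking the largest membership.
hard_labels = W.argmax(axis=1)
print(hard_labels)  # → [0 0 1] (the tie resolves to the first cluster)
```

Note that the hard labels discard exactly the information fuzzy clustering is designed to preserve: the boundary point's equal affinity to both clusters.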
FCM clustering is one of the most widely used fuzzy clustering algorithms. It was developed by J.C. Dunn in 1973 [6] and improved by J.C. Bezdek in 1981 [1]. The operation of the algorithm is quite similar to that of the widely known k‐means algorithm. The basic steps are as follows:
Select a number of clusters.
Randomly assign coefficients to each data point to label them to the clusters.
Repeat until the algorithm converges, i.e., when the change in the coefficients between two consecutive iterations is no more than a predefined threshold ε:
Compute the centroid for each cluster. Every data point x is identified by a set of coefficients w_j(x) indicating its degree of belongingness to the j‐th cluster. In FCM, the mean of all the participating points, weighted by their degree of belongingness to the cluster, represents the cluster centroid. It is mathematically given as
c_j = Σ_x w_j(x)^m x / Σ_x w_j(x)^m
where m is a hyper‐parameter that controls the fuzziness of the clusters. The higher m is, the fuzzier the cluster will be in the end.
For each data point, compute its coefficients of belongingness to the clusters.
The FCM algorithm attempts to partition a finite collection of n elements X = {x_1, …, x_n} into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centers C = {c_1, …, c_c} and a partition matrix W = (w_ij), where each element w_ij tells the degree to which element x_i belongs to cluster c_j.
The aim is to minimize an objective function of the following form:
J_m = Σ_{i=1}^{n} Σ_{j=1}^{c} w_ij^m ‖x_i − c_j‖²
where:
w_ij = 1 / Σ_{k=1}^{c} ( ‖x_i − c_j‖ / ‖x_i − c_k‖ )^{2/(m−1)}
Fuzzy c‐means clustering works along similar lines. However, it differs from the k‐means objective function by the presence of the weights w_ij (the cluster fuzziness), determined by the fuzzifier m, with m ≥ 1. A large m results in smaller membership values, w_ij, and, hence, fuzzier clusters. In the limit m → 1, the memberships, w_ij, converge to 0 or 1, which implies a crisp partitioning. m is commonly set to 2. The algorithm also minimizes the intracluster variance, which often leads to a local minimum. Moreover, the clustering results depend on the initial choice of weights. Fuzzy clustering suffers from the fact that the number of clusters in the given dataset must be known beforehand. It is also sensitive to noise and outliers.
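The steps and the objective above can be sketched in Python. This is a minimal NumPy illustration under the stated update rules, not the book's implementation; the function name and parameter defaults (m = 2, ε = 10⁻⁵) are our own choices:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Minimal FCM sketch: returns (centroids, membership matrix W)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 2: random membership coefficients, each row summing to 1.
    W = rng.random((n, c))
    W /= W.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Wm = W ** m
        # Centroid update: c_j = sum_i w_ij^m x_i / sum_i w_ij^m
        centroids = Wm.T @ X / Wm.sum(axis=0)[:, None]
        # Pairwise distances ||x_i - c_j||, guarded against division by zero.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: w_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        W_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: stop when coefficients change by less than eps.
        if np.abs(W_new - W).max() < eps:
            W = W_new
            break
        W = W_new
    return centroids, W
```

On two well-separated point clouds, the recovered centroids land near the cloud centers and the memberships of interior points approach 0 or 1, while points midway between the clusters keep intermediate memberships.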
Most clustering algorithms require an initial selection of cluster centroids, which is often made in a random fashion. In fact, the selection of the initial cluster center values is considered one of the most challenging tasks in partitional clustering algorithms. An incorrect selection of initial cluster center values leads the search toward a solution that often gets stuck in a local optimum, yielding undesirable clustering results [7,8]. The primary cause of this problem lies in the fact that the clustering algorithms run in a manner similar to the hill climbing algorithm [9], which, being a local search‐based algorithm, moves in one direction without performing a wider scan of the search space to minimize (or maximize) the objective function. This behavior prevents the algorithm from exploring other regions of the search space that might contain a better, or even the desired, solution. Thus, proper exploitation and exploration of the search space are not achieved when running these algorithms.
The general approach to alleviate this problem is to rerun the algorithm several times with different cluster initializations. However, this approach is not always feasible, especially for the clustering of a large or complex dataset (i.e., a dataset with multiple optima) [10]. Thus, this selection mechanism may be cast as a global optimization problem, calling for the help of optimization algorithms.
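The multiple-restart strategy can be sketched as follows. This is a hedged illustration built around a bare-bones Lloyd-style k-means; the names `kmeans_restarts` and the sum-of-squared-errors (SSE) selection criterion are our own choices, not prescribed by the text:

```python
import numpy as np

def kmeans(X, k, rng, max_iter=100):
    """One run of Lloyd's k-means from a random initialization."""
    # Initial centers drawn at random from the data points themselves.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points;
        # keep the old center if a cluster happens to be empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    sse = ((X - centers[labels]) ** 2).sum()  # objective to minimize
    return centers, labels, sse

def kmeans_restarts(X, k, n_restarts=10, seed=0):
    """Rerun k-means with several initializations; keep the lowest-SSE run."""
    rng = np.random.default_rng(seed)
    return min((kmeans(X, k, rng) for _ in range(n_restarts)),
               key=lambda run: run[2])
```

Keeping the lowest-SSE run mitigates, but does not eliminate, the sensitivity to initialization described above, which is exactly the gap the metaheuristic approaches in the following sections aim to close.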
Several global search‐based algorithms have been proposed to overcome this local‐search limitation [11]. These include local search‐based metaheuristics such as SA and TS, evolutionary algorithms (EAs, including EP, ES, GAs, and DE), HS, and swarm intelligence algorithms such as PSO, ABC, and ACO. The following sections provide an overview of the algorithms proposed to solve the clustering problem where the number of clusters is known or set a priori.
