Recent Advances in Hybrid Metaheuristics for Data Clustering

Description

An authoritative guide to in-depth analysis of state-of-the-art data clustering approaches using a range of computational intelligence techniques. Recent Advances in Hybrid Metaheuristics for Data Clustering offers a guide to the fundamentals of various metaheuristics and their application to data clustering. Metaheuristics are designed to tackle complex clustering problems where classical clustering algorithms have failed to be either effective or efficient. The authors, noted experts on the topic, provide a text that can aid in the design and development of hybrid metaheuristics to be applied to data clustering. The book includes performance analysis of the hybrid metaheuristics in relation to their conventional counterparts. In addition to a review of data clustering, the authors include in-depth analysis of different optimization algorithms. The text offers a step-by-step guide to the construction of hybrid metaheuristics and contains a range of real-life case studies and their applications. This important text:

* Includes performance analysis of the hybrid metaheuristics as related to their conventional counterparts
* Offers an in-depth analysis of a range of optimization algorithms
* Highlights a review of data clustering
* Contains a detailed overview of different standard metaheuristics in current use
* Presents a step-by-step guide to the construction of hybrid metaheuristics
* Offers real-life case studies and applications

Written for researchers, students, and academics in computer science, mathematics, and engineering, Recent Advances in Hybrid Metaheuristics for Data Clustering explores current data clustering approaches using a range of computational intelligence techniques.


Page count: 337

Publication year: 2020




Table of Contents

Cover

List of Contributors

Series Preface: Dr Siddhartha Bhattacharyya, Christ (Deemed To Be University), Bangalore, India (Series Editor)

Preface

1 Metaheuristic Algorithms in Fuzzy Clustering

1.1 Introduction

1.2 Fuzzy Clustering

1.3 Algorithm

1.4 Genetic Algorithm

1.5 Particle Swarm Optimization

1.6 Ant Colony Optimization

1.7 Artificial Bee Colony Algorithm

1.8 Local Search‐Based Metaheuristic Clustering Algorithms

1.9 Population‐Based Metaheuristic Clustering Algorithms

1.10 Conclusion

References

2 Hybrid Harmony Search Algorithm to Solve the Feature Selection for Data Mining Applications

2.1 Introduction

2.2 Research Framework

2.3 Text Preprocessing

2.4 Text Feature Selection

2.5 Harmony Search Algorithm

2.6 Text Clustering

2.7 k-Means Text Clustering Algorithm

2.8 Experimental Results

2.9 Conclusion

References

3 Adaptive Position–Based Crossover in the Genetic Algorithm for Data Clustering

3.1 Introduction

3.2 Preliminaries

3.3 Related Works

3.4 Proposed Model

3.5 Experimentation

3.6 Conclusion

References

4 Application of Machine Learning in the Social Network

4.1 Introduction

4.2 Application of Classification Models in Social Networks

4.3 Application of Clustering Models in Social Networks

4.4 Application of Regression Models in Social Networks

4.5 Application of Evolutionary Computing and Deep Learning in Social Networks

4.6 Summary

Acknowledgments

References

5 Predicting Students' Grades Using CART, ID3, and Multiclass SVM Optimized by the Genetic Algorithm (GA): A Case Study

5.1 Introduction

5.2 Literature Review

5.3 Decision Tree Algorithms: ID3 and CART

5.4 Multiclass Support Vector Machines (SVMs) Optimized by the Genetic Algorithm (GA)

5.5 Preparation of Datasets

5.6 Experimental Results and Discussions

5.7 Conclusion

References

6 Cluster Analysis of Health Care Data Using Hybrid Nature‐Inspired Algorithms

6.1 Introduction

6.2 Related Work

6.3 Proposed Methodology

6.4 Results and Discussion

6.5 Conclusion

References

7 Performance Analysis Through a Metaheuristic Knowledge Engine

7.1 Introduction

7.2 Data Mining and Metaheuristics

7.3 Problem Description

7.4 Association Rule Learning

7.5 Literature Review

7.6 Methodology

7.7 Implementation

7.8 Performance Analysis

7.9 Research Contributions and Future Work

7.10 Conclusion

References

8 Magnetic Resonance Image Segmentation Using a Quantum‐Inspired Modified Genetic Algorithm (QIANA) Based on FRCM

8.1 Introduction

8.2 Literature Survey

8.3 Quantum Computing

8.4 Some Quality Evaluation Indices for Image Segmentation

8.5 Quantum‐Inspired Modified Genetic Algorithm (QIANA)–Based FRCM

8.6 Experimental Results and Discussion

8.7 Conclusion

References

9 A Hybrid Approach Using the k-Means and Genetic Algorithms for Image Color Quantization

9.1 Introduction

9.2 Background

9.3 Color Quantization Methodology

9.4 Results and Discussions

9.5 Conclusions and Future Work

Acknowledgments

References

Index

End User License Agreement

List of Tables

Chapter 2

Table 2.1 Feature selection solution representation

Table 2.2 Text Datasets Characteristics

Table 2.3 The Algorithm Efficacy Based on Clusters' Quality Results

Chapter 3

Table 3.1 Tabular representation for values of data1

Table 3.2 Comparison of one‐point and arithmetic crossover with proposed work...

Table 3.3 Comparison of one‐point and arithmetic crossover with proposed work...

Table 3.4 Comparison of one‐point and arithmetic crossover with proposed work...

Chapter 4

Table 4.1 Summary Classification Applications

Table 4.2 Summary of Clustering Applications

Table 4.3 Summary Regression Application

Chapter 5

Table 5.1 Classification of Student Grades

Table 5.2 Binary Dataset

Table 5.3 Multiclass Dataset

Chapter 6

Table 6.1 Summary/Gaps Identified in the Survey

Chapter 7

Table 7.1 Technical Scenario

Table 7.2 Indicator Matrix

Table 7.3 Association Rules (By Market Basket Analysis)

Chapter 8

Table 8.1 Class Boundaries and Evaluated Segmentation Quality Measures, F(I) b...

Table 8.2 Class Boundaries and Evaluated Segmentation Quality Measures, F'(I) ...

Table 8.3 Class Boundaries and Evaluated Segmentation Quality Measures, Q(I) b...

Table 8.4 Different Algorithm Based Mean and Standard Deviation Using Differe...

Table 8.5 Class Boundaries and Evaluated Segmentation Quality Measures, F(I) b...

Table 8.6 Class Boundaries and Evaluated Segmentation Quality Measures, F'(I) ...

Table 8.7 Class Boundaries and Evaluated Segmentation Quality Measures, Q(I) b...

Table 8.8 Different Algorithm‐Based Mean and Standard Deviation Using Differe...

Table 8.9 Single ANOVA Analysis Based on Q(I) for MR image1

Table 8.10 Single ANOVA Analysis Based on Q(I) for MR image2

Chapter 9

Table 9.1 Results of SSIM for Three Executions (Mean and Standard Deviations ...

Table 9.2 Results of SSIM for 3 Executions (Mean and Standard Deviations Are ...

Table 9.3 Results of SSIM for 3 Executions (Mean and Standard Deviations Are ...

List of Illustrations

Chapter 2

Figure 2.1 Research framework of the proposed hybrid method

Figure 2.2 The accuracy of the k-means text clustering methods

Figure 2.3 The F-measure score of the k-means technique

Chapter 3

Figure 3.1 Flowchart for performing crossover for parent 1.

Figure 3.2 Flowchart for performing crossover for parent 2.

Figure 3.3 Flowchart for selecting better offspring.

Figure 3.4 Bar chart for DB Index for Table 3.2, where the number of cluster...

Figure 3.5 Bar chart for intra‐cluster distance for Table 3.2, where the num...

Figure 3.6 Bar chart for inter‐cluster distance for Table 3.2, where the num...

Figure 3.7 Bar chart for DB Index for Table 3.3 where number of clusters=4, ...

Figure 3.8 Bar chart for intra‐cluster distance for Table 3.3 where number o...

Figure 3.9 Bar chart for inter‐cluster distance for Table 3.3 where number o...

Figure 3.10 Bar chart for DB Index for Table 3.4 where number of clusters=20...

Figure 3.11 Bar chart for intra‐cluster distance for Table 3.4 where number ...

Figure 3.12 Bar chart for inter‐cluster distance for Table 3.4 where number ...

Chapter 4

Figure 4.1 Classification of machine learning algorithms

Figure 4.2 Workflow of big data, machine learning, and social media

Figure 4.3 Chatbot schematic diagram

Figure 4.4 Clustering in the network data using a word adjacency dataset

Chapter 5

Figure 5.1 Linear separation of two classes in two-dimensional space...

Figure 5.2 Multiclass support vector machine

Figure 5.3 SVM optimized by genetic algorithms

Figure 5.4 Bar graph showing accuracy of CART and ID3 on binary dataset

Figure 5.5 Bar graph showing accuracy of CART, ID3, and SVM on multiclass da...

Figure 5.6 Bar graph showing accuracy of different SVM kernels on multiclass...

Chapter 6

Figure 6.1 Flow diagram of the firefly algorithm

Figure 6.2 Flow diagram of the k-means algorithm

Figure 6.3 Proposed methodology

Figure 6.4 k-means firefly algorithm pseudocode

Figure 6.5 Circles cluster after k-means firefly algorithm

Figure 6.6 Diabetes Davies-Bouldin graph before versus after

Figure 6.7 Iris Davies‐Bouldin graph before versus after

Figure 6.8 Diabetes dataset Davies‐Bouldin index

Figure 6.9 Iris dataset Davies‐Bouldin index

Chapter 7

Figure 7.1 Knowledge discovery paradigm

Chapter 8

Figure 8.1 Flowchart of QIANA‐based FRCM

Figure 8.2 (a) MR image 1; (b) MR image 2.

Figure 8.3 Six-class segmented grayscale MR image1 with the class levels o...

Figure 8.4 Six-class segmented grayscale MR image2 with the class levels o...

Chapter 9

Figure 9.1 Main steps of the hybrid method of image color quantization based...

Figure 9.2 Graphical representation of the crossover operator.

Figure 9.3 Images and their sizes used in the experiments to evaluate our co...

Figure 9.4 Results for k-means and genetic algorithms on “lena” and “peppers...

Figure 9.5 Results for the “fruits” image with …

Figure 9.6 Zooming of the results for the “fruits” image with …

Figure 9.7 Results for the “lena” image with …

Figure 9.8 Results for the “rgb” image with …

Figure 9.9 Results for the “girl” image with …

Figure 9.10 Results with zooming for the “tulips” image with …

Figure 9.11 Comparative graphics of the results obtained for each of the ima...

Figure 9.12 Comparative graphics of the results obtained for each of the ima...

Figure 9.13 Comparative graphics of the results obtained for each of the ima...


Recent Advances in Hybrid Metaheuristics for Data Clustering

Edited by

Sourav De, Cooch Behar Government Engineering College, West Bengal, India

Sandip Dey, Sukanta Mahavidyalaya, West Bengal, India

Siddhartha Bhattacharyya, CHRIST (Deemed to be University), Bangalore, India

This edition first published 2020

© 2020 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Sourav De, Sandip Dey, and Siddhartha Bhattacharyya to be identified as the authors of the editorial material in this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions.

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data

Names: De, Sourav, 1979‐ editor. | Dey, Sandip, 1977‐ editor. |

  Bhattacharyya, Siddhartha, 1975‐ editor.

Title: Recent advances in hybrid metaheuristics for data clustering / edited

  by Dr. Sourav De, Dr. Sandip Dey, Dr. Siddhartha Bhattacharyya.

Description: First edition. | Hoboken, NJ : John Wiley & Sons, Inc., [2020]

  | Includes bibliographical references and index.

Identifiers: LCCN 2020010571 (print) | LCCN 2020010572 (ebook) | ISBN

  9781119551591 (cloth) | ISBN 9781119551614 (adobe pdf) | ISBN

  9781119551607 (epub)

Subjects: LCSH: Cluster analysis–Data processing. | Metaheuristics.

Classification: LCC QA278.55 .R43 2020 (print) | LCC QA278.55 (ebook) |

  DDC 519.5/3–dc23

LC record available at https://lccn.loc.gov/2020010571

LC ebook record available at https://lccn.loc.gov/2020010572

Cover Design: Wiley

Cover Image: © Nobi_Prizue/Getty Images

Dr. Sourav De dedicates this book to his respected parents, Satya Narayan De and Tapasi De; his loving wife, Debolina Ghosh; his beloved son, Aishik De; his sister, Soumi De, and his in‐laws.

Dr. Sandip Dey dedicates this book to the loving memory of his father, the late Dhananjoy Dey; his beloved mother, Smt. Gita Dey; his wife, Swagata Dey Sarkar; his children, Sunishka and Shriaan; his siblings, Kakali, Tanusree, and Sanjoy; and his nephews, Shreyash and Adrishaan.

Dr. Siddhartha Bhattacharyya dedicates this book to his late father, Ajit Kumar Bhattacharyya; his late mother, Hashi Bhattacharyya; his beloved wife, Rashni, and his in‐laws, Asis Mukherjee and Poly Mukherjee.

List of Contributors

Laith Mohammad Abualigah

Amman Arab University

Jordan

Rishabh Agrawal

VIT

India

Kauser Ahmed

VIT

India

Mofleh Al‐diabat

Al Albayt University

Jordan

Bisan Alsalibi

Universiti Sains Malaysia

Malaysia

Mohammad Al Shinwan

Amman Arab University

Jordan

Belfin R V

Karunya Institute of Technology and Sciences

India

Siddhartha Bhattacharyya

CHRIST (Deemed to be University)

India

Indu Chhabra

Panjab University

Chandigarh

India

Sunanda Das

National Institute of Technology

Durgapur

India

Sourav De

Cooch Behar Government Engineering College

India

Prasenjit Dey

Cooch Behar Government Engineering College

India

Sandip Dey

Sukanta Mahavidyalaya

India

Tania Dey

Sikkim Manipal Institute of Technology

India

Khaldoon Dhou

Drury University

USA

Arnab Gain

Cooch Behar Government Engineering College

India

Essam Hanandeh

Zarqa University

Jordan

Grace Mary Kanaga

Karunya Institute of Technology and Sciences

India

Ahamad Khader

Universiti Sains Malaysia

Malaysia

Debanjan Konar

Sikkim Manipal Institute of Technology

India

Suman Kundu

Wroclaw University of Science and Technology

Poland

Ruchita Pradhan

Sikkim Manipal Institute of Technology

India

Helio Pedrini

Institute of Computing

University of Campinas

Brazil

Prativa Rai

Sikkim Manipal Institute of Technology

India

Marcos Roberto e Souza

Institute of Computing

University of Campinas

Campinas

Brazil

Essam Said Hanandeh

Zarqa University

Jordan

Anderson Santos

Institute of Computing

University of Campinas

Brazil

Tejaswini Sapkota

Sikkim Manipal Institute of Technology

India

Mohammad Shehab

Aqaba University of Technology

Jordan

Gunmala Suri

University Business School

Panjab University

Chandigarh

India

Series Preface: Dr Siddhartha Bhattacharyya, Christ (Deemed To Be University), Bangalore, India (Series Editor)

The Intelligent Signal and Data Processing (ISDP) book series focuses on the field of signal and data processing encompassing the theory and practice of algorithms and hardware that convert signals produced by artificial or natural means into a form useful for a specific purpose. The signals might be speech, audio, images, video, sensor data, telemetry, electrocardiograms, or seismic data, among others. The possible application areas include transmission, display, storage, interpretation, classification, segmentation, and diagnosis. The primary objective of the ISDP book series is to evolve future‐generation, scalable, intelligent systems for faithful analysis of signals and data. The ISDP series is intended mainly to enrich the scholarly discourse on intelligent signal and image processing in different incarnations. The series will benefit a wide audience that includes students, researchers, and practitioners. The student community can use the books in the series as reference texts to advance their knowledge base. In addition, the constituent monographs will be handy to aspiring researchers due to recent and valuable contributions in this field. Moreover, faculty members and data practitioners are likely to gain relevant knowledge from the books in the series.

The series coverage will contain, but not be exclusive to, the following:

* Intelligent signal processing
  * Adaptive filtering
  * Learning algorithms for neural networks
  * Hybrid soft computing techniques
  * Spectrum estimation and modeling
* Image processing
  * Image thresholding
  * Image restoration
  * Image compression
  * Image segmentation
  * Image quality evaluation
  * Computer vision and medical imaging
  * Image mining
  * Pattern recognition
  * Remote sensing imagery
  * Underwater image analysis
  * Gesture analysis
  * Human mind analysis
  * Multidimensional image analysis
* Speech processing
  * Modeling
  * Compression
  * Speech recognition and analysis
* Video processing
  * Video compression
  * Analysis and processing
  * 3D video compression
  * Target tracking
  * Video surveillance
  * Automated and distributed crowd analytics
  * Stereo-to-auto stereoscopic 3D video conversion
  * Virtual and augmented reality
* Data analysis
  * Intelligent data acquisition
  * Data mining
  * Exploratory data analysis
  * Modeling and algorithms
  * Big data analytics
  * Business intelligence
  * Smart cities and smart buildings
  * Multiway data analysis
  * Predictive analytics
  * Intelligent systems

Preface

Grouping or classifying real-life data into a set of clusters or categories for further processing and classification is known as clustering. The groups are organized on the basis of the built-in properties or characteristics of the data in the dataset, and the features of the groups are important for representing a new object or understanding a new phenomenon. Homogeneous data should fall in the same cluster, whereas dissimilar or heterogeneous data is grouped into different clusters. Data clustering can be applied in many fields, such as document retrieval, data mining, pattern classification, image segmentation, artificial intelligence, machine learning, biology, and microbiology.

Broadly, there are two types of data clustering algorithms: supervised and unsupervised. In supervised data clustering algorithms, the number of desired partitions and a labeled dataset are supplied as basic inputs at the start of the algorithm. Such algorithms also try to keep the number of segments small, and data points are allotted to clusters using a notion of closeness given by a distance function. By contrast, unsupervised algorithms require no prior information beyond the raw data: no labeled classes, no decision-making criterion for optimization, and no predefined number of segments or grouping principles based on the data content.
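
To make the notion of closeness under a distance function concrete, here is a minimal k-means sketch in Python. It is a generic textbook illustration, not code from any chapter of this book; the toy data, iteration count, and seeding scheme are our own assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        # under squared Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters
```

On two well-separated groups of points, the loop settles into one centroid per group within a few iterations; real implementations add a convergence test and smarter initialization.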

Metaheuristic algorithms have proved efficient in handling and solving different types of data clustering problems. Metaheuristics are designed to tackle complex clustering problems where classical clustering algorithms fail to be either effective or efficient. A metaheuristic is essentially an iterative generation process that guides a subordinate heuristic: it intelligently combines different concepts for exploring and exploiting the search space, and it applies learning strategies to structural information about the problem so that near-optimal solutions are derived efficiently. In effect, a metaheuristic samples a subset of a solution space that is too large to be enumerated completely. Metaheuristic techniques can handle many types of real-world problems that conventional algorithms cannot manage, in spite of increasing computational power, simply because of unrealistically long running times. These algorithms make few assumptions about the optimization problem being solved. They are not guaranteed to generate globally optimal solutions for all types of problems, since most implementations are some form of stochastic optimization and the resulting solutions may depend on the set of generated random variables. Compared with exact optimization algorithms and iterative methods, however, metaheuristics often find good solutions with less computational effort by exploring a large set of feasible solutions. Some well-known metaheuristic algorithms include the genetic algorithm (GA), simulated annealing (SA), tabu search (TS), and different types of swarm intelligence algorithms. Recognized swarm intelligence algorithms include particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony optimization (ABC), differential evolution (DE), and the cuckoo search algorithm. More recently, swarm intelligence-based optimization algorithms such as the Egyptian vulture optimization algorithm, the rats herd algorithm (RATHA), the bat algorithm, the crow search algorithm, and glowworm swarm optimization (GSO) have been found to perform well on real-life problems. These algorithms also work efficiently for clustering different types of real-life datasets.
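
Simulated annealing, named above, is perhaps the simplest metaheuristic to sketch: it accepts worse moves with a temperature-dependent probability so the search can escape local optima early, then behaves greedily as the temperature cools. The following is a generic textbook form, not an implementation from this book; the step size, initial temperature, and cooling rate are illustrative assumptions.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.95, iters=500, seed=1):
    """Minimize a one-variable function f with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)  # random neighbor of x
        fc = f(candidate)
        # Always accept improvements; accept a worse move with probability
        # exp(-delta/t), which shrinks toward zero as the temperature cools.
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling
    return best, fbest
```

On a convex toy objective such as (x - 3)^2 the search drifts toward the minimum and then refines it greedily; the same loop applies unchanged to rugged, multimodal objectives, which is where the early random acceptance pays off.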

During the clustering of data, it has been observed that metaheuristic algorithms can afford optimum solutions but suffer from high time complexity. To overcome such problems, and to avoid depending on one particular type of metaheuristic algorithm for complex problems, researchers and scientists have not only blended different metaheuristic approaches but also hybridized metaheuristic algorithms with other soft computing tools and techniques, such as neural networks, fuzzy sets, and rough sets. These hybrid metaheuristic algorithms, combinations of metaheuristic algorithms and other techniques, are more effective at handling real-life data clustering problems. Recently, quantum mechanical principles have also been applied to cut down the time complexity of metaheuristic approaches to a great extent.

This book is intended to encourage readers to design efficient metaheuristics for data clustering in different domains. It elaborates on the fundamentals of different metaheuristics and their application to data clustering and, as a sequel, paves the way for designing and developing hybrid metaheuristics to be applied to data clustering. Few existing books on hybrid metaheuristic algorithms cover this ground.

The book contains nine chapters written by leading practitioners in the field.

A brief overview of the advantages and limitations of the fuzzy clustering algorithm is presented in Chapter 1. The principle of operation and the structure of fuzzy algorithms are also elucidated with reference to the inherent limitations of cluster centroid selection. Several local‐search‐based and population‐based metaheuristic algorithms are discussed with reference to their operating principles. Finally, different avenues for addressing the cluster centroid selection problem with recourse to the different metaheuristic algorithms are presented.

The increasing volume of data and text on electronic sites has necessitated the use of different clustering methods, including text clustering, a helpful unsupervised analysis method for partitioning an immense collection of text documents into a set of groups. Feature selection is a well-known unsupervised method used to eliminate uninformative features and thereby improve the performance of text clustering. In Chapter 2, the authors propose an algorithm, called H-HSA, that solves the feature selection problem before applying the k-means text clustering technique by improving the exploitation searchability of the basic harmony search algorithm. The proposed feature selection method is used to reinforce the text clustering technique by offering a new set of informative features.
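
As a rough sketch of how harmony search can drive binary feature selection, consider the loop below: each harmony is a 0/1 mask over features, new harmonies are improvised from the harmony memory with occasional pitch adjustment (a bit flip), and a better harmony replaces the worst one in memory. This is a generic illustration, not the H-HSA variant proposed in Chapter 2; the HMCR/PAR values and the fitness-function signature are our own assumptions.

```python
import random

def harmony_search(fitness, n_features, hm_size=10, hmcr=0.9, par=0.3,
                   iters=200, seed=2):
    """Binary harmony search: each harmony is a 0/1 feature mask; higher fitness wins."""
    rng = random.Random(seed)
    memory = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(hm_size)]
    scores = [fitness(h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                # Memory consideration: reuse bit j from a random stored harmony...
                bit = memory[rng.randrange(hm_size)][j]
                # ...with occasional pitch adjustment (flip the bit).
                if rng.random() < par:
                    bit = 1 - bit
            else:
                bit = rng.randint(0, 1)  # random consideration
            new.append(bit)
        s = fitness(new)
        worst = min(range(hm_size), key=scores.__getitem__)
        if s > scores[worst]:  # the new harmony replaces the worst if it is better
            memory[worst], scores[worst] = new, s
    best = max(range(hm_size), key=scores.__getitem__)
    return memory[best], scores[best]
```

In a real text clustering pipeline the fitness function would score a mask by the quality of the clusters produced on the selected features; here any scoring function over masks can be plugged in.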

With the advancement of data analytics, data clustering has become one of the most important areas of modern data science, and several works have proposed algorithms to deal with it. The objective of Chapter 3 is to improve data clustering using metaheuristic-based algorithms. For this purpose, the authors propose a genetic algorithm–based data clustering approach with a new adaptive position–based crossover technique, which introduces the concept of a vital gene during crossover. Simulation results demonstrate that the proposed method performs better than two other genetic algorithm–based data clustering methods, and it is also more time efficient than its counterparts.
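
For readers unfamiliar with crossover in GA-based clustering, the sketch below encodes a chromosome as a list of candidate centroids, shows the classic one-point crossover that adaptive operators such as Chapter 3's are typically compared against, and includes an intra-cluster-distance fitness. It is a generic illustration, not the chapter's adaptive position-based operator; the encoding and helper names are our own.

```python
import random

def sse(points, centroids):
    """Intra-cluster criterion: total squared distance of each point
    to its nearest centroid (lower is better)."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in points)

def one_point_crossover(parent1, parent2, rng):
    """Classic one-point crossover: swap centroid suffixes at a random cut point."""
    cut = rng.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]
```

A full GA would wrap these in selection and mutation loops; the point here is only that crossover recombines centroid sets without creating or losing genes, while the SSE fitness steers the population toward compact clusters.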

A social network, used by the human population as a platform of interaction, generates a large volume of diverse data every day. These data and the attributes of the interactions are increasingly important for researchers and businesses seeking to identify societal and economic value. However, the generated data is vast, highly complex, and dynamic, which necessitates real-time solutions, and machine learning is a useful tool for summarizing meaningful information from large, diverse datasets. Chapter 4 provides a survey of several applications of social network analysis in which machine learning plays a critical role, ranging from spam content detection to human behavior analysis, from topic modeling to recommender systems, and from sentiment analysis to emotion contagion in social networks.

Predicting students' performance at an early stage is important for improving their prospects in higher education and placement. Early prediction of student grades allows an instructor to detect poor performance in a course automatically and also gives decision-makers ample opportunity to take remedial measures that help students succeed in future education. A model predicting students' grades using CART, ID3, and an improved multiclass SVM optimized by the genetic algorithm (GA) is investigated in Chapter 5. The model follows supervised learning classification by means of CART, ID3, and an SVM optimized by the GA. In this study, the model is tested on a dataset of undergraduate student information, i.e., total marks obtained in the courses taken over four years, with the respective labeled subject names and codes, at Sikkim Manipal Institute of Technology, Sikkim, India. A comparative analysis among CART, ID3, and the multiclass SVM optimized by the GA indicates that the multiclass SVM optimized by the GA outperforms the ID3 and CART decision tree algorithms in the case of multiclass classification.

Significant advances in information technology have resulted in excessive growth of data in health care informatics. Technologies are being developed to treat new types of diseases and illnesses, but few measures exist to stop a disease in its tracks at the early stages. The motivation of Chapter 6 is to help people diagnose a disease at an early stage based on its symptoms. In this chapter, the authors use various nature-inspired clustering algorithms in collaboration with the k-means algorithm to cluster a person's health data against already available data and label it accordingly. Experimental results show that nature-inspired algorithms such as the firefly algorithm combined with k-means give efficient results for these problems.

With the fast development of pattern discovery–oriented systems, data mining is rapidly spreading into other disciplines of management, biomedical, and physical sciences to tackle the issues of data collection and data storage. With the advancement of data science, numerous knowledge-oriented paradigms have been evaluated for automatic rule mining, and association rule mining remains an active research area with numerous algorithms for knowledge accumulation. Chapter 7 focuses on several challenging issues: demand-driven aggregation of information sources, mining and analyzing relevant patterns while preserving user concerns, and formulating the association rule mining problem as a multi-objective rather than a single-objective solution for post-purchase customer analysis.

The GA and the fuzzy c‐means (FRCM) algorithm are widely used in magnetic resonance image segmentation. In Chapter 8, a hybrid concept combining a quantum‐inspired modified GA and the FRCM is used to segment MR images. The modified GA (MEGA) enhances the performance of the GA by modifying the population initialization and the crossover probability. To speed up this classical MEGA and to derive more optimized class levels, some quantum computing characteristics, such as the qubit, entanglement, orthogonality, and the rotational gate, are incorporated into the classical MEGA. The class levels generated by the quantum‐inspired MEGA are supplied to the FRCM as initial input to overcome the convergence problem of the FRCM. A performance comparison using some standard evaluation metrics is presented between the quantum‐inspired MEGA‐based FRCM, the classical MEGA‐based FRCM, and the conventional FRCM on two grayscale MR images, demonstrating the superiority of the proposed quantum‐inspired MEGA‐based FRCM over both the classical MEGA‐based FRCM and the conventional FRCM.

Large volumes of data have been rapidly collected due to the increasing advances in equipment and techniques for content acquisition. However, the efficient storage, indexing, retrieval, representation, and recognition of multimedia data, such as text, audio, images, and videos, are challenging tasks. To summarize the main characteristics of datasets and simplify their interpretation, exploratory data analysis is often applied to numerous problems in several fields, such as pattern recognition, computer vision, machine learning, and data mining. A core data analysis technique, commonly associated with descriptive statistics and visual methods, is cluster analysis, or clustering. In Chapter 9, the authors propose a hybrid method based on k‐means and the genetic algorithm guided by a qualitative objective function. Experiments demonstrate the good performance of the proposed method.

The editors hope that this book will be helpful for students and researchers who are interested in this area. It can also serve undergraduate students of computer science, information science, and electronics engineering as part of their curriculum.

October, 2019

Sourav De
Cooch Behar, India

Sandip Dey
Jalpaiguri, India

Siddhartha Bhattacharyya
Bangalore, India

1Metaheuristic Algorithms in Fuzzy Clustering

Sourav De1, Sandip Dey2, and Siddhartha Bhattacharyya3

1Department of Computer Science and Engineering, Cooch Behar Government Engineering College, India

2Department of Computer Science, Sukanta Mahavidyalaya, Jalpaiguri, India

3Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, India

1.1 Introduction

Fuzzy clustering refers to the process of assigning data points to different clusters based on the similarity/dissimilarity of features. This process ensures that items in the same cluster are as similar as possible, while dissimilar items belong to different clusters. The identification of the clusters and the assignment of items to clusters are decided with the help of several similarity measures, which include measures of distance, connectivity, and intensity. The choice of the similarity measures depends on the type of data or the application [1].

Both classical and new algorithms have evolved over the years to address the clustering problem. Notable among them are k‐means [2] and fuzzy clustering [3,4]. The classical algorithms primarily segregate the data points into completely different clusters while ensuring that the dissimilarity between the different clusters and the similarity of the constituent data points within any cluster are maximized in the process. Thus, these algorithms ensure that there is no overlap between the clusters. Fuzzy clustering, however, relies on soft assignment, thereby enabling overlap between clusters, with the constituent data points belonging to more than one cluster depending on their degree of belongingness.

The main limitation in any clustering algorithm lies in the initialization process, which entails an initial selection of cluster center points, which are chosen randomly in most cases. Hence, an improper initialization of the cluster centers may lead to an unacceptable result since the positions of the cluster centers, with respect to the constituent data points, are major concerns in the assignment of the data points to the cluster centers.

1.2 Fuzzy Clustering

Fuzzy clustering, often referred to as soft clustering or soft k‐means, is a method that entails a soft partitioning of the constituent data points. By contrast, in any non‐fuzzy/crisp clustering, each data point is designated to belong to exactly one and only one cluster, with no overlapping of clusters. In fuzzy clustering, the data points can belong to more than one cluster, implying that certain overlaps exist between the resultant clusters. The underlying principle behind this partitional clustering technique is the concept of fuzzy set theory, which holds that for a given universe of discourse, every constituent element belongs to all the sets defined in the universe with a certain degree of belongingness (also referred to as membership) [3–5]. Fuzzy clustering is often treated as preferable due to its inherent advantages: a natural affinity for incorporating larger datasets, a simple and straightforward implementation, the ability to handle large datasets as the time complexity is linear in the number of data points, the ability to produce very good results for hyperspherically shaped, well‐separated clusters, robustness in design, and the ability to converge to a locally optimal solution [1].
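As a small illustrative sketch (not part of the chapter's material; the values are invented for illustration), the difference between crisp and fuzzy assignment shows up in how membership is recorded: a crisp clustering stores a single cluster label per point, whereas a fuzzy clustering stores, for each point, a vector of membership degrees that sum to 1.

```python
import numpy as np

# Three data points, two clusters.
# Crisp assignment: each point gets exactly one label (one-hot membership).
crisp_labels = np.array([0, 0, 1])
crisp_membership = np.eye(2)[crisp_labels]

# Fuzzy assignment: each point carries a degree of belongingness to every
# cluster; the degrees for each point sum to 1 (values are illustrative).
fuzzy_membership = np.array([
    [0.9, 0.1],   # mostly cluster 0
    [0.6, 0.4],   # leans toward cluster 0, overlaps with cluster 1
    [0.2, 0.8],   # mostly cluster 1
])

# Both matrices are row-stochastic: every row sums to 1.
assert np.allclose(crisp_membership.sum(axis=1), 1.0)
assert np.allclose(fuzzy_membership.sum(axis=1), 1.0)
```

A crisp partition is thus the special case of a fuzzy one in which every membership degree is restricted to 0 or 1.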

1.2.1 Fuzzy c‐means (FCM) clustering

FCM clustering is one of the most widely used fuzzy clustering algorithms. It was developed by J.C. Dunn in 1973 [6] and improved by J.C. Bezdek in 1981 [1]. The operation of the algorithm is quite similar to that of the widely known k‐means algorithm. The basic steps are as follows:

Select a number of clusters.

Randomly assign coefficients to each data point to label them to the clusters.

Repeat until the algorithm converges, i.e., when the change in the coefficients in two consecutive iterations is no more than a predefined threshold $\varepsilon$:

Compute the cluster centroids for each cluster. Every data point $x$ is identified by a set of coefficients $w_k(x)$ indicating the degree of belongingness to the $k$th cluster. In FCM, the mean of all the participating points, weighted by their degree of belongingness to the cluster, represents the cluster centroid. It is mathematically given as

(1.1) $c_k = \dfrac{\sum_{x} w_k(x)^m \, x}{\sum_{x} w_k(x)^m}$

where $m$ is a hyper‐parameter that controls the fuzziness of the clusters. The higher $m$ is, the fuzzier the cluster will be in the end.

For each data point, compute its coefficients of belongingness in the clusters.

1.3 Algorithm

The algorithm attempts to partition a finite collection of $n$ elements $X = \{x_1, \ldots, x_n\}$ into a collection of $c$ fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of $c$ cluster centers $C = \{c_1, \ldots, c_c\}$ and a partition matrix $W = [w_{ij}]$, $i = 1, \ldots, n$, $j = 1, \ldots, c$, where each element $w_{ij}$ tells the degree to which element $x_i$ belongs to cluster $c_j$.

The aim is to minimize an objective function of the following form:

(1.2) $J_m = \displaystyle\sum_{i=1}^{n} \sum_{j=1}^{c} w_{ij}^m \, \lVert x_i - c_j \rVert^2$

where:

(1.3) $w_{ij} = \dfrac{1}{\displaystyle\sum_{k=1}^{c} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$

Fuzzy c‐means clustering works along similar lines to k‐means. However, it differs from the k‐means objective function by the presence of the membership weights $w_{ij}^m$ (i.e., the cluster fuzziness) determined by the fuzzifier $m$, with $m \geq 1$. A large $m$ results in smaller membership values, $w_{ij}$, and, hence, fuzzier clusters. In the limit $m \to 1$, the memberships, $w_{ij}$, converge to 0 or 1, which implies a crisp partitioning. In practice, $m$ is commonly set to 2. The algorithm also minimizes the intracluster variance, which often leads to a local minimum. Moreover, the clustering results depend on the initial choice of weights. Fuzzy clustering suffers from the fact that the number of clusters in the given dataset should be known beforehand. It is also sensitive to noise and outliers.
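The alternation of the centroid update (1.1) and the membership update (1.3) can be sketched as below. This is a minimal illustrative implementation, not the book's reference code; the function name, convergence test, and toy data are the editor's assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Minimal FCM sketch: alternate the centroid update of Eq. (1.1)
    and the membership update of Eq. (1.3) until W stabilizes."""
    rng = np.random.default_rng(seed)
    # Random row-stochastic partition matrix W (one row per data point).
    W = rng.random((X.shape[0], c))
    W /= W.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Wm = W ** m
        # Eq. (1.1): centroids as membership-weighted means of the points.
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # Euclidean distance of every point to every centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Eq. (1.3): new memberships from the pairwise distance ratios.
        W_new = 1.0 / ((d[:, :, None] / d[:, None, :])
                       ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.abs(W_new - W).max() < eps
        W = W_new
        if converged:
            break
    return centers, W

# Toy usage: two well-separated 2-D blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
centers, W = fuzzy_c_means(X, c=2)
```

On well‐separated data like this, the two centroids land near the blob means and each row of `W` remains a valid membership vector summing to 1, illustrating the soft partitioning described above.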

1.3.1 Selection of Cluster Centers

Almost without exception, clustering algorithms require an initial selection of cluster centroids, which is often made in a random fashion. In fact, the selection of the initial cluster center values is considered one of the most challenging tasks in partitional clustering algorithms. An incorrect selection of initial cluster center values leads the search toward an optimal solution that often gets stuck in a local optimum, yielding undesirable clustering results [7,8]. The primary cause behind this problem lies in the fact that the clustering algorithms run in a manner similar to the hill climbing algorithm [9], which, being a local search‐based algorithm, moves in one direction without performing a wider scan of the search space to minimize (or maximize) the objective function. This behavior prevents the algorithm from exploring other regions of the search space that might contain a better, or even the desired, solution. Thus, proper exploitation and exploration of the search space are not achieved in the running of these algorithms.

The general approach to alleviate this problem is to rerun the algorithm several times with several cluster initializations. However, this approach is not always feasible, especially when it comes to the clustering of a large dataset or a complex dataset (i.e., a dataset with multiple optima) [10]. Thus, this selection mechanism may be cast as a global optimization problem, calling for the help of optimization algorithms.
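The multi‐restart strategy described above can be sketched as follows. This is an illustrative stand‐in using a plain k‐means inner loop, not code from the book; the function names, objective, and toy data are the editor's assumptions.

```python
import numpy as np

def kmeans_once(X, k, rng, max_iter=100):
    """A single k-means run from one random initialization.
    Returns the centers and the sum of squared errors (SSE)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    sse = ((X - centers[labels]) ** 2).sum()
    return centers, sse

def best_of_restarts(X, k, n_restarts=10, seed=0):
    """Rerun the clustering from several random initializations and keep
    the run with the lowest objective value."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, k, rng) for _ in range(n_restarts)),
               key=lambda run: run[1])

# Toy usage: two tight blobs; the best restart should separate them cleanly.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.05, (15, 2)),
               rng.normal(4.0, 0.05, (15, 2))])
centers, sse = best_of_restarts(X, k=2)
```

The metaheuristics surveyed in the following sections replace this blind restarting with a guided exploration of the space of initializations.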

Several global search‐based algorithms have been proposed to solve this local‐search problem [11]. These include local search‐based metaheuristic algorithms, such as simulated annealing (SA) and tabu search (TS); evolutionary algorithms (EAs), including evolutionary programming (EP), evolution strategies (ES), genetic algorithms (GAs), and differential evolution (DE); harmony search (HS); and swarm intelligence algorithms, such as particle swarm optimization (PSO), artificial bee colony (ABC), and ant colony optimization (ACO). The following sections provide an overview of the algorithms proposed to solve the clustering problem when the number of clusters is known or set a priori.

1.4 Genetic Algorithm