Intelligent data analytics for terror threat prediction is an emerging field of research at the intersection of information science and computer science. The ready availability of criminal data for analysis brings tremendous opportunities and challenges. This book provides insights that help in devising interventions for emerging, dynamic scenarios of criminal activity. It also presents emerging issues, challenges and management strategies in public safety and crime control across various domains. The book will play a vital role in improving human life to a great extent. Researchers and practitioners working in the fields of data mining, machine learning and artificial intelligence will benefit from this book, which collects state-of-the-art approaches for intelligent data analytics. It will also be useful for those who are new to the field and need to become acquainted quickly with the best-performing methods. With this book they will be able to compare different approaches and carry forward their research in the most important areas of the field, which has a direct impact on the betterment of human life by maintaining the security of our society. No other book currently on the market provides such a collection of state-of-the-art methods for intelligent data analytics-based models for terror threat prediction, as intelligent data analytics is a newly emerging field and research in data mining and machine learning is still at an early stage of development.
Page count: 496
Year of publication: 2021
Cover
Title Page
Copyright
Preface
1 Rumor Detection and Tracing its Source to Prevent Cyber-Crimes on Social Media
1.1 Introduction
1.2 Social Networks
1.3 What Is Cyber-Crime?
1.4 Rumor Detection
1.5 Factors to Detect Rumor Source
1.6 Source Detection in Network
1.7 Conclusion
References
2 Internet of Things (IoT) and Machine to Machine (M2M) Communication Techniques for Cyber Crime Prediction
2.1 Introduction
2.2 Advancement of Internet
2.3 Internet of Things (IoT) and Machine to Machine (M2M) Communication
2.4 A Definition of Security Frameworks
2.5 M2M Devices and Smartphone Technology
2.6 Explicit Hazards to M2M Devices Declared by Smartphone Challenges
2.7 Security and Privacy Issues in IoT
2.8 Protection in Machine to Machine Communication
2.9 Use Cases for M2M Portability
2.10 Conclusion
References
3 Crime Predictive Model Using Big Data Analytics
3.1 Introduction
3.2 Crime Data Mining
3.3 Visual Data Analysis
3.4 Technological Analysis
3.5 Big Data Framework
3.6 Architecture for Crime Technical Model
3.7 Challenges
3.8 Conclusions
References
4 The Role of Remote Sensing and GIS in Military Strategy to Prevent Terror Attacks
4.1 Introduction
4.2 Database and Methods
4.3 Discussion and Analysis
4.4 Role of Remote Sensing and GIS
4.5 Cartographic Model
4.6 Mapping Techniques Used for Defense Purposes
4.7 Naval Operations
4.8 Future Sphere of GIS in Military Science
4.9 Terrain Evolution
4.10 Conclusion
References
5 Text Mining for Secure Cyber Space
5.1 Introduction
5.2 Literature Review
5.3 Latent Semantic Analysis
5.4 Proposed Work
5.5 Detailed Work Flow of Proposed Approach
5.6 Results and Discussion
5.7 Conclusion
References
6 Analyses on Artificial Intelligence Framework to Detect Crime Pattern
6.1 Introduction
6.2 Related Works
6.3 Proposed Clustering for Detecting Crimes
6.4 Performance Evaluation
6.5 Conclusions
References
7 A Biometric Technology-Based Framework for Tackling and Preventing Crimes
7.1 Introduction
7.2 Biometrics
7.3 Surveillance Systems (CCTV)
7.4 Legality to Surveillance and Biometrics vs. Privacy and Human Rights
7.5 Proposed Work (Biometric-Based CCTV System)
7.6 Conclusion
References
8 Rule-Based Approach for Botnet Behavior Analysis
8.1 Introduction
8.2 State-of-the-Art
8.3 Bots and Botnets
8.4 Methodology
8.5 Results and Analysis
8.6 Conclusion and Future Scope
References
9 Securing Biometric Framework with Cryptanalysis
9.1 Introduction
9.2 Basics of Biometric Systems
9.3 Biometric Variance
9.4 Performance of Biometric System
9.5 Justification of Biometric System
9.6 Assaults on a Biometric System
9.7 Biometric Cryptanalysis: The Fuzzy Vault Scheme
9.8 Conclusion & Future Work
References
10 The Role of Big Data Analysis in Increasing the Crime Prediction and Prevention Rates
10.1 Introduction: An Overview of Big Data and Cyber Crime
10.2 Techniques for the Analysis of Big Data
10.3 Important Big Data Security Techniques
10.4 Conclusion
References
11 Crime Pattern Detection Using Data Mining
11.1 Introduction
11.2 Related Work
11.3 Methods and Procedures
11.4 System Analysis
11.5 Analysis Model and Architectural Design
11.6 Several Criminal Analysis Methods in Use
11.7 Conclusion and Future Work
References
12 Attacks and Security Measures in Wireless Sensor Network
12.1 Introduction
12.2 Layered Architecture of WSN
12.3 Security Threats on Different Layers in WSN
12.4 Threats Detection at Various Layers in WSN
12.5 Various Parameters for Security Data Collection in WSN
12.6 Different Security Schemes in WSN
12.7 Conclusion
References
13 Large Sensing Data Flows Using Cryptic Techniques
13.1 Introduction
13.2 Data Flow Management
13.3 Design of Big Data Stream
13.4 Utilization of Security Methods
13.5 Analysis of Security on Attack
13.6 Artificial Intelligence Techniques for Cyber Crimes
13.7 Conclusions
References
14 Cyber-Crime Prevention Methodology
14.1 Introduction
14.2 Credit Card Frauds and Skimming
14.3 Hacking Over Public WiFi or the MITM Attacks
14.4 SQLi Injection
14.5 Denial of Service Attack
14.6 Dark Web and Deep Web Technologies
14.7 Conclusion
References
Index
End User License Agreement
Chapter 1
Table 1.1 Social network users [24].
Table 1.2 Dataset features [31].
Chapter 5
Table 5.1 Similarity score of keyword ‘Authentication’ in various Document ID.
Table 5.2 Similarity score of keyword ‘SQL injection’ in various documents.
Table 5.3 Accuracy for searching cyber-attack related keywords using hybrid appr...
Chapter 6
Table 6.1 Topics in the dataset.
Table 6.2 Events present in the topics.
Table 6.3 Precision.
Table 6.4 Sensitivity.
Table 6.5 Specificity.
Table 6.6 Accuracy.
Chapter 8
Table 8.1 Features extracted from Wireshark.
Table 8.2 Rules generated.
Table 8.3 Error rate.
Chapter 9
Table 9.1 The representation schemes along with matching algorithms for Biometri...
Table 9.2 Comparisons of Biometric Identifiers on the basis of various factors [...
Table 9.3 Examples of apps using biometric recognizance [39, 40].
Table 9.4 Advantages & disadvantages of biometric system on the basis of various...
Chapter 10
Table 10.1 Four forms of knowledge discovery in crime cases.
Table 10.2 Comparison of methodology.
Chapter 12
Table 12.1 Benefits & Snag of security schemes in WSN.
Chapter 14
Table 14.1 Functionality of USB charging cable.
Chapter 1
Figure 1.1 Social networks [23].
Figure 1.2 Classification of rumor and non-rumor.
Figure 1.3 Rumor classification process.
Figure 1.4 Naïve Bayes classifier.
Figure 1.5 Hyperplane in 2-D and 3-D.
Figure 1.6 Combating misinformation in Instagram [33].
Figure 1.7 Network topology.
Figure 1.8 SI model.
Figure 1.9 SIS model.
Figure 1.10 SIR model.
Figure 1.11 SIRS model.
Figure 1.12 Centrality measures.
Figure 1.13 Rumor source detection process.
Chapter 2
Figure 2.1 Advancement of Internet through ARPANET to IoT and M2M.
Figure 2.2 Machine knowledge points of view for IoT through M2M with Cyber Secur...
Figure 2.3 IoT Theoretical Top 10 Risks.
Figure 2.4 Top 5 Functional Risks and Vulnerabilities.
Figure 2.5 GSM-based modules with wireless connectivity.
Chapter 4
Figure 4.1 Frame work of military GIS.
Figure 4.2 Various applications of GIS in defense strategy.
Figure 4.3 Cartographic model for land management in hilly area.
Figure 4.4 Digital Elevation Model.
Figure 4.5 Triangulated Irregular Network (TIN) Model.
Figure 4.6 Hillshade analysis model for terrain analysis.
Chapter 5
Figure 5.1 Broad steps followed in text mining.
Figure 5.2 Process of text mining.
Figure 5.3 Work flow of text mining.
Figure 5.4 Detailed workflow of proposed approach.
Figure 5.5 Process followed to obtain similarity score.
Figure 5.6 Similarity and accuracy for the keyword ‘Authentication’.
Figure 5.7 Ranking graph of document and similarity for keyword ‘SQL Injection’.
Figure 5.8 Accuracy for searching vulnerable keywords.
Chapter 6
Figure 6.1 Overall architecture of the proposed method.
Chapter 7
Figure 7.1 General flow of biometric systems.
Figure 7.2 Biometric traits.
Figure 7.3 Biometric framework.
Figure 7.4 Biometric applications.
Figure 7.5 Soft biometric classification.
Figure 7.6 Soft Biometric System Interface.
Figure 7.7 Surveillance system.
Figure 7.8 Accuracy recognition.
Figure 7.9 Proposed Work Flow Diagram.
Figure 7.10 Proposed Frame Work.
Figure 7.11 Intelligent Identification System.
Chapter 8
Figure 8.1 Botnet life cycle.
Figure 8.2 Different botnet detection methods.
Figure 8.3 Block diagram of proposed methodology.
Figure 8.4 Decision tree obtained using proposed approach.
Figure 8.5 Percentage accuracy of various machine learning model and proposed mo...
Chapter 9
Figure 9.1 Face.
Figure 9.2 Hand geometry.
Figure 9.3 Fingerprint.
Figure 9.4 Voice detection.
Figure 9.5 Iris.
Figure 9.6 Keystrokes.
Figure 9.7 (a), (b) Forms of linking biometric framework with cryptanalysis.
Chapter 10
Figure 10.1 Architecture of Hadoop.
Figure 10.2 Types of Homomorphic Encryption.
Chapter 12
Figure 12.1 Architecture of WSN.
Figure 12.2 Different parameters for security of information collection.
Figure 12.3 Various standards for attack detection.
Chapter 14
Figure 14.1 Cybercrime evolution.
Figure 14.2 MITM attack.
Figure 14.3 Steps of Phishing attack.
Figure 14.4 A sample email.
Figure 14.5 Session hijacking levels.
Figure 14.6 A sample session hijacking.
Figure 14.7 Steps of XSS attack.
Figure 14.8 SQL injection interpretation.
Figure 14.9 A sample DOS attack.
Figure 14.10 Differentiation between dark, deep & surface web.
Figure 14.11 Tor browser functionality.
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
Publishers at Scrivener
Martin Scrivener
Phillip Carmical
Edited by
Subhendu Kumar Pani
Sanjay Kumar Singh
Lalit Garg
Ram Bilas Pachori
Xiaobo Zhang
This edition first published 2021 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2021 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at www.wiley.com/go/permissions.
Wiley Global Headquarters
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-71109-4
Cover image: www.Pixabay.Com
Cover design by Russell Richardson
Set in size of 11pt and Minion Pro by Manila Typesetting Company, Makati, Philippines
Printed in the USA
10 9 8 7 6 5 4 3 2 1
Intelligent data analytics for terror threat prediction is an emerging field of research at the intersection of information science and computer science. It brings tremendous opportunities and challenges because criminal data is now easily available for analysis. The aim of this kind of analytics is to prevent threats before they happen by applying classical statistical methods, machine learning and artificial intelligence, rule induction methods, neural networks, fuzzy logic, and stochastic search methods to various data sources, including social media, GPS devices, video feeds from street cameras and license-plate readers, travel and credit-card records and the news media, as well as government and proprietary systems. Intelligent data analytics seeks efficient data mining techniques that yield solutions for crime investigation. Predicting future terrorist attacks by city, attack type, target type, claim mode, weapon type and motive through classification techniques facilitates decision making by security organizations, which can learn from stored information on previous attacks and then rate the targeted sectors and areas accordingly for security measures. Intelligent data analytics models use multiple levels of representation, in which each level learns a more abstract representation than the one below it, starting from the raw data. Intelligent data analytics-based algorithms have demonstrated strong performance in a variety of areas, including data visualization, data pre-processing (fusion, editing, transformation, filtering, and sampling), data engineering, database mining techniques, tools and applications.
This edited book, titled “Intelligent Data Analytics for Terror Threat Prediction”, emerges from the vital need for public safety in various domains and parts of the world. It is particularly targeted at resource-constrained environments, such as developing nations, where crime is growing at a frightening rate across various domains of life and impeding economic growth. By resource-constrained environments, we mean those where personnel skilled in crime intelligence are limited and where inadequate technological solutions are in place to gather operational safety information for citizens’ security. Of particular interest is the quest to understand the nature, scope and level of impact of present crime mining solutions across various domains and to develop novel paradigms for a more comprehensive solution. The book presents insights that will help in devising interventions for emerging, dynamic scenarios of criminal activity. Further, it presents emerging issues, challenges and management strategies in public safety and crime control across various domains. The book will play a vital role in improving human life to a great extent. All researchers and practitioners will benefit from reading this book, especially those working in the fields of data mining, machine learning, and artificial intelligence. It is a collection of state-of-the-art approaches for intelligent data analytics, and it will help new researchers and practitioners in the field to learn quickly which methods perform best.
Organization of the Book
This book consists of 14 chapters that present scientific concepts, frameworks and ideas on intelligent data analytics for terror threat prediction across different crime domains. The editors and expert reviewers have confirmed the high caliber of the chapters through careful refereeing. For coherence, we have organized the chapters by similarity of topic. The topics range from crime mining issues pertaining to cyber-crime, cyber-crimes on social media, intrusion detection systems, cryptography, Internet of Things (IoT) and machine to machine (M2M) communication, to the analysis of crime scenarios.
Chapter 1, “Rumor Detection and Tracing its Source to Prevent Cyber-Crimes on Social Media” by Ravi Kishore Devarapalli and Anupam Biswas, presents the different automated rumor detection systems in social networks and techniques to trace the source of a rumor.
Chapter 2, “Internet of Things (IoT) and Machine to Machine (M2M) Communication Techniques for Cyber Crime Prediction” by Jaiprakash Narain Dwivedi, responds to crime issues by offering a novel security structure that is based on an examination of the “limits and capacities” of M2M devices and that improves the system development life cycle for the overall IoT ecosystem.
In Chapter 3, “Crime Predictive Model Using Big Data Analytics”, Hemanta Kumar Bhuyan and Subhendu Kumar Pani present detailed information on machine learning methods for developing techniques that catch criminals based on their track of activities.
Sushobhan Majumdar presents an important discussion and analysis in Chapter 4 on “The Role of Remote Sensing and GIS in Military Strategy to Prevent Terror Attacks”. He focuses on the role of RS and GIS in constructing defense strategies to prevent terror attacks.
In Chapter 5, Supriya Raheja and Geetika Munjal present “Text Mining for Secure Cyber Space”. The chapter presents an expert system for extracting the similarity score of cyber-attack related keywords among various resources. The proposed work uses a text mining approach to make cyber space secure.
R. Arshath Raja, N. Yuvaraj, and N.V. Kousik provide an insightful discussion and analysis in Chapter 6 on “Analyses on Artificial Intelligence Framework to Detect Crime Pattern”. The chapter describes the performance of the proposed clustering model for crime pattern investigation and compares it with time series analysis, support vector machines and artificial neural networks. The analysis is carried out against various performance metrics, including accuracy, specificity, sensitivity and F-measure.
In Chapter 7, Ebrahim A.M. Alrahawe, Vikas T. Humbe, and G.N. Shinde present “A Biometric Technology-Based Framework for Tackling and Preventing Crimes”. The chapter provides an insight into the possibility of integrating surveillance systems with biometric systems into a single system in order to predict crime by identifying criminals and crime tools.
In Chapter 8, Supriya Raheja, Geetika Munjal, Jyoti Jangra, and Rakesh Garg provide a useful discussion of a “Rule-Based Approach for Botnet Behavior Analysis”. The chapter argues that botnet traffic in any network is a matter of serious concern. Botnets are used for many malicious activities, such as distributed denial of service (DDoS) attacks, mass spam, phishing attacks, click fraud, stealing users’ confidential information such as passwords, and other types of cyber-crime.
Abhishek Goel, Siddharth Gautam, Nitin Tyagi, Nikhil Sharma, and Martin Sagayam present an important discussion and analysis in Chapter 9 on the role of “Securing Biometric Framework with Cryptanalysis”. The chapter investigates the different contentions for and against biometrics and argues that while biometrics may present real protection concerns, these issues can be satisfactorily ameliorated.
In Chapter 10, Galal A. Al-Rummana, Abdulrazzaq H.A. Al-Ahdal and G. N. Shinde present “The Role of Big Data Analysis in Increasing the Crime Prediction and Prevention Rates”. The chapter discusses different big data analysis techniques.
Dipalika Das and Maya Nayak present an important discussion and analysis in Chapter 11 on the “Crime Pattern Detection Using Data Mining”. The chapter discusses how statistical data related to crime are monitored and analysed by various investigating bodies so that various strategies can be planned to prevent crimes from happening.
In Chapter 12, Nikhil Sharma, Ila Kaushik, Vikash Kumar Agarwal, Bharat Bhushan, and Aditya Khamparia present “Attacks and Security Measures in Wireless Sensor Network”. The chapter presents attacks on the different layers along with security mechanisms that limit the effect of an attack on the network. Security is one of the main constraints in any type of network, so it is important to take into consideration the key elements of security: availability, integrity and confidentiality.
Hemanta Kumar Bhuyan presents an important discussion and analysis in Chapter 13 on “Large Sensing Data Flows Using Cryptic Techniques”. The chapter discusses replicated crimes committed by criminals using cyberspace.
In Chapter 14, Chandra Sekhar Biswal and Subhendu Kumar Pani present “Cyber-Crime Methodology and its Prevention Techniques”. The chapter emphasizes the various frauds and cyber-crimes happening in India, as well as the different types of cyber-crime and probable solutions for them.
Dr. Subhendu Kumar Pani
Department of Computer Science & Engineering, Orissa Engineering College, BPUT, Odisha, India
Dr. Sanjay Kumar Singh
Department of Computer Science and Engineering, Indian Institute of Technology Campus, BHU, Varanasi, Indore, India
Dr. Lalit Garg
Department of Computer Information Systems at University of Malta, Msida
Dr. Ram Bilas Pachori
Department of Electrical Engineering, Indian Institute of Technology Campus, Indore, India
Dr. Xiaobo Zhang
School of Automation, Guangdong University of Technology, China
November 2020
Ravi Kishore Devarapalli* and Anupam Biswas†
Department of Computer Science Engineering, National Institute of Technology Silchar, Assam, India
Abstract
Social media platforms such as Facebook, Twitter, WhatsApp, Sina Weibo and Hike play an important role in spreading information in today’s world. Because of their large-scale connectivity, some cyber criminals choose these platforms for criminal activities such as rumor spreading, popularly known as rumor diffusion. In a pluralistic society like India, people are much more vulnerable to rumors that spread over social networking platforms. It is challenging even for the public, the government and technical experts working on social media to find these cyber-crimes and their origin in order to punish the culprit. Nowadays, some cyber criminals use different platforms and paths to carry out their plans and evade detection. Thus, automated detection of rumors and tracing of their sources is of great importance.
This chapter surveys the different automated rumor detection systems in social networks and the techniques used to trace the source of a rumor. Rumors on social networking platforms can be detected by analyzing shared posts and the comments written by followers. Once a rumor is detected, the next job is to 1) prevent the rumor from spreading further and 2) identify the culprit, i.e. the originator of the rumor. This chapter covers both aspects. The opinion of an influential person in a group influences others very easily, and cyber criminals may use separate communities in social media to carry out their activities. It is therefore important to trace the most influential person in a community to prevent further spreading. This chapter discusses the recent techniques developed for identifying influential persons in a group. It also studies the various techniques developed for identifying the culprit, which are based on factors such as network structure, diffusion models and centrality measures. Finally, the chapter discusses the various challenges, including real-life implementation, evaluation, and datasets, with respect to both rumor detection and rumor source tracing.
Keywords: Rumor, rumor source detection, centrality measures, social networks, diffusion models
Social media like Facebook, Twitter, Sina Weibo, YouTube and WhatsApp are becoming major online businesses. Social media uses computer-based technology and the internet to ease the sharing of ideas, thoughts, and information by building virtual networks and communities [1]. Nowadays, social media plays an important role in information diffusion [2]. Social networking sites like Facebook, Twitter, and WhatsApp have become popular over a short period of time thanks to their user-friendly features [3]. Information diffusion has advantages as well as disadvantages [4]. On the positive side, users can upload and share photos and videos, and comment on and like them, without hurting any community, religion or political party. On the negative side, people may share abusive photos and videos, terror activities or sensitive information about a country; these are criminal activities that come under cyber-crime.
Cyber-crime is criminal activity that involves a computer and the internet [5]. Many activities fall under cyber-crime, such as passing rumors, online harassment, and hacking emails, websites or databases [6]. To detect whether given data is rumor or fact, text classification algorithms such as the Naïve Bayes classifier and the support vector machine can be used [7]. These classical algorithms classify text data based on its features and dimensions [8].
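As a rough illustration of the text classification idea mentioned above, the following sketch trains a small multinomial Naïve Bayes classifier with Laplace smoothing on a handful of made-up posts. The training posts, the labels `rumor` and `fact`, and all function names are assumptions introduced for illustration; they do not come from the chapter or from any real dataset, and a practical system would use far richer features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a post and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(labeled_posts):
    """Count posts and words per class; labeled_posts is a list of (text, label)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_posts:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_posts = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_posts)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one smoothed log likelihood of the word given the class.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy posts, invented for this sketch only.
TRAINING_POSTS = [
    ("breaking unconfirmed report share now", "rumor"),
    ("shocking secret they hide share now", "rumor"),
    ("official statement released by ministry", "fact"),
    ("ministry confirms report in official briefing", "fact"),
]

model = train_naive_bayes(TRAINING_POSTS)
print(classify("share this shocking secret", model))
print(classify("official briefing by ministry", model))
```

The classifier treats every word as independent given the class, which is the simplifying assumption that gives Naïve Bayes its name. A support vector machine would instead learn a separating hyperplane over the same word features.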
Detecting these cyber-crimes and finding the culprit involved is challenging for the public, the government and police departments, because such crimes are difficult to detect and even harder to prove. As technology progresses, people can remotely access networked computer devices from different locations. Detecting the rumor source in social media is also difficult because people can use various devices, IP addresses and email accounts to bully others online, for example by posting offensive images, videos or rumors about them. Detecting these cyber-crimes and their origin in social networks as early as possible therefore helps to combat further diffusion and to punish the culprit involved [9]. Once rumors are detected in a social network, they must be combated by finding the people who spread them first and punishing them. As discussed above, rumor source identification in social networks is very difficult; many techniques have been introduced, but very few have become popular for finding the origin of a rumor.
Before searching for a rumor source, several factors need to be considered: diffusion models, network structure, evaluation metrics and centrality measures [10]. Network structure covers the topology, which may be a tree or a general graph, and the type of network observation, for example complete observation, snapshot observation or monitor observation [11]. For either topology, a maximum likelihood (ML) estimator can be used to estimate the source [9]. The next factor is the diffusion model, which describes how fast information diffuses over the network [10]. There are four diffusion models: Susceptible-Infected (SI), Susceptible-Infected-Susceptible (SIS), Susceptible-Infected-Recovered (SIR) and Susceptible-Infected-Recovered-Susceptible (SIRS). The SI model places each node in one of two states, susceptible or infected [12]. The SIS model uses the same two states but allows an infected node to become susceptible again in the future [13, 14]. The SIR model has three states, susceptible, infected and recovered, where an infected node can recover through immunity or medication [15]. The SIRS model uses the same three states as SIR, but a recovered node may become susceptible again in the future [16]. Another factor considered in source detection is centrality [17]. Degree centrality, closeness centrality and betweenness centrality are the most commonly used centrality measures. All the factors considered in detecting a source in a network are explained in the following sections.
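To make the diffusion models above more concrete, here is a minimal discrete-time sketch of SIR spreading on a small contact graph. The graph, the per-step probabilities `infect_prob` and `recover_prob`, and the function name `simulate_sir` are assumptions chosen for illustration and are not taken from the chapter. Setting `recover_prob` to 0 reduces the sketch to the SI model; SIS and SIRS would need extra transitions back to the susceptible state, which are not implemented here.

```python
import random


def simulate_sir(graph, source, infect_prob, recover_prob, steps, seed=0):
    """Run a discrete-time SIR process and return the final state of each node.

    graph maps each node to a list of its neighbors. States are
    "S" (susceptible), "I" (infected) and "R" (recovered).
    """
    rng = random.Random(seed)
    state = {node: "S" for node in graph}
    state[source] = "I"
    for _ in range(steps):
        new_state = dict(state)
        for node, status in state.items():
            if status != "I":
                continue
            # An infected node tries to infect each susceptible neighbor.
            for neighbor in graph[node]:
                if state[neighbor] == "S" and rng.random() < infect_prob:
                    new_state[neighbor] = "I"
            # The infected node may then recover and stop spreading.
            if rng.random() < recover_prob:
                new_state[node] = "R"
        state = new_state
    return state


# Hypothetical five-node path graph: 0 - 1 - 2 - 3 - 4.
CONTACTS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# With certain infection and no recovery, the rumor advances one hop per step.
final = simulate_sir(CONTACTS, source=0, infect_prob=1.0, recover_prob=0.0, steps=2)
print(final)
```

Updates are computed from the state at the start of each step, so a node infected during a step cannot infect others or recover until the next step.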
Rumor source detection approaches are broadly divided into two main categories: 1) single source detection and 2) multiple source detection [10]. Single source detection approaches include query-based, anti-rumor-based and network observation approaches. Multiple source detection approaches include network partitioning, ranking-based, community-based and approximation-based approaches. Single source detection using network observation in turn has three types of observation: snapshot, complete and monitor observation. The query-based approach finds the source of a rumor by querying neighbors about the rumor [18]. The anti-rumor-based approach diffuses anti-rumors into the network, gathers information about the network through monitor-based observation, and uses this information to find the rumor source [19]. Detecting multiple rumor sources is also challenging; it can be done by network partitioning using the rumor centrality metric [20] or by community-based methods that follow the SIR model and reverse diffusion [21]. All these models are useful for finding rumor sources in online social networks. Identifying the rumor source and punishing the culprit reduces the further diffusion of rumors and cyber-crimes in social networks.
The rest of the chapter is organized as follows. Section 1.2 explains social networks and their features. Section 1.3 discusses what cyber-crime is, the various cyber-crimes and their impacts, and cyber-crimes in social networks. Section 1.4 covers rumor detection using classification models. Section 1.5 discusses the factors considered in rumor source identification and their classification. Section 1.6 discusses the categories of rumor source detection, namely single source and multiple sources of rumor in a network. Section 1.7 summarizes the survey.
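For the single-source setting with a snapshot observation, one simple centrality-based heuristic is to pick the infected node that is closest, in total hop count, to all other infected nodes. The sketch below illustrates only this distance-center heuristic. It is a stand-in chosen for brevity, not the rumor centrality metric or the maximum likelihood estimator cited in this chapter, and the graph and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, start, allowed):
    """Shortest hop counts from start, walking only through nodes in allowed."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor in allowed and neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def estimate_source(graph, infected):
    """Return the infected node with the smallest total distance to the rest."""
    infected = set(infected)
    best_node, best_total = None, float("inf")
    for candidate in sorted(infected):
        distances = bfs_distances(graph, candidate, infected)
        if len(distances) < len(infected):
            continue  # candidate cannot reach every infected node
        total = sum(distances.values())
        if total < best_total:
            best_node, best_total = candidate, total
    return best_node


# Hypothetical small network; node "e" has not received the rumor.
GRAPH = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}

# Snapshot observation: the set of nodes currently holding the rumor.
print(estimate_source(GRAPH, {"a", "b", "c", "d"}))
```

Ties are broken by sorted node order, which is an arbitrary choice. The heuristic also assumes the infected nodes form a connected subgraph, as they would if the rumor spread from a single source along the edges of the network.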
A social network is a website that allows people to interact socially and build personal relationships by sharing information such as photos, videos, messages and comments [22].
There are many social networking sites. Some of them, such as Facebook, WhatsApp, Twitter and Instagram, have become popular platforms for social interaction across the world within a short period [3]. Although popular, these sites offer a limited set of options. Other social networks offer more options but are less popular. Social networks are classified by the options they give people to interact and collaborate with each other, such as sharing photos, videos, and personal or professional thoughts. The following categories explain how each network is classified based on the available options. Figure 1.1 depicts various social networks available online.
Figure 1.1 Social networks [23].
The major benefit of social networks is keeping in touch with family members and friends. The following list shows the most widely used social networks for building social connections online. Table 1.1 shows the number of active users of each network in millions.
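Table 1.1 Social network users [24].
Service | Active users (in millions)
(name missing) | 2,320
YouTube | 1,900
(name missing) | 1,600
Facebook Messenger | 1,300
(name missing) | 1,098
(name missing) | 1,000
(name missing) | 807
Qzone | 532
TikTok | 500
Sina Weibo | 462
(name missing) | 330
(name missing) | 330
Douban | 320
(name missing) | 303
Baidu Tieba | 300
Skype | 300
Snapchat | 287
Viber | 260
(name missing) | 250
Discord | 250
LINE | 203
Telegram | 200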
Facebook: Facebook is arguably the most popular social media service, providing users with a way to build relationships and share information with people and organizations that they choose to communicate with online [3].
WhatsApp: A real-time social network that arrived later than Facebook but has grown quickly by offering user-friendly features such as quick messaging, sharing of photos, videos and documents, voice and video calling, group chats and protection for all of these [3].
Twitter: A platform of real-time information where users share their thoughts and keep up with others [3].
YouTube: YouTube is the world’s largest social networking site for video sharing which allows users to upload and share videos, view them, comment on them, and like them. This social network is available throughout the world and even enables users to create a YouTube channel where they can share all of their personal videos to show their friends and followers [3].
Google +: The relatively new entrant to the social interaction marketplace is designed to allow users to establish communication circles with which they can communicate and which are integrated with other Google products [3].
MySpace: Though it began as a general social media site, MySpace has evolved to focus on social entertainment, providing a venue for social connections related to movies, music, games and more [3].
Snapchat: A social image messaging platform that allows users to chat with friends using pictures [3].
Interaction between online users through social networking sites is one of the most common computer-based activities. Facebook became the dominant platform for social interaction across the world in a short span of time. Even though it remains the largest social networking site, sites like YouTube, Twitter, Google+ and Sina Weibo all have lively and busy user populations that maintain social interactions in their own ways, with new features springing up all the time. With the help of inventive research methods and theoretical case studies, researchers can better understand how these networks work and measure their influence on various kinds of social interaction, as well as the associated risks and benefits. One benefit is sharing information that is useful to the public without hurting other communities, religions or political parties; one risk is that people may share abusive photos, videos or sensitive information about a country. For more advantages and disadvantages see Ref. [4].
Cyber-crime is also called computer-oriented crime, as it involves a computer and a network. It can be defined as “Offences that are committed against individuals or groups of individuals with a criminal motive to intentionally harm the reputation of the victim or cause physical or mental harm, or loss, to the victim directly or indirectly, using modern telecommunication networks such as Internet (networks including chat rooms, emails, notice boards and groups) and mobile phones (Bluetooth/SMS/MMS)” [25].
Nowadays, cyber-crimes can be committed in many ways using computers and networks [5]. Some of them, such as hacking, cyber bullying, buying illegal things and posting videos of criminal activity, are explained in the following subsections.
Hacking is attacking or accessing devices such as computers and mobile phones without the owner's permission. It is carried out by hackers, who are typically computer programmers and who intrude into systems using malicious programs such as viruses. These programs may steal data stored on the computer, such as usernames, passwords, documents and files [6]. Hacking is not limited to individual computers; it can also damage networked computers and compromise the credentials and contents of email accounts, bank accounts and social networking sites. There is therefore a need to focus on these attacks in order to prevent them and punish the culprits.
Cyber bullying is bullying that happens over digital devices like cell phones, computers, and tablets [26]. It can occur through emails and messages sent over social networks like Facebook, Twitter, Instagram and WhatsApp and services like Gmail and Yahoo, where groups of people are connected and share social information. Some people misuse these facilities to bully others by sharing abusive photos, videos or negative information about them [27]. Cyber bullying is very harmful to society, and attention is needed to combat it and find the culprits.
Buying illegal things such as Bitcoins and drugs is a criminal activity in India and many other countries. It is therefore also a cyber-crime that needs attention, both to prevent it and to trace the buyer [6].
Posting or sharing material related to criminal activities, such as abusive photos, videos and disinformation, on online social networks is also a cyber-crime [5]. Other activities, such as sharing information about a country's security issues or posting against religions and communities, also fall under this category.
These are some popular cyber-crimes; others include Denial-of-Service attacks, email bombing, spamming and cyber stalking [6]. There is a need to focus on these cyber-crimes, on how to prevent them and on how to punish the culprits.
Nowadays, social networks like Facebook, Twitter, WhatsApp and Sina Weibo have become very popular for sharing and diffusing all kinds of digital information. However, not every kind of information may legally be shared on social networks. Sharing illegal information on a social network is considered a cyber-crime. Some of the cyber-crimes on social networks are listed below.
A. Posting Rumors
Posting or sharing misinformation (unknowingly wrong information) or disinformation (intentionally wrong information) is called a rumor. Rumors diffuse much faster than actual news in social networks [5]. Determining whether the content of a particular post is a rumor is a challenging task, and if it is a rumor, detecting its source is an equally big challenge for governments, police and experts working on social networks. It is therefore important to put more effort into controlling rumors in social networks and into detecting rumor sources so that they can be punished.
B. Sharing Abusive Photos/Videos
Sharing an abusive photo or video of anyone is an illegal activity and is treated as a cyber-crime [6].
C. Posting Comments Against Religions, Communities or Country
D. Movie Release Online Without Permission
These four are important cyber-crimes in social networks, but they are not the only ones. Whatever the crime, there is a need to prevent it and, if it has been committed, to find and punish the offender. The next section discusses how to detect whether a given piece of data is a rumor or a fact using classification algorithms.
A rumor is a currently trending topic that contains unverified content. This content may be either unknowingly wrong information (misinformation) or intentionally wrong information (disinformation) [31]. Social media diffuses information rapidly through the network, and rumors are disseminated just as quickly. Some people may not know the difference between a rumor and a fact and may share the rumor with other communities.
It is observed that rumors (about politics, religions, communities, etc.) diffuse very fast compared with normal news. It is therefore important to stop the diffusion of rumors in social media, which requires detecting whether a piece of information is a rumor or not. Rumor detection has become a challenging task on which many researchers are working. How information is classified as rumor or non-rumor is shown in Figure 1.2. The figure shows how a rumor is classified as true or false and, if it is false, how it is classified as misinformation or disinformation.
To classify whether given data is a rumor or not, follow the procedure shown in Figure 1.3.
First, a rumor dataset (messages) is collected from a social network. Next, the data is preprocessed. After preprocessing, features such as user features, Tweet features and comment features are extracted from the processed Twitter data, as shown in Table 1.2. A classification algorithm then classifies the messages as rumors or non-rumors based on these features. The most common approach to detecting whether a given text is a rumor is simply to tokenize the text and apply a classification algorithm. Many classification algorithms exist, but only a few give good results, among them Naïve Bayes, SVM, Neural network with TF, Neural network with Keras, decision tree, random forest and Long Short-Term Memory. Two major classification algorithms are discussed in this section.
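The feature extraction step can be sketched in a few lines. This is only a minimal illustration covering some of the count-based Tweet features named in Table 1.2; the function name `extract_tweet_features`, the URL pattern and the sample text are assumptions made for this sketch and are not taken from the chapter or from any particular dataset.

```python
import re

# Assumed, simplified URL pattern; real tweets may need a more robust matcher.
URL_PATTERN = re.compile(r"https?://\S+")

def extract_tweet_features(text):
    """Return a dict of simple count-based features for one tweet."""
    urls = URL_PATTERN.findall(text)
    # Remove URLs before tokenizing so they are not counted as words.
    text_without_urls = URL_PATTERN.sub(" ", text)
    words = text_without_urls.split()
    return {
        "num_words": len(words),
        "num_characters": len(text),
        "num_urls": len(urls),
        "contains_url": bool(urls),
        "num_hashtags": sum(1 for w in words if w.startswith("#")),
        "num_mentions": sum(1 for w in words if w.startswith("@")),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_colon": ":" in text_without_urls,
    }

# Hypothetical sample text, used only to exercise the function.
sample = "Is it real? @user says #breaking news! http://example.com/x"
features = extract_tweet_features(sample)
print(features)
```

The resulting dictionary can be turned into a numeric vector and passed to any of the classifiers discussed in this section.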
Figure 1.2 Classification of rumor and non-rumor.
Figure 1.3 Rumor classification process.
In machine learning, Naïve Bayes is a very simple classification algorithm that combines Bayes' theorem with a naïve independence assumption. A Naïve Bayes classifier assumes that the presence of one feature is unrelated to the presence of the other features in the same class [30]. This independence assumption rarely holds in real situations, but the classifier works well in practice [29].
Table 1.2 Dataset features [31].
User features in social networks | Tweet features in Twitter | Comments features in social networks
No of followers | No of records | No of replies
No of friends | No of words | No of words
User has location in his profile | No of characters | No of characters
User has URL? | Tweet contains URL? | Comments contain URL?
User is a verified user? | Source of tweet | Source of comment
Ratio of friends/followers | Length of tweet | Length of tweet
Age of the user account | No of hash tags | No of question mark
Ratio of statuses/followers | No of mentions | No of pronouns
 | No of pronouns | No of URLs
 | No of URLs | No of exclamation mark
 | No of question mark | Polarity
 | No of exclamation mark | Presence of colon symbol
 | Polarity |
 | Presence of colon symbol |
The classification is based on Bayes' theorem,

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where
P(c | x) is the posterior probability of the class c given the predictor x.
P(c) is the prior probability of the class.
P(x | c) is the likelihood, which is the probability of the predictor given the class.
P(x) is the prior probability of the predictor.
The Naïve Bayes classifier combines Bayes' theorem with the naïve assumption, so it can compute class probabilities even when multiple features are used as input. Rumor detection is based on classifying either text or images. For rumor detection in social networks like Twitter or Facebook, several kinds of features must be considered: user features, Tweet features and comment features. All of these deal with text data [32]. If a Tweet, post or comment carries these features, the Naïve Bayes classifier can be applied to classify it as a rumor or not. Some of the dataset features in these three categories are listed in Table 1.2.
First, consider user features. If a person has many followers or friends, the post may be considered true; otherwise it may be a rumor, because surveys have observed that many people who share rumors have few followers or friends on their social networking accounts. Second, many Tweet features can indicate whether a Tweet is a rumor, for example the number of retweets, words or characters. If any or all of these counts are above the average range, the Tweet may be a rumor; otherwise it may be true. Third, consider comment features, which are very important in rumor detection. They are based on comments from the many people who have already been exposed to the post or Tweet. If comments such as “Is it real?”, “Impossible?”, “How is it possible?” or “I can’t believe this” are found, the post or Tweet may be a rumor. There are many other features that help distinguish whether a post or Tweet is a rumor. Figure 1.4 gives a brief idea of how the Naïve Bayes algorithm separates different classes of data points.
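To make the Naïve Bayes idea concrete, the following is a minimal sketch of a text classifier built directly on Bayes' theorem with the independence assumption. It is not the chapter's implementation: the multinomial word-count model, the add-one (Laplace) smoothing, the function names and the four toy messages are all assumptions chosen for illustration. The denominator P(x) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Collect class counts, per-class token counts and the vocabulary."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary

def predict(model, text):
    """Return the class c that maximizes log P(c) + sum_i log P(x_i | c)."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior P(c)
        total_tokens = sum(token_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # skip tokens never seen during training
            # Add-one smoothing keeps unseen (token, class) pairs above zero.
            likelihood = (token_counts[label][token] + 1) / (total_tokens + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy messages, invented only to exercise the code.
documents = [
    "is it real",
    "i cannot believe this",
    "official statement released today",
    "minister confirms the report",
]
labels = ["rumor", "rumor", "non-rumor", "non-rumor"]
model = train_naive_bayes(documents, labels)
print(predict(model, "is this real"))
print(predict(model, "official report released"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.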
Figure 1.4 Naïve Bayes classifier.
Figure 1.4 shows two classes of data points and how they are separated with maximum distance.
Two classes are
i. Circle
ii. Triangle.
Adding more features to the input dataset can reduce accuracy compared with using fewer features. To increase accuracy, another popular model, SVM, can be used.
Support vector machine (SVM) is one of the best machine learning algorithms for both classification and regression. It is widely used to classify data points, including cases where the input vectors are mapped non-linearly [8]. Data in social networks is available in many forms, so detecting rumors requires classifying text data with classification algorithms based on dataset features. Classifying a dataset with many features and dimensions is a challenging task, for which SVM tends to give better results.
The main objective of SVM is to find a hyperplane in an N-dimensional space that distinctly classifies the data points. Rumor detection in social networks depends mainly on text classification, which can be done with SVM. Figure 1.5 shows how an SVM classifier separates a dataset that has multiple features and dimensions. SVM places a large margin between two types of data, shown as circles and triangles, so that the two classes are separated by the maximum distance (thick line). The large margin in Figure 1.5(a) lies at equal distance from the nearest circles and triangles, which means the distance between the two classes is maximal across that margin. As shown in Figure 1.5(b), SVM also supports multi-dimensional data.
Figure 1.5 Hyperplane in 2-D and 3-D.
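The SVM algorithm seeks to maximize the margin between the data points and the hyperplane. The loss function that helps maximize the margin is the hinge loss [8], defined as follows:

$$c(x, y, f(x)) = \max\bigl(0,\; 1 - y\,f(x)\bigr)$$

where $y \in \{-1, +1\}$ is the expected value and $f(x)$ is the predicted value. If the predicted value and the expected value have the same sign and $y\,f(x) \ge 1$, the cost is 0; otherwise the cost is $1 - y\,f(x)$.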
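As an illustration of how the hinge loss drives margin maximization, the sketch below implements the standard hinge loss max(0, 1 − y·f(x)) for labels in {−1, +1} and fits a linear classifier by sub-gradient descent. The training procedure, the regularization term, the hyperparameter values and the four toy points are assumptions made for this example; the chapter does not prescribe a particular optimizer.

```python
def hinge_loss(y_true, score):
    """Hinge loss max(0, 1 - y * f(x)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y_true * score)

def train_linear_svm(points, labels, learning_rate=0.1, reg=0.01, epochs=200):
    """Fit weights w and bias b by sub-gradient descent on regularized hinge loss."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - learning_rate * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Correct with enough margin: only the regularizer shrinks w.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b

def decision(w, b, x):
    """Signed score f(x) = w . x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical, linearly separable toy data (two "circles", two "triangles").
points = [(2.0, 2.0), (3.0, 1.0), (-2.0, -1.0), (-1.0, -3.0)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
print([decision(w, b, x) for x in points])
```

Points with y·f(x) ≥ 1 contribute no loss, so only points on or inside the margin influence the final hyperplane.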
Classifying content shared by users in social media is a prevalent way of combating misinformation. Baseline classification algorithms such as Naïve Bayes and SVM have been used extensively for rumor detection, as discussed in Section 1.4. Even though these algorithms classify rumors and facts to some extent, better techniques are still needed to improve the efficiency of rumor classification. Social networks like Facebook, WhatsApp, Instagram and Twitter use good techniques, but they still fail to classify rumors exactly.
Facebook, one of the most popular social networks, has started using third-party fact-checkers in its Instagram application (in the US) to detect whether a given post contains factual or false information [33]. These third-party fact-checkers are located globally and rate how factual or false a particular post is. When something appears wrong in a post, the fact-checkers immediately assess its ratio of fact to misinformation.
If a post is rated as mostly false, it is immediately labeled as “False information”; otherwise it is not labeled. It is then the user's responsibility to decide, based on the false and fact ratios, whether to view the post and whether to share it with friends and communities. By using third-party fact-checkers, Instagram is trying to combat misinformation on social networks. Figure 1.6 gives a brief idea of this method.
Figure 1.6 Combating misinformation in Instagram [33].
Rumor detection alone is not enough to prevent these cyber-crimes in social media; finding the source plays an important role in preventing further diffusion and punishing the culprit. Finding the source of rumors in a network was first discussed in Ref. [9]. Later research introduced several factors to be considered in rumor source (RS) identification. Four main factors are considered: diffusion models, network structure, evaluation metrics and centrality measures. Each factor is explained in the following sections with examples. After rumor detection, these factors are used together with source detection methods to find the rumor source in social networks, as explained in Section 1.5.2.
Network structure can be derived from two parameters: network topology and network observation [9]. Network topology describes whether the network is structured as a tree or a graph. Source identification is more complex in a graph than in a tree: a tree has exactly one root node and allows no loops, whereas a graph has no root node and allows loops. Network observation is the second component of network structure; it is used to observe the network during rumor propagation and learn the states of the nodes at a particular time. A network can be observed in three ways [11]: complete, snapshot and monitor.
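The distinction between tree and graph topology can be checked programmatically. The sketch below assumes the network is stored as an undirected adjacency dictionary (a representation chosen for this example, not one given in the chapter) and tests the two defining properties of a tree: it is connected and has no loops, which is equivalent to being connected with exactly n − 1 edges.

```python
from collections import deque

def is_tree(adjacency):
    """Return True if the undirected graph is connected and has no loops."""
    nodes = list(adjacency)
    if not nodes:
        return False
    # Each undirected edge appears twice in the adjacency lists.
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    if edge_count != len(nodes) - 1:
        return False  # a tree on n nodes has exactly n - 1 edges
    # Breadth-first search to confirm every node is reachable.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)

# A 7-node regular tree (hypothetical example) and the same network with one extra edge.
tree = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2], 5: [2], 6: [3], 7: [3]}
graph = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 5], 5: [2, 4], 6: [3], 7: [3]}
print(is_tree(tree), is_tree(graph))
```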
In computer networks, network topology is defined as the physical and logical design of the network. The physical design is the actual layout of the cables and other network devices. The logical design is the way in which the network appears to the devices that use it.
In complex networks, network topology is the arrangement of the network as a generic graph or a tree. In many domains, such as medicine, security, water and gas pipelines and power grids, networks have a graph structure. These graphs are restructured into two topologies, d-regular trees and random geometric trees [34]. Rumor source identification was first discussed for general trees and general graphs using methods based on a rumor source estimator, which plays a key role in finding the exact source of a rumor. The source estimator is mainly based on maximum likelihood (ML) estimation, which reduces to a combinatorial problem [9, 35]. The following section explains the techniques required to detect a rumor source in trees: the rumor source estimator, the ML estimator, rumor centrality and message passing algorithms.
In rumor source identification, network structure plays an important role. When the structure of the network is known, it is easy to determine how a rumor spreads through it using diffusion models such as SI, SIS, SIR and SIRS. By backtracking these diffusion models, the rumor source can be detected. Network observation provides information about the state of each node in the network at a particular time. A node can be susceptible (able to be infected), infected (able to spread the rumor further) or recovered (cured and no longer infected) [10]. If it is observed whether each node is susceptible, infected or recovered, the structure of the network can be generated from that knowledge. Network observation can be done in three ways: complete observation, snapshot observation and monitor observation.
Complete observation of a network provides comprehensive information, namely whether each node is susceptible, infected or recovered at every time interval [11]. Knowing the state of a node at a single time is not enough; multiple time intervals are required, and complete observation provides this knowledge. Complete observation is easy for a small-scale network but hardly possible for large-scale networks. Figure 1.7 illustrates the problem. Figure 1.7(a) shows a regular tree with 7 nodes, a small-scale network in which the root node, leaf nodes and node degrees can all be observed. Figure 1.7(b) shows a generic graph with many nodes and multiple connections between them, a large-scale network in which observation is difficult because the root node, leaf nodes and node degrees are hard to find. Complete observation thus gives the best knowledge about the states of nodes but works only in small-scale networks [39]. To overcome this problem, another model called snapshot observation is used.
Snapshot observation provides limited information about the states of nodes in the network at a given time interval. To compensate, multiple snapshots can be taken instead of one or two, giving better knowledge about the nodes at different time intervals. The disadvantages are that taking multiple snapshots may consume much time and that, although a snapshot correctly identifies infected nodes, it cannot distinguish recovered nodes from susceptible ones [37]. It is therefore difficult to tell whether such nodes received the rumor and ignored it or have not received it yet.
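The rumor centrality idea mentioned above can be illustrated for tree-shaped infected subgraphs. The sketch assumes the commonly cited closed form in which the rumor centrality of a candidate node v is n! divided by the product of the subtree sizes obtained when the infected tree is rooted at v; this formula and the brute-force evaluation over all candidates are assumptions of this example and are not spelled out in the chapter. Message passing algorithms compute the same scores more efficiently.

```python
import math

def subtree_sizes(adjacency, root):
    """Size of the subtree under each node when the tree is rooted at `root`."""
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)  # a parent is always recorded before its descendants
        for neighbor in adjacency[node]:
            if neighbor != parent[node]:
                parent[neighbor] = node
                stack.append(neighbor)
    sizes = {}
    for node in reversed(order):  # descendants are processed before ancestors
        sizes[node] = 1 + sum(sizes[n] for n in adjacency[node] if n != parent[node])
    return sizes

def rumor_centrality(adjacency, candidate):
    """Assumed closed form: n! / product of subtree sizes with `candidate` as root."""
    n = len(adjacency)
    sizes = subtree_sizes(adjacency, candidate)
    return math.factorial(n) // math.prod(sizes.values())

def estimate_source(adjacency):
    """Pick the infected node with the highest rumor centrality."""
    return max(adjacency, key=lambda node: rumor_centrality(adjacency, node))

# Hypothetical infected subgraph: a path of five nodes.
infected_tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print({node: rumor_centrality(infected_tree, node) for node in infected_tree})
print(estimate_source(infected_tree))
```

On a path, the middle node admits the most infection orderings consistent with the observed tree, so it receives the highest score.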
Figure 1.7 Network topology.
Monitor observation means monitoring the network by inserting monitor or sensor nodes, which act as observers in the network [36]. These sensor nodes gather information about the states of nodes and pass it to an administrator, who maintains all the gathered data in a database. However, information may be missed because sensor nodes are inserted at only a few places in the network, so the states of nodes where no sensor is available may be lost. This missing information reduces the accuracy of the system, which depends on the number of sensor nodes. Increasing the number of sensor nodes may increase accuracy but reduces system performance because of the heavy load on the network.
These three types of network observation help to understand the states of nodes and the network structure. Network topology and network observation are both used to understand the structure of the network, which is one of the most important factors in source identification. Another mandatory factor is the diffusion model, discussed in Section 1.5.2.
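The difference between the three observation types can be expressed in terms of how much of the full state history each one reveals. In the sketch below, the complete observation is assumed to be a dictionary mapping each time step to the state ("S", "I" or "R") of every node; this layout, the function names and the toy history are hypothetical choices for illustration only.

```python
def snapshot_observation(history, time_step):
    """One snapshot: infected nodes are visible, but susceptible and
    recovered nodes cannot be told apart."""
    states = history[time_step]
    return {
        node: ("infected" if state == "I" else "not infected")
        for node, state in states.items()
    }

def monitor_observation(history, monitors):
    """Sensor nodes report their own state at every time step;
    the states of all other nodes stay unknown."""
    return {
        t: {node: states[node] for node in monitors}
        for t, states in history.items()
    }

# Complete observation (hypothetical): every node's state at every time step.
history = {
    0: {"a": "I", "b": "S", "c": "S"},
    1: {"a": "R", "b": "I", "c": "S"},
}
print(snapshot_observation(history, 1))
print(monitor_observation(history, ["b"]))
```

In the snapshot at time 1, node "a" (recovered) and node "c" (susceptible) look identical, which is exactly the ambiguity described for snapshot observation.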
Diffusion models are another factor considered in source identification, as they describe how fast information diffuses in the network [2]. There are four diffusion models: susceptible-infected (SI), susceptible-infected-susceptible (SIS), susceptible-infected-recovered (SIR) and susceptible-infected-recovered-susceptible (SIRS). All of these are epidemic models, which describe how diseases spread from one person to another person or group of people. The following sections discuss these epidemic models, how they spread and the differences between them.
The SI model is one of the oldest epidemic models, where S stands for susceptible and I for infected. It was first proposed for complex networks in Ref. [12]. Under the SI model, each node is either susceptible or infected, and once a node is infected it remains infected for life, as shown in Figure 1.8. This model is not practical, because it leaves no chance for an infected node to recover and become susceptible again in the future. In social networks, a user who receives a rumor may believe it at that time but later learn the truth and recover from it, which is not possible in the SI model. The SIS, SIR and SIRS models deal with this issue and are discussed in the succeeding sections.
Figure 1.8 SI model.
The SI model is not practically applicable, as it does not allow infected users to recover. The SIS model addresses this problem [13, 14] by tracking the number of persons cured as well as the number infected. Anyone who is infected may be cured and become susceptible again in the future. Figure 1.9 illustrates this case, in which an infected node returns to the susceptible state [38]. In social networks, a user who receives a rumor may believe it or, knowing the facts, ignore it, and can become susceptible again in the future.
The SIR model is one of the simplest diffusion models. It has three states, where S stands for the number of susceptible, I for the number of infectious and R for the number of recovered or removed individuals. The total population is the sum of these three groups [15].
Figure 1.9 SIS model.
Figure 1.10 SIR model.
In social networks, a user who receives a diffused rumor becomes infectious if he or she does not know the truth about it. A user who knows the truth recovers by ignoring the rumor or not passing it on to neighbors. This recovered state is ignored in the SI and SIS models; it is what distinguishes the SIR model from the SIS model. Figure 1.10 shows how users move from one state to another.
In the SIR model, a person who has recovered from a disease remains recovered in the future. In general, however, a person cured of a disease may be reinfected with the same disease later, which the SIR model ignores. The SIRS model addresses this problem: a person who has been infected and has recovered through immunity or medical treatment does not necessarily stay in the recovered state, and may be infected again by the same disease [16].
In social networks, a user who receives a diffused rumor and believes it is infected; a user who knows the facts recovers by ignoring the rumor or not passing it on to neighbors. In the SIRS model, this recovered node may be reinfected in the future. For further details see Figure 1.11.
All these diffusion models are explained in Ref. [41]. There are also independent cascade models, which find rumor sources by analyzing network diffusion in the reverse direction [42].
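The four epidemic models differ only in what happens to a node after it is infected or recovered, which a small discrete-time simulation can make explicit. The sketch below is an illustrative assumption, not a method from the chapter: the synchronous update rule, the probability parameters (`infect_prob`, `recover_prob`, `relapse_prob`) and the six-node ring network are all hypothetical.

```python
import random

# Where an infected node goes when it leaves the infected state, per model.
AFTER_INFECTED = {"SI": "I", "SIS": "S", "SIR": "R", "SIRS": "R"}
# Where a recovered node goes when it leaves the recovered state, per model.
AFTER_RECOVERED = {"SIR": "R", "SIRS": "S"}

def step(adjacency, states, model, rng, infect_prob, recover_prob, relapse_prob):
    """Advance every node by one time step using the previous states."""
    new_states = dict(states)
    for node in sorted(adjacency):
        state = states[node]
        if state == "S":
            # Each infected neighbor independently tries to pass the rumor on.
            for neighbor in adjacency[node]:
                if states[neighbor] == "I" and rng.random() < infect_prob:
                    new_states[node] = "I"
                    break
        elif state == "I":
            if rng.random() < recover_prob:
                new_states[node] = AFTER_INFECTED[model]  # stays "I" under SI
        elif state == "R":
            if rng.random() < relapse_prob:
                new_states[node] = AFTER_RECOVERED[model]  # stays "R" under SIR
    return new_states

def simulate_diffusion(adjacency, source, model, steps,
                       infect_prob=0.5, recover_prob=0.3, relapse_prob=0.2, seed=0):
    """Return the list of network states from time 0 to time `steps`."""
    rng = random.Random(seed)
    states = {node: "S" for node in adjacency}
    states[source] = "I"
    history = [states]
    for _ in range(steps):
        states = step(adjacency, states, model, rng,
                      infect_prob, recover_prob, relapse_prob)
        history.append(states)
    return history

# Hypothetical ring network of six users; user 0 starts the rumor.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
for model in ("SI", "SIS", "SIR", "SIRS"):
    final = simulate_diffusion(ring, 0, model, steps=10)[-1]
    print(model, final)
```

Running the same network under each model shows, for example, that infected nodes never leave the infected state under SI, while the recovered state appears only under SIR and SIRS.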
Figure 1.11 SIRS model.
Centrality measures are another important factor in rumor source identification. A centrality measure assigns each node a score that reflects its influence on the diffusion process [43]. Several centrality measures are discussed in Ref. [17], such as degree centrality, closeness centrality and betweenness centrality; they are explained in the following sections.
Degree centrality is defined as the total number of nodes connected to a node in a network or graph. Eminent people such as politicians, actors and sports players have high degree centrality in the network [40]. Social networks like Facebook and Twitter bear this out, as many famous people have large numbers of friends and followers on their accounts. Figure 1.12(a) illustrates degree centrality.
Closeness centrality is based on the shortest distances between a node and all other nodes in the graph; the node with the smallest total distance is the most central [46]. See Figure 1.12(b), where the node with the highest closeness centrality is shown in black and lies at the same distance from all other nodes in the graph.
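Degree centrality is the simplest centrality measure to compute: it is the number of distinct neighbors of each node. The sketch below assumes an undirected adjacency dictionary and a hypothetical star-shaped network in which one well-known account is connected to three followers.

```python
def degree_centrality(adjacency):
    """Number of distinct nodes directly connected to each node."""
    return {node: len(set(neighbors)) for node, neighbors in adjacency.items()}

# Hypothetical network: one celebrity account linked to three followers.
star = {
    "celebrity": ["fan1", "fan2", "fan3"],
    "fan1": ["celebrity"],
    "fan2": ["celebrity"],
    "fan3": ["celebrity"],
}
degrees = degree_centrality(star)
print(degrees)
print(max(degrees, key=degrees.get))
```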
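Closeness centrality can be computed with a breadth-first search from each node. The sketch below assumes the widely used normalized form, (number of other reachable nodes) divided by (sum of shortest-path distances to them), for an unweighted, undirected network; the chapter does not fix a particular normalization, so this choice and the three-node example are assumptions.

```python
from collections import deque

def shortest_path_lengths(adjacency, start):
    """Breadth-first search distances (in hops) from `start`."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances

def closeness_centrality(adjacency, node):
    """(reachable nodes - 1) / (sum of distances); 0.0 for an isolated node."""
    distances = shortest_path_lengths(adjacency, node)
    total = sum(distances.values())
    if total == 0:
        return 0.0
    return (len(distances) - 1) / total

# Hypothetical three-node path: "b" sits between "a" and "c".
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print({node: closeness_centrality(path, node) for node in path})
```

The middle node has the smallest total distance to the others and therefore the highest closeness score.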
Figure 1.12 Centrality measures.
Betweenness centrality identifies a node that acts as a bridge between other pairs of nodes, that is, a node lying on the shortest paths between them. A node with high betweenness centrality may not have a high degree, which is necessary for information diffusion [47]. Figure 1.12(c) depicts betweenness centrality: the node in black acts as a bridge between the others. For more details about rumor centrality measures see Refs. [44, 45].
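Betweenness centrality can be sketched by counting, for every pair of nodes, the fraction of shortest paths that pass through each intermediate node. The version below is a straightforward all-pairs computation for small, unweighted, undirected networks; it is an illustrative assumption rather than the chapter's algorithm, and faster methods exist for large networks.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adjacency, start):
    """Return (distance, number of shortest paths) from `start` to each reachable node."""
    distance = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                paths[neighbor] = 0
                queue.append(neighbor)
            if distance[neighbor] == distance[current] + 1:
                # `current` is a predecessor of `neighbor` on a shortest path.
                paths[neighbor] += paths[current]
    return distance, paths

def betweenness_centrality(adjacency):
    """Sum over node pairs (s, t) of the share of shortest s-t paths through each node."""
    info = {node: bfs_counts(adjacency, node) for node in adjacency}
    scores = {node: 0.0 for node in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        dist_t, paths_t = info[t]
        for v in adjacency:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path when the two distances add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                scores[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return scores

# Hypothetical network: a hub that bridges three otherwise unconnected users.
bridge = {"hub": ["u1", "u2", "u3"], "u1": ["hub"], "u2": ["hub"], "u3": ["hub"]}
print(betweenness_centrality(bridge))
```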
