Nowadays, the Internet is becoming more and more complex due to an ever-increasing number of network devices, various multimedia services and a prevalence of encrypted traffic. In this context, this book presents a novel, efficient multi-modular troubleshooting architecture to overcome limitations related to encrypted traffic and high time complexity. This architecture contains five main modules: data collection, anomaly detection, temporary remediation, root cause analysis and definitive remediation. Data collection comprises two submodules: parameter measurement and traffic classification. The architecture is implemented and validated in a software-defined networking (SDN) environment.
Number of pages: 249
Year of publication: 2023
Cover
Table of Contents
Dedication Page
Title Page
Copyright Page
Preface
Introduction
1 State of the Art on Network Troubleshooting
1.1. Network troubleshooting
1.2. Background on encryption protocols
1.3. Drawbacks of troubleshooting with encrypted traffic
1.4. Conclusion
2 Novel Global Troubleshooting Framework for Encrypted Traffic
2.1. Novel network troubleshooting architecture for encrypted traffic
2.2. Proof of concept of novel troubleshooting architecture in SDN
2.3. Data collection
2.4. Troubleshooting dataset
2.5. Conclusion
3 Traffic Classification: Novel QUIC Traffic Classifier Based on Convolutional Neural Network
3.1. Introduction
3.2. Background
3.3. Traffic classification approaches
3.4. Novel traffic classification method for QUIC traffic
3.5. Experimental results
3.6. Conclusion
4 Anomaly Detection
4.1. Introduction
4.2. Anomaly detection approaches
4.3. Anomaly detection approach using machine learning
4.4. Experimental results
4.5. Conclusion
5 Temporary Remediation: SDN-based Application-aware Segment Routing for Large-scale Networks
5.1. Introduction
5.2. Application-aware routing mechanisms
5.3. Adaptive segment routing mechanism for encrypted traffic
5.4. Experimental results
5.5. Conclusion
6 Root Cause Analysis and Definitive Remediation
6.1. Root cause analysis: machine learning based root cause analysis for SDN network
6.2. Definitive remediation: adaptive QUIC BBR algorithm using reinforcement learning for dynamic networks
Conclusions and Prospects
References
Index
Other titles from iSTE in Networks and Telecommunications
End User License Agreement
List of Tables
Chapter 2
Table 2.1. Comparison between NetFlow, sFlow and OpenFlow-based monitoring a...
Table 2.2. Existing troubleshooting dataset
Table 2.3. Considered network conditions
Table 2.4. Datasets for root cause analysis.
Table 2.5. Classification dataset
Chapter 3
Table 3.1. Registered port numbers by IANA for several applications
Table 3.2. Signatures for several P2P applications
Table 3.3. Dataset specification
Table 3.4. Performance metrics of ML algorithms in the first stage of classi...
Table 3.5. Time complexity of ML algorithms in the first stage of classifica...
Chapter 4
Table 4.1. Considered network conditions
Table 4.2. Anomaly detection datasets
Table 4.3. Performance metrics of ML algorithms in anomaly detection for the...
Table 4.4. Time complexity of ML algorithms in anomaly detection for the dat...
Table 4.5. Performance metrics of ML algorithms in anomaly detection for the...
Table 4.6. Time complexity of ML algorithms in anomaly detection for the dat...
Chapter 5
Table 5.1. Configuration of the PC used in the testbed
Table 5.2. Scenarios
Table 5.3. Summarization of average optimal MOS, median and 95% confidence i...
Table 5.4. Summarization of average overhead in the SR mechanisms
Chapter 6
Table 6.1. Considered network conditions
Table 6.2. Troubleshooting datasets
Table 6.3. Performance metrics of the considered ML algorithms for the datas...
Table 6.4. Time complexity of ML algorithms in RCA for the dataset in a stat...
Table 6.5. F1-score of two feature sets in the RCA for the dataset in a stat...
Table 6.6. Performance metrics of the considered ML algorithms for the datas...
Table 6.7. Time complexity of ML algorithms in RCA for the dataset in a dyna...
Table 6.8. Some important results of the considered congestion control algor...
List of Illustrations
Chapter 1
Figure 1.1. Unidirectional link discovery in LLDP. For a color version of this...
Figure 1.2. Overall traditional troubleshooting architecture
Figure 1.3. The global growth of the encrypted traffic e-Security (n.d.). For ...
Figure 1.4. Difference between TCP + TLS and QUIC architecture
Figure 1.5. Comparison of QUIC packet format with TCP + TLS. For a color versi...
Figure 1.6. Comparison of connection establishment between QUIC and TCP + TLS....
Figure 1.7. Multiplexing comparison between HTTP1.1 and HTTP/2 over TCP and QU...
Figure 1.8. IPsec packet structure. For a color version of this figure, see ww...
Figure 1.9. TLS record packet
Chapter 2
Figure 2.1. The novel troubleshooting architecture in the context of encrypted...
Figure 2.2. The novel troubleshooting framework in the SDN environment. For a ...
Figure 2.3. NetFlow architecture (Suárez-Varela and Barlet-Ros 2017). For a co...
Figure 2.4. sFlow architecture. For a color version of this figure, see www.is...
Figure 2.5. The link discovery in LLDP. For a color version of this figure, se...
Chapter 3
Figure 3.1. Byte in payload of QUIC packets for different applications. For a ...
Figure 3.2. Percentage of small and large packets in flows. For a color versio...
Figure 3.3. Novel traffic classification approach for QUIC traffic
Figure 3.4. Macro-averaging precision, macro-averaging recall and macro-averag...
Figure 3.5. Macro-averaging precision, macro-averaging recall and macro-averag...
Figure 3.6. Precision, recall and F1-score of the traffic classification metho...
Chapter 4
Figure 4.1. Overall architecture of ML-based anomaly detection mechanism in th...
Figure 4.2. The ML-based anomaly detection method.
Chapter 5
Figure 5.1. The SDN-based adaptive SR framework issued from the global trouble...
Figure 5.2. The novel traffic classification approach for encrypted traffic
Figure 5.3. The QoE estimator for encrypted traffic
Figure 5.4. The RL-based SR mechanism
Figure 5.5. Average MOS score and standard deviation of three selection algori...
Figure 5.6. The MOS score of three SR mechanisms. For a color version of this ...
Figure 5.7. The average CPU usage and overhead of three SR mechanisms. For a c...
Chapter 6
Figure 6.1. Overall architecture of ML-based RCA in SDN environment. For a col...
Figure 6.2. The ML-based RCA method
Figure 6.3. The accuracy against the number of features in the feature selecti...
Figure 6.4. Congestion control operating point: delivery rate and RTT against ...
Figure 6.5. Adaptive BBR algorithm. For a color version of this figure, see ww...
Figure 6.6. Number of network conditions in which each congestion control algo...
Figure 6.7. Average reward and standard deviation of A-BBR and benchmarks. For...
Figure 6.8. Fairness of A-BBR and benchmarks in dynamic network conditions. Fo...
To my sweet love Ly Ly and beloved family
Van Van Tong
To all my family, my wife, my dear children Sarah and Sinan, with all my love and infinite gratitude for your presence in my life
Sami Souihi
To my dearest family, thank you for being my constant source of love and support throughout my life
Hai-Anh Tran
To my beloved, sublime and thoughtful daughter, Kenza Insafe, on her 20th birthday this year
Abdelhamid Mellouk
New Generation Networks Set
coordinated by Abdelhamid Mellouk
Volume 3
Van Van Tong
Sami Souihi
Hai-Anh Tran
Abdelhamid Mellouk
First published 2023 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd 27-37 St George’s Road London SW19 4EU UK www.iste.co.uk
John Wiley & Sons, Inc. 111 River Street Hoboken, NJ 07030 USA www.wiley.com
© ISTE Ltd 2023
The rights of Van Van Tong, Sami Souihi, Hai-Anh Tran and Abdelhamid Mellouk to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.
Library of Congress Control Number: 2023938907
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-867-2
Preface
The Internet is continuously growing in terms of size and usage. This expansion increases its complexity, which is further emphasized by the prevalence of encrypted traffic. The result is complex network problems with many negative economic impacts for network operators. In the past, network troubleshooting solutions were extensively studied to diagnose network problems and resolve them completely. However, conventional solutions fall short in several respects: they are time-consuming, poorly adapted to encrypted traffic, and limited in scalability, overhead, automation and efficiency.
In this book, we provide insight into network troubleshooting in the context of encrypted traffic, and present a solution to these problems based on an adaptive mechanism. The objective is to detect network anomalies, diagnose their root causes, and address them definitively. The book highlights the necessity of network troubleshooting and the economic impacts of network problems on network operators. Additionally, it provides a state of the art on network problems (e.g. rule failure, link failure and so on) as well as existing approaches. Moreover, the book thoroughly analyzes encryption protocols before analyzing the limitations of conventional troubleshooting solutions. Furthermore, it proposes a novel troubleshooting architecture in the context of encrypted traffic and presents a proof of concept of this architecture in the software-defined networking (SDN) environment. This architecture contains five main modules: data collection, anomaly detection, temporary remediation, root cause analysis and definitive remediation.
Giving a complete bibliography and a historical account of the research that led to the present form of the subject would be impossible. Thus, some topics are discussed in less detail than others. The choices made reflect, in part, personal taste and expertise and, in part, a preference for up-and-coming research and recent developments in network troubleshooting for network operators.
This book is a preliminary step toward network troubleshooting in the context of encrypted traffic. We hope that it will inspire other researchers and network operators on the road of network troubleshooting, and that readers will enjoy it and find many helpful ideas and overviews for their own studies.
Van VAN TONG
Sami SOUIHI
Hai-Anh TRAN
Abdelhamid MELLOUK
June 2023
Introduction
“We are all now connected by the Internet, like neurons in a giant brain”
Stephen Hawking
Network troubleshooting plays an important part in network operation. Computer networks have evolved rapidly along with the significant growth of the Internet of Things (IoT), increasing not only network coverage but also network complexity, and with it the risk of problems. Examples include server disruptions, cyberattacks and link failures. Therefore, network troubleshooting, the process of detecting an anomaly, identifying its root causes and implementing remediation approaches to solve it definitively, has been studied thoroughly by the research community (Fonseca and Mota 2017; Yu et al. 2018; Cherrared et al. 2019).
Regarding server disruptions, Table I.1 lists the total downtime and the corresponding financial loss for several service providers (Gagnaire et al. 2012). For instance, the losses range from $34,000 for YouTube to over $6,700,000 for PayPal, caused by disruptions of their cloud servers lasting from a few minutes to more than a day.
Besides, many cloud services nowadays are disrupted by cyberattacks such as distributed denial of service (DDoS) attacks. A DDoS attack is a kind of cyberattack designed to overload and disrupt network services by exhausting them with access requests. In February 2020, Amazon announced that its AWS Shield service had mitigated the largest recorded DDoS attack, with 2.3 Tbps of network traffic (Felter 2021). This attack, which caused three days of “elevated threat” for AWS Shield, was carried out using hijacked Connection-less Lightweight Directory Access Protocol (CLDAP) web servers.
Table I.1. Downtime of service providers and their economic impacts

| Service provider | Total downtime (h) | Cost (USD) |
| --- | --- | --- |
| YouTube | 0.17 | 34,000 |
| CloudFlare | 1 | 168,000 |
| Zoho | 33.5 | 600,000 |
| Cisco | 5.33 | 1,066,000 |
| eBay | 6.25 | 1,406,250 |
|  | 8.5 | 1,700,000 |
| PayPal | 30.2 | 6,795,000 |
To deal with network problems, network troubleshooting has been extensively studied for over 30 years. However, conventional solutions are not effective because of high time consumption, inadaptability for encrypted traffic, overheads, scalability, etc.
Regarding the high time consumption, root cause analysis and remediation can take from 1 h to more than 5 h, depending on the status of anomalies in the network (Zeng et al. 2012b). During this time, network systems can suffer from negative impacts (e.g. high latency, high loss, etc.), which can result in frequent connection interruptions. Depending on the nature of the anomaly, there are two possible cases. If the root cause of the anomaly is identified and solved quickly, temporary remediation is not necessary. Otherwise, temporary remediation is required to guarantee the availability of the network. It has therefore become necessary to design network troubleshooting frameworks that guarantee the network’s availability during root cause analysis and definitive remediation.
As for the inadaptability, traditional troubleshooting mechanisms were not designed for encrypted traffic. However, many service providers today encrypt network traffic to prevent attackers from inspecting data packets for illegal purposes. Concretely, 80% of web traffic was encrypted by 2019, compared to 40% in 2016 (Cisco 2021a). From the point of view of network operators (NOs), information in the packets, such as the sequence number, acknowledgment number and payload signatures, is hidden. This introduces several limitations for network performance monitoring approaches (e.g. estimation of quality of experience, application identification, etc.) and intrusion detection systems (Kühlewind et al. 2018; Moriarty and Morton 2018). Therefore, encrypted traffic creates many obstacles for troubleshooting, particularly in data collection (e.g. collecting performance metrics, etc.) and in remediation approaches that rely on deep packet inspection (e.g. application-aware traffic engineering, signature-based intrusion detection systems, etc.).
Concerning the overhead, carrying out data collection without influencing the network performance is a challenging task. In fact, the continuous monitoring of network data and network traffic can generate a huge overhead on the network and thus influence the network performance. Many network monitoring proposals have developed different approaches to balance data accuracy and monitoring overhead.
As for the scalability, in traditional network architectures, control logic is distributed in network devices, so updated policies in network troubleshooting are implemented separately in each network device. Nowadays, the number of network devices increases rapidly due to the rapid growth of the Internet network and IoT. This leads to the scalability issue for network troubleshooting, as well as network management.
In this book, we put a special focus on the high time consumption and inadaptability issues. Concretely, we present an analysis and comments on the network troubleshooting mechanisms before elaborating our vision of a troubleshooting framework for network operators in the context of encrypted traffic. This framework is composed of five modules: data collection, anomaly detection, temporary remediation, root cause analysis and definitive remediation. In addition to the troubleshooting framework, this book focuses on four essential points:
– in addition to the traditional data collection module, it is necessary to think about a novel traffic classification approach to classify encrypted traffic into different kinds of applications (e.g. video streaming, file transfer, etc.). In fact, the application class plays an important role in the remediation approaches (e.g. application-aware mechanisms, etc.) in network troubleshooting;
– a temporary remediation approach to assure the availability of the network as well as meet strict SLA requirements during the root cause analysis and definitive remediation;
– a proof of concept for the root cause analysis and definitive remediation in network troubleshooting that allows us to automatically identify the root cause of anomalies and address it completely;
– troubleshooting datasets: we build and contribute the troubleshooting datasets that contain a dataset for encrypted traffic classification approaches and two datasets for the root cause analysis, in order to facilitate the network troubleshooting.
In this book, each chapter is dedicated to one module of the troubleshooting framework. The remainder of the book is organized as follows:
Chapter 1: reviews related work on several network problems (e.g. link failure, switch failure, etc.). In addition to explaining the fundamental parts of the traditional troubleshooting architecture, we explain how network traffic is encrypted and analyze the limitations of network troubleshooting for encrypted traffic.
Chapter 2: presents the fundamental parts of the novel troubleshooting architecture in the context of encrypted traffic and shows a proof of concept of this architecture in a software-defined networking (SDN) environment. We also present a parameter measurement module to collect data in order to build troubleshooting datasets. Moreover, the chapter thoroughly describes the troubleshooting datasets, which are composed of datasets for root cause analysis and encrypted traffic classification.
Chapter 3: describes a novel encrypted traffic classification method to identify different kinds of applications. The purpose is to provide information about application classes for application-aware mechanisms in network troubleshooting.
Chapter 4: presents related work on anomaly detection. Moreover, this chapter takes into account congestion to generate anomalies and presents an anomaly detection approach using machine learning to detect these anomalies in the network.
Chapter 5: presents an application-aware segment routing mechanism for temporary remediation. This mechanism identifies application classes by means of traffic classification. For each application class, it applies a specific routing strategy based on a reinforcement learning algorithm to meet strict SLA requirements.
Chapter 6: considers congestion as a use case for root cause analysis and definitive remediation. This chapter presents a root cause analysis using machine learning to identify the root cause of congestion. It also presents an adaptive congestion control algorithm to solve it completely.
Conclusions and Prospects: this last chapter concludes this book and provides an insight into the future work and prospects in the area of network troubleshooting.
1 State of the Art on Network Troubleshooting
“A protocol approach to troubleshooting”
Ed Wilson
Chapter 1 presents the state of the art on network troubleshooting and a traditional troubleshooting architecture for non-encrypted traffic. We then discuss its limitations when traffic is encrypted.
In the 19th century, technicians were dispatched to find and repair problems in telegraph and phone line infrastructure. Historically, a troubleshooter is a skilled worker who finds and solves technical problems. Nowadays, troubleshooting is a form of problem-solving that aims to repair failed processes in a machine or a system. According to Morris and Rouse (1985) and Jonassen and Hung (2006), there are several conceptions of the troubleshooting process. The basic concept of troubleshooting is finding the faulty components in a device in order to repair or replace them (Perez 1991). Schaafstal et al. (2000) designed the troubleshooting process with four subtasks: formulating problem description, cause generation, test and evaluation. Similarly, troubleshooting is considered as an iterative process with four subprocesses: problem space construction, problem space reduction, fault diagnosis and solution verification (Johnson et al. 1993).
Network troubleshooting is an iterative process with three subtasks: identifying, diagnosing and solving problems in the network. In the past, network operators (NOs) relied on manual troubleshooting tools such as ping and traceroute. ping is a computer network administration utility designed to check the reachability between a source and a destination and to measure the round-trip time of packets in the network. traceroute is a computer network diagnostic utility used to display possible routes between a source and a destination and to measure the transit delay of packets in the network. These tools are used to diagnose complex problems such as loops caused by undefined interaction between spanning tree protocols (Heller et al. 2013). However, these approaches are not effective with a huge number of network devices. Besides, 24.6% of administrators reported that it takes more than 1 h on average to diagnose and solve an anomaly (Zeng et al. 2012a). Therefore, an automated troubleshooting process is needed to detect an anomaly, locate its causes and solve it. Consequently, network troubleshooting has received attention from the research community (Fonseca and Mota 2017; Yu et al. 2018; Cherrared et al. 2019). In the following section, we present the state of the art of network troubleshooting.
According to the related work on network troubleshooting (Yu et al. 2018; Fonseca and Mota 2017; Van et al. 2018), problems can be classified into several categories according to the locations where they happen or the factors that cause them. Yu et al. (2018) and Fonseca and Mota (2017) categorize problems into those in the application, control and infrastructure layers. Similarly, problems can be classified into problems in application service providers (ASP) or Internet service providers (ISP) (Van et al. 2018). Besides, problems can be classified into problems caused by administrators (e.g. router misconfiguration, server misconfiguration, etc.) or problems that are not caused by administrators (e.g. link failure, switch failure, buffer overload, etc.). Based on a survey of NOs (Zeng et al. 2012b), the following sections present several problems that are not caused by administrators.
Bu et al. (2016) categorized rule failures in the network into missing faults and priority faults. A missing fault occurs when a rule is not executed as expected, whereas a priority fault occurs when overlapping rules violate a priority order.
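The distinction between the two fault types can be illustrated with a toy flow table. The sketch below is only an illustration under simplifying assumptions and is not the method of Bu et al. (2016): the names `Rule`, `expected_rule` and `classify_fault` are hypothetical, a rule's match is reduced to a set of header values, and an observed action that belongs to a lower-priority overlapping rule is taken as evidence of a priority fault.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    match: frozenset  # toy model: header values matched by the rule
    action: str


def expected_rule(rules, header):
    """Return the highest-priority rule matching the header, or None."""
    candidates = [r for r in rules if header in r.match]
    return max(candidates, key=lambda r: r.priority, default=None)


def classify_fault(rules, header, observed_action):
    """Compare the action observed for a probe with the expected one."""
    expected = expected_rule(rules, header)
    if expected is None:
        raise ValueError("no rule matches this header")
    if observed_action == expected.action:
        return "ok"
    # Assumption: if another overlapping rule explains the observed action,
    # the priority order between overlapping rules was violated.
    overlapping = [r for r in rules
                   if header in r.match and r is not expected
                   and r.action == observed_action]
    if overlapping:
        return "priority fault"
    # Otherwise the expected rule simply did not take effect.
    return "missing fault"
```

In a real switch, a rule that failed to install can also let a lower-priority rule handle the packet, so the two cases are harder to separate than this toy model suggests.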
Several research studies concentrate on the missing fault, including ATPG (Zeng et al. 2012a) and Monocle (Perešíni et al. 2015). These approaches verify the rules by generating probe packets to exercise every rule. ATPG uses header space analysis (Kazemian et al. 2012) to check the reachability between all test hosts. Then, the reachability result is transferred to a probe packet generator, which computes a minimal set of probe packets via a greedy algorithm (Slavík 1997). Next, these probe packets are sent into the network to check the correctness of the rules. If an error is detected, a fault localization algorithm is run to narrow down the root cause. However, ATPG has a drawback: it generates probe packets for all rules, which is inefficient when only a few rules have been updated. Monocle was proposed to overcome this drawback. It only verifies recently installed rules and reports misbehaviors. Besides, Monocle formulates the knowledge from the flow tables in the switches as constraints and applies an SAT solver (Biere 2008) to generate a set of probe packets.
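The greedy selection step can be sketched as the classical greedy heuristic for set cover. This is a minimal illustration and not ATPG's implementation: the function name `greedy_probe_selection` and the input format (a mapping from each candidate probe to the set of rules it exercises) are assumptions, and the header space analysis that would produce the candidates is not shown.

```python
def greedy_probe_selection(probe_coverage):
    """Choose probes greedily until every rule is exercised at least once.

    probe_coverage: dict mapping a probe id to the set of rule ids that the
    probe exercises (hypothetical input format).
    """
    uncovered = set().union(*probe_coverage.values())
    chosen = []
    while uncovered:
        # Pick the probe that exercises the most rules not yet covered.
        best = max(probe_coverage,
                   key=lambda p: len(probe_coverage[p] & uncovered))
        chosen.append(best)
        uncovered -= probe_coverage[best]
    return chosen


candidates = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r3", "r4"},
    "p3": {"r4"},
    "p4": {"r1"},
}
selected = greedy_probe_selection(candidates)
```

The heuristic does not guarantee a minimum set of probes, but it runs in polynomial time, which is why it is a common choice for covering problems of this kind.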
Probing is an intrusive method that generates significant overhead and increases link utilization in the network. Consequently, it is necessary to minimize the number of probe packets. This is a minimum set cover problem, which is NP-complete (Zeng et al. 2012a). Therefore, Bu et al. (2016) proposed RuleScope, a framework for detecting rule failures in the network. RuleScope divides flow tables into solvable subsets of rules to reduce the scale of probing. Then, it creates a directed acyclic graph for each subset and generates a set of probe packets for each subset. As a result, probe packet generation is faster because each rule subset is small.
Although RuleScope minimizes the number of probe packets, it suffers from a drawback related to the partitioning of the flow tables: splitting the flow tables into small subsets can cause two overlapping rules that fall into different subsets to be overlooked, so priority faults in the switches may go undetected. Zhao et al. (2018a) proposed SERVE, a rule verification framework that identifies rule failures in the switches automatically. First, SERVE extracts all rules for each device and builds a multi-rooted tree that captures the connections between rules. Next, SERVE analyzes the multi-rooted tree to generate the minimum number of probe packets. Because the minimum set cover problem is NP-complete, SERVE applies the depth-first search (DFS) algorithm to generate the probe packets. Zhao et al. (2018b) extended this study to present a complete framework. After generating the probe packets, SERVE injects them into the network using an out-of-band channel. SERVE also computes the desired network behavior using the multi-rooted trees. By comparing the feedback received from the out-of-band channel for every rule with the desired network behavior, SERVE can detect faulty rules and send notifications to administrators. SERVE is evaluated against benchmarks in terms of processing time, number of probe packets and overhead. SERVE decreases the number of probe packets by up to 75% in comparison with Monocle, and its processing time is one third of that of ATPG. As for the overhead, in-band bandwidth is not affected because the probe packets are injected through the out-of-band channel, and the out-of-band bandwidth used is far less than the link capacity.
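To give an intuition for the DFS step, the sketch below enumerates root-to-leaf chains in a multi-rooted tree of rules, under the assumption that one probe packet can exercise every rule along such a chain. This is one plausible reading of the description above, not SERVE's published algorithm; the function name `probe_paths` and the tree encoding (a dict from a rule to its child rules) are hypothetical.

```python
def probe_paths(children, roots):
    """Enumerate root-to-leaf rule chains with an iterative depth-first search.

    children: dict mapping a rule id to the list of rules that can process
    the packet next (assumed acyclic).
    roots: list of rule ids with no predecessor.
    """
    paths = []
    # Push roots in reverse so that they are visited in the given order.
    stack = [(root, [root]) for root in reversed(roots)]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            # A leaf closes one chain; one probe would traverse this chain.
            paths.append(path)
            continue
        for kid in reversed(kids):
            stack.append((kid, path + [kid]))
    return paths


tree = {"a": ["b", "c"], "b": ["d"], "x": ["y"]}
chains = probe_paths(tree, ["a", "x"])
```

With this toy tree, three probes (one per chain) exercise all six rules, rather than one probe per rule.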
Link failure refers to unreachability between two switches. It can lead to high packet loss and performance degradation in the network. Link failure can be detected with probe packets in active monitoring approaches. ping is a simple troubleshooting tool that sends probe packets to check the reachability between two end-points. If the probe packets are lost, there is a faulty link between these end-points. Similarly, Cascone et al. (2017) proposed a fast failure detection mechanism based on the exchange of bidirectional “heartbeat” packets. When the packet rate drops below a threshold, a node sends heartbeat packets to its neighbors. If there is no response from a neighbor after a given time, the link to that neighbor is considered to have failed. However, this mechanism requires a strict consumption related to the backup solutions that cannot be utilized to guarantee the short failover delays (1 ms).
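The heartbeat logic described above can be sketched as a small state machine for one neighbor. This is a hedged illustration, not the mechanism of Cascone et al. (2017): the class `HeartbeatMonitor`, its parameters and the returned strings are hypothetical, and timestamps are passed in explicitly so that the logic can be followed without a real network.

```python
class HeartbeatMonitor:
    """Tracks one neighbor and decides when its link should be declared failed."""

    def __init__(self, rate_threshold, timeout):
        self.rate_threshold = rate_threshold  # packets/s below which we probe
        self.timeout = timeout                # seconds to wait for a reply
        self.probe_sent_at = None             # time of the pending heartbeat

    def on_rate_sample(self, packets_per_second, now):
        """Request a heartbeat when traffic from the neighbor becomes sparse."""
        if packets_per_second < self.rate_threshold and self.probe_sent_at is None:
            self.probe_sent_at = now
            return "send_heartbeat"
        return "idle"

    def on_reply(self):
        """A reply from the neighbor clears the pending heartbeat."""
        self.probe_sent_at = None

    def link_failed(self, now):
        """The link is considered failed if a heartbeat stayed unanswered too long."""
        return (self.probe_sent_at is not None
                and now - self.probe_sent_at > self.timeout)


monitor = HeartbeatMonitor(rate_threshold=10, timeout=0.001)
```

Heartbeats are only sent when regular traffic is too sparse to confirm that the link is alive, which keeps the probing overhead low on busy links.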
Moreover, link failure can be detected by using the Link Layer Discovery Protocol (LLDP) in software-defined networking (SDN) (Khan et al. 2016; Tarnaras et al. 2015). Using the topology discovery protocol, the SDN controller can detect a link failure and remove the link from the network topology. First, an OpenFlow (OF) switch connects to the controller so that the controller knows its active ports. Next, the controller sends a Packet-out message to each active port of the switch to discover the topology. The LLDP exchange between switches s1 and s2 is depicted in Figure 1.1. The controller encapsulates an LLDP packet in a Packet-out message and sends it to switch s1. When switch s1 receives the Packet-out message, it forwards the LLDP packet to switch s2. After receiving the LLDP packet, switch s2 encapsulates it in a Packet-in message and sends it back to the controller. The controller receives this message and creates a link from switch s1 to s2. The same process is performed to identify the link in the opposite direction. When link s1–s2 is faulty, the controller does not receive the Packet-in message from switch s2 and removes this link from the network topology. In a network with S switches interconnected by a set of L links, the total numbers of Packet-out and Packet-in messages are given by equations [1.1] and [1.2], respectively, where Pi is the number of active ports of switch Si.
Figure 1.1. Unidirectional link discovery in LLDP. For a color version of this figure, see www.iste.co.uk/tong/troubleshooting.zip
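The discovery round described above can be simulated to count the control messages. The sketch below is an illustration under stated assumptions, not controller code: the function `discover_links` and its input format are hypothetical. It follows the procedure as described, with one Packet-out per active port and one Packet-in per working link direction, so a healthy network would produce the sum of all Pi Packet-out messages and 2L Packet-in messages. This count is inferred from the description and is not a quotation of equations [1.1] and [1.2].

```python
def discover_links(active_ports, wiring, failed=frozenset()):
    """Simulate one LLDP discovery round run by the controller.

    active_ports: dict switch -> list of active port numbers
    wiring: dict (switch, port) -> (peer_switch, peer_port), both directions
    failed: set of frozenset({switch_a, switch_b}) pairs whose link is down
    Returns (discovered unidirectional links, Packet-out count, Packet-in count).
    """
    links = set()
    packet_out = 0
    packet_in = 0
    for switch, ports in active_ports.items():
        for port in ports:
            packet_out += 1  # one Packet-out per active port
            peer = wiring.get((switch, port))
            if peer is None or frozenset({switch, peer[0]}) in failed:
                continue  # host-facing port or faulty link: no Packet-in
            packet_in += 1  # the peer switch reports the LLDP packet
            links.add((switch, peer[0]))
    return links, packet_out, packet_in


ports = {"s1": [1, 2], "s2": [1, 2]}
wires = {("s1", 1): ("s2", 1), ("s2", 1): ("s1", 1)}
healthy = discover_links(ports, wires)
broken = discover_links(ports, wires, failed={frozenset({"s1", "s2"})})
```

In the healthy case, the two switches with two active ports each trigger four Packet-out messages and two Packet-in messages for the single link; when the link s1–s2 is marked as failed, no Packet-in arrives and the link disappears from the discovered topology.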
Unlike the SDN environments, a hybrid SDN contains OF switches and traditional switches that LLDP cannot discover. Therefore, SDN controllers