Designing Distributed SDN C-Plane Considering Large-Scale Disruption and Restoration

Takahiro HIRAYAMA  Masahiro JIBIKI  Hiroaki HARAI  

IEICE TRANSACTIONS on Communications   Vol.E102-B   No.3   pp.452-463
Publication Date: 2019/03/01
Publicized: 2018/09/20
Online ISSN: 1745-1345
DOI: 10.1587/transcom.2018NVP0005
Type of Manuscript: Special Section PAPER (Special Section on Network Virtualization and Network Softwarization for Diverse 5G Services)
Keyword: software-defined networking (SDN), distributed SDN control, failure recovery, control plane


Software-defined networking (SDN) technology enables us to flexibly configure switches in a network. Distributed SDN control methods have previously been discussed as a way to improve scalability and robustness: placing controllers in a distributed manner and having them back each other up enhances robustness. However, these techniques do not include emergency measures against large-scale failures such as network separation induced by disasters. In this study, we first propose a network partitioning method that creates a control plane (C-plane) robust against large-scale failures. In our approach, a network is partitioned into multiple sub-networks based on the robust topology coefficient (RTC). The RTC denotes the probability that nodes in a sub-network become isolated from controllers when a large-scale failure occurs. Placing a local controller in each sub-network retains 6%-10% more controller-switch connections after a failure than other approaches do. Furthermore, we discuss reactive emergency reconstruction of a distributed SDN C-plane. Each node detects disconnection from its controller; the isolated switches then reconstruct the C-plane, which is managed by a substitute controller. Our approach also reconstructs the C-plane when network connectivity recovers: the main and substitute controllers detect the network restoration and merge their C-planes without conflict. Simulation results reveal that our proposed method recovers C-plane logical connectivity with a probability of approximately 90% when a failure occurs in 100-node networks. Furthermore, we demonstrate that the convergence time of our reconstruction mechanism is proportional to the network size.
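The quantity compared in the abstract, the share of controller-switch connections that survive a failure, can be illustrated with a small sketch. This is not the paper's RTC computation or its partitioning method, neither of which the abstract specifies. It only measures, for a given partition and a given set of failed nodes, what fraction of surviving switches can still reach their assigned controller. The function names, the six-node line topology, the controller placements, and the assumption that each controller is co-located with a switch are all hypothetical choices made for illustration.

```python
from collections import deque


def reachable(adj, start, failed):
    """Return the nodes reachable from start without passing through failed nodes."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def retained_fraction(adj, controller_of, failed):
    """Fraction of surviving switches that still reach their assigned controller.

    Assumption: controllers are co-located with switches, so a controller
    node counts as a switch that trivially reaches itself.
    """
    surviving = [n for n in adj if n not in failed]
    if not surviving:
        return 0.0
    cache = {}  # controller -> set of nodes it can still reach
    kept = 0
    for sw in surviving:
        ctrl = controller_of[sw]
        if ctrl not in cache:
            cache[ctrl] = reachable(adj, ctrl, failed)
        if sw in cache[ctrl]:
            kept += 1
    return kept / len(surviving)


# Hypothetical topology: a line of six switches, 0-1-2-3-4-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# One central controller at node 0 versus one local controller per sub-network.
single = {n: 0 for n in adj}
split = {0: 1, 1: 1, 2: 1, 3: 4, 4: 4, 5: 4}

failed = {2}  # the failure cuts the line in two
single_result = retained_fraction(adj, single, failed)
split_result = retained_fraction(adj, split, failed)
```

In this toy case the single-controller placement keeps only the switches on the controller's side of the cut, while the partitioned placement keeps every surviving switch connected. Averaging `retained_fraction` over many sampled failure sets would give an empirical estimate of how robust a particular partition is, which is the kind of comparison the abstract reports.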