Robustness to failures in two-layer communication networks

A close look at many existing systems reveals their two- or multi-layer nature, where a number of coexisting networks interact and depend on each other. For instance, in the Internet, any application-level graph (such as a peer-to-peer network) is mapped on the underlying IP network that, in turn, is mapped on a mesh of optical fibers. This layered view sheds new light on the tolerance to errors and attacks of many complex systems. What is observed at a single layer does not necessarily reflect the state of the entire system. On the contrary, a tiny, seemingly harmless disruption of one layer may destroy a substantial or essential part of another layer, thus making the whole system useless in practice.

In this thesis we consider such two-layer systems. We model them by two graphs at two different layers, where the upper-layer (or logical) graph is mapped onto the lower-layer (physical) graph. Our main goals are the following. First, we study the robustness to failures of existing large-scale two-layer systems. This yields valuable insights into the problem, for example by identifying common weak points in such systems. Fortunately, these two-layer problems can often be effectively alleviated by careful system design. Therefore, our second major goal is to propose new designs that increase the robustness of two-layer systems.

This thesis is organized in three main parts, each focusing on different examples and aspects of two-layer systems. In the first part, we turn our attention to existing large-scale two-layer systems, such as peer-to-peer networks, railway networks and the human brain. Our main goal is to study the vulnerability of these systems to random errors and targeted attacks. Our simulations show that (i) two-layer systems are much more vulnerable to errors and attacks than they appear from a single-layer perspective, and (ii) attacks are much more harmful than errors, especially when the logical topology is heterogeneous.
These results hold across all studied systems.

A natural next step is to improve the failure robustness of two-layer systems. In particular, in the second part of this thesis, we consider IP/WDM optical networks, where an IP backbone network is mapped on a mesh of optical fibers. The problem lies in designing a survivable mapping, such that no single physical failure disconnects the logical topology. This is an NP-complete problem. We introduce a new concept of piecewise survivability, which makes the problem much easier in practice. This leads us to an efficient and scalable algorithm called SMART, which finds a survivable mapping much faster (often by orders of magnitude) than the other approaches proposed to date. Moreover, the formal analysis of SMART allows us to prove whether or not a survivable mapping exists. Finally, this approach helps us to find vulnerable areas in the system, and to effectively reinforce them, e.g., by adding new links.

In the third part of this thesis, we shift our attention one layer higher, to the application-over-IP setting. In particular, we consider the design of Application-Level Multicast (ALM) for interactive applications, where a single source sends a delay-constrained data stream to a number of destinations. Interactive ALM should (i) respect stringent delay requirements, and proactively protect the system against (ii) overlay node failures and (iii) packet losses at the IP layer. We propose a two-layer-aware approach to this problem. First, we prove that the average packet loss rate observed at the destinations can be effectively approximated by a purely topological metric that, in turn, drops with the amount of IP-level and overlay-level path diversity available in the system. Therefore, we propose a framework that accommodates and generalizes various techniques to increase the path diversity in the system. Within this framework we optimize the structure of ALM.
As a result, we reduce the effective loss rate on real Internet topologies, typically by 30%-70% compared to the state of the art.

Finally, in addition to the three main parts of the thesis, we also present a set of results inspired by the study of ALM systems, but not directly related to the 'two-layer' paradigm (and thus moved to the Appendix). In particular, we consider the transmission of a delay-sensitive data stream from a single source to a single destination, where the data packets are protected by a Forward Error Correction (FEC) code and sent over multiple paths. We show that the performance of such a scheme can often be further improved. Our key observation is that the propagation times on the available paths often differ significantly, typically by 10-100 ms. We propose to exploit these differences by appropriate packet scheduling, which results in a two- to five-fold reduction in the effective loss rate.
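
The survivability condition from the second part (no single physical failure disconnects the logical topology) can be illustrated with a brute-force check. The sketch below is not the SMART algorithm; it only verifies a given mapping by failing each physical link in turn. The function names, the `link` helper and the four-node ring example are assumptions made for illustration.

```python
from collections import defaultdict


def link(u, v):
    """Undirected physical link, so link('a', 'b') == link('b', 'a')."""
    return frozenset((u, v))


def is_connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def is_survivable(logical_nodes, mapping):
    """Check a mapping {logical edge (u, v): list of physical links on its path}.

    The mapping is survivable if, after removing any single physical link,
    the logical edges whose paths avoid that link still connect all nodes.
    """
    physical_links = {l for path in mapping.values() for l in path}
    for failed in physical_links:
        surviving = [e for e, path in mapping.items() if failed not in path]
        if not is_connected(logical_nodes, surviving):
            return False
    return True


# Hypothetical example: logical ring a-b-c-d mapped on a physical ring a-b-c-d.
nodes = ["a", "b", "c", "d"]
good = {
    ("a", "b"): [link("a", "b")],
    ("b", "c"): [link("b", "c")],
    ("c", "d"): [link("c", "d")],
    ("d", "a"): [link("d", "a")],
}
# Same logical ring, but edge (a, b) is routed the long way around, so it
# shares the physical link c-d with logical edge (c, d).
bad = dict(good)
bad[("a", "b")] = [link("a", "d"), link("d", "c"), link("c", "b")]

print(is_survivable(nodes, good))  # True
print(is_survivable(nodes, bad))   # False
```

In the second mapping, a failure of the physical link c-d removes two logical edges at once and splits the logical ring, which is the kind of hidden dependency that a single-layer view does not reveal.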
