Self-organized fault-tolerant routing in peer-to-peer overlays
In sufficiently large heterogeneous overlays, message loss and delays are likely to occur. This has a significant impact on overlay routing, especially on longer paths. Existing solutions to this problem rely on message redundancy to mask loss and delays, which incurs a significant bandwidth cost. We propose the Forward Feedback Protocol (FFP), which routes only a single copy of each message and detects message loss and excessive delays while routing. Failures are signaled along the routing paths. Based only on these simple binary signals, each overlay node locally and independently learns to route around failures. The local node interactions lead to the emergence of fast, reliable overlay routes. This is a continuous process: the system constantly self-organizes in response to changing delay and loss conditions. We evaluate the protocol in an Internet deployment and in simulation. Our system uses 2-5 times less bandwidth than existing overlay routing approaches that rely on high message redundancy for fault tolerance. Despite its marginal bandwidth investment in reliability, FFP achieves up to a 30% higher delivery success rate than existing solutions. The protocol is scalable, with local state size of O(log2 N) in terms of the network size N, and is universally applicable to all recursively routing overlays.
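To make the idea of learning from binary failure signals concrete, the following is a minimal sketch of how a single node might adapt its next-hop choice. It is an illustration only and not the FFP learning rule: the class `FeedbackRouter`, the exponential moving average update, and the parameters `learning_rate` and `initial_score` are all assumptions introduced here.

```python
class FeedbackRouter:
    """Hypothetical per-node state: one reliability estimate per next hop."""

    def __init__(self, neighbors, learning_rate=0.5, initial_score=1.0):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed smoothing factor
        # Start optimistic: every neighbor is presumed reliable.
        self.score = {n: initial_score for n in self.neighbors}

    def choose_next_hop(self):
        # Forward a single copy to the neighbor with the best estimate.
        # Ties resolve to the earliest neighbor in the list.
        return max(self.neighbors, key=lambda n: self.score[n])

    def on_signal(self, neighbor, failed):
        # Binary feedback: 0.0 for a signaled failure, 1.0 for success.
        outcome = 0.0 if failed else 1.0
        old = self.score[neighbor]
        # Exponential moving average (an assumed update rule).
        self.score[neighbor] = old + self.learning_rate * (outcome - old)


router = FeedbackRouter(["a", "b", "c"])
first = router.choose_next_hop()
router.on_signal(first, failed=True)   # a failure is signaled for that hop
second = router.choose_next_hop()      # the node now prefers another neighbor
```

Because each node updates only its own estimates from the signals it receives, no global coordination is needed in this sketch; a real protocol would additionally need timeouts for detecting loss and a way for a penalized neighbor to recover its score.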