000223523 001__ 223523
000223523 005__ 20190509132603.0
000223523 0247_ $$2doi$$a10.5075/epfl-thesis-7289
000223523 02470 $$2urn$$aurn:nbn:ch:bel-epfl-thesis7289-9
000223523 02471 $$2nebis$$a10793455
000223523 037__ $$aTHESIS
000223523 041__ $$aeng
000223523 088__ $$a7289
000223523 245__ $$aReliable and Real-Time Distributed Abstractions
000223523 269__ $$a2016
000223523 260__ $$aLausanne$$bEPFL$$c2016
000223523 300__ $$a205
000223523 336__ $$aTheses
000223523 502__ $$aProf. Viktor Kuncak (président) ; Prof. Rachid Guerraoui (directeur de thèse) ; Prof. Jean-Dominique Decotignie, Prof. Fernando Pedone, Dr Yvonne-Anne Pignolet (rapporteurs)
000223523 520__ $$aThe celebrated distributed computing approach for building systems and services using multiple machines continues to expand to new domains. Computation devices nowadays have additional sensing and communication capabilities, while becoming, at the same time, cheaper, faster and more pervasive. Consequently, areas like industrial control, smart grids and sensor networks are increasingly using such devices to control and coordinate system operations. However, compared to classic distributed systems, such real-world physical systems have different needs, e.g., real-time and energy efficiency requirements. Moreover, constraints that govern communication are also different. Networks become susceptible to inevitable random losses, especially when utilizing wireless and power line communication. This thesis investigates how to build various fundamental distributed computing abstractions (services) given the limitations, the performance and the application requirements and constraints of real-world control, smart grid and sensor systems. In quest of completeness, we discuss four distributed abstractions starting from the level of network links all the way up to the application level. At the link level, we show how to build an energy-efficient reliable communication service. This is especially important for devices with battery-powered wireless adapters where recharging might be unfeasible. We establish transmission policies that can be used by processes to decide when to transmit over the network in order to avoid losses and minimize re-transmissions. These policies allow messages to be reliably transmitted with minimum transmission energy. One level higher than links is failure detection, a software abstraction that relies on communication for identifying process crashes. We prove impossibility results concerning implementing classic eventual failure detectors in networks with probabilistic losses. We define a new implementable type of failure detectors, which preserves modularity. This means that existing deterministic algorithms using eventual failure detectors can still be used to solve certain distributed problems in lossy networks: we simply replace the existing failure detector with the one we define. Using failure detectors, processes might get information about failures at different times. However, to ensure dependability, environments such as distributed control systems (DCSs), require a membership service where processes agree about failures in real time. We prove that the necessary properties of this membership cannot be implemented deterministically, given probabilistic losses. We propose an algorithm that satisfies these properties, with high probability. We show analytically, as well as experimentally (within an industrial DCS), that our technique significantly enhances the DCS dependability, compared to classic membership services, at low additional cost. Finally, we investigate a real-time shared memory abstraction, which vastly simplifies programming control applications. We study the feasibility of implementing such an abstraction within DCSs, showing the impossibility of this task using traditional algorithms that are built on top of existing software blocks like failure detectors. We propose an approach that circumvents this impossibility by attaching information to the failure detection messages, analyze the performance of our technique and showcase ways of adapting it to various application needs and workloads.
000223523 6531_ $$aDistributed abstractions
000223523 6531_ $$aprobabilistic message loss
000223523 6531_ $$atransmission policy
000223523 6531_ $$apartially observable Markov decision problem
000223523 6531_ $$areliable links
000223523 6531_ $$afailure detection
000223523 6531_ $$aconsensus
000223523 6531_ $$adistributed control system
000223523 6531_ $$amembership
000223523 6531_ $$adistributed shared memory
000223523 6531_ $$areal-time systems
000223523 700__ $$0246547$$aKozhaya, David$$g210783
000223523 720_2 $$0240335$$aGuerraoui, Rachid$$edir.$$g105326
000223523 8564_ $$s14049143$$uhttps://infoscience.epfl.ch/record/223523/files/EPFL_TH7289.pdf$$yn/a$$zn/a
000223523 909C0 $$0252114$$pDCL$$xU10407
000223523 909CO $$ooai:infoscience.tind.io:223523$$pthesis-bn2018$$pDOI$$pIC$$pthesis$$qDOI2$$qGLOBAL_SET
000223523 917Z8 $$x108898
000223523 917Z8 $$x108898
000223523 917Z8 $$x108898
000223523 918__ $$aIC$$dEDIC
000223523 919__ $$aLPD
000223523 920__ $$a2016-12-16$$b2016
000223523 970__ $$a7289/THESES
000223523 973__ $$aEPFL$$sPUBLISHED
000223523 980__ $$aTHESIS