000212934 001__ 212934
000212934 005__ 20190619023703.0
000212934 0247_ $$2doi$$a10.5075/epfl-thesis-6784
000212934 02470 $$2urn$$aurn:nbn:ch:bel-epfl-thesis6784-2
000212934 02471 $$2nebis$$a10528232
000212934 037__ $$aTHESIS
000212934 041__ $$aeng
000212934 088__ $$a6784
000212934 245__ $$aLanguage Support for Distributed Functional Programming
000212934 269__ $$a2015
000212934 260__ $$bEPFL$$c2015$$aLausanne
000212934 300__ $$a232
000212934 336__ $$aTheses
000212934 502__ $$aProf. James Richard Larus (président) ; Prof. Martin Odersky (directeur de thèse) ; Prof. Viktor Kuncak, Prof. Jan Vitek, Prof. Matei Zaharia (rapporteurs)
000212934 520__ $$aSoftware development has taken a fundamental turn. Software today has gone from simple, closed programs running on a single machine, to massively open programs, patching together user experiences byway of responses received via hundreds of network requests spanning multiple machines. At the same time, as data continues to stockpile, systems for big data analytics are on the rise. Yet despite this trend towards distributing computation, issues at the level of the language and runtime abound. Serialization is still a costly runtime affair, crashing running systems and confounding developers. Function closures are being added to APIs for big data processing for use by end-users without reliably being able to transmit them over the network. And much of the frameworks developed for handling multiple concurrent requests byway of asynchronous programming facilities rely on blocking threads, causing serious scalability issues. This thesis describes a number of extensions and libraries for the Scala programming language that aim to address these issues and to provide a more reliable foundation on which to build distributed systems. This thesis presents a new approach to serialization called pickling based on the idea of generating and composing functional pickler combinators statically. The approach shifts the burden of serialization to compile time as much as possible, enabling users to catch serialization errors at compile time rather than at runtime. Further, by virtue of serialization code being generated at compile time, our framework is shown to be significantly more performant than other state-of-the-art serialization frameworks. We also generalize our technique for generating serialization code to generic functions other than pickling. Second, in light of the trend of distributed data-parallel frameworks being designed around functional patterns where closures are transmitted across cluster nodes to large-scale persistent datasets, this thesis introduces a new closure-like abstraction and type system, called spores, that can guarantee closures to be serializable, thread-safe, or even have custom user-defined properties. Crucially, our system is based on the principle of encoding type information corresponding to captured variables in the type of a spore. We prove our type system sound, implement our approach for Scala, evaluate its practicality through a small empirical study, and show the power of these guarantees through a case analysis of real-world distributed and concurrent frameworks that this safe foundation for closures facilitates. Finally, we bring together the above building blocks, pickling and spores, to form the basis of a new programming model called function-passing. Function-passing is based on the idea of a distributed persistent data structure which stores in its nodes transformations to data rather than the distributed data itself, simplifying fault recovery by design. Lazy evaluation is also central to our model; by incorporating laziness into our design only at the point of initiating network communication, our model remains easy to reason about while remaining efficient in time and memory. We formalize our programming model in the form of a small-step operational semantics which includes a precise specification of the semantics of functional fault recovery, and we provide an open-source implementation of our model in and for Scala.
000212934 6531_ $$aDistributed programming
000212934 6531_ $$afunctional programming
000212934 6531_ $$aconcurrency
000212934 6531_ $$aparallelism
000212934 6531_ $$aparallel programming
000212934 6531_ $$aasynchronous programming
000212934 6531_ $$aScala
000212934 6531_ $$aserialization
000212934 6531_ $$afunctions
000212934 6531_ $$aclosures
000212934 700__ $$0242185$$g191683$$aMiller, Heather
000212934 720_2 $$aOdersky, Martin$$edir.$$g126003$$0241835
000212934 8564_ $$uhttps://infoscience.epfl.ch/record/212934/files/EPFL_TH6784.pdf$$zn/a$$s2568310$$yn/a
000212934 909C0 $$xU10409$$0252187$$pLAMP
000212934 909CO $$pthesis-public$$pDOI$$pIC$$ooai:infoscience.tind.io:212934$$qGLOBAL_SET$$pthesis$$pthesis-bn2018$$qDOI2
000212934 917Z8 $$x108898
000212934 917Z8 $$x108898
000212934 917Z8 $$x108898
000212934 918__ $$dEDIC2005-2015$$cIIF$$aIC
000212934 919__ $$aLAMP1
000212934 920__ $$b2015$$a2015-10-16
000212934 970__ $$a6784/THESES
000212934 973__ $$sPUBLISHED$$aEPFL
000212934 980__ $$aTHESIS