Distributed Programming via Safe Closure Passing
Programming systems incorporating aspects of functional programming, e.g., higher-order func- tions, are becoming increasingly popular for large-scale distributed programming. New frameworks such as Apache Spark leverage functional techniques to provide high-level, declarative APIs for in- memory data analytics, often outperforming traditional “big data” frameworks like Hadoop MapRe- duce. However, widely-used programming models remain rather ad-hoc; aspects such as implemen- tation trade-offs, static typing, and semantics are not yet well-understood. We present a new asyn- chronous programming model that has at its core several principles facilitating functional processing of distributed data. The emphasis of our model is on simplicity, performance, and expressiveness. The primary means of communication is by passing functions (closures) to distributed, immutable data. To ensure safe and efficient distribution of closures, our model leverages both syntactic and type-based restrictions. We report on a prototype implementation in Scala. Finally, we present pre- liminary experimental results evaluating the performance impact of a static, type-based optimization of serialization.