000181673 001__ 181673
000181673 005__ 20190316235505.0
000181673 037__ $$aCONF
000181673 245__ $$aJet: An Embedded DSL for High Performance Big Data Processing
000181673 269__ $$a2012
000181673 260__ $$c2012
000181673 336__ $$aConference Papers
000181673 520__ $$aCluster computing systems today impose a trade-off between generality, performance and productivity. Hadoop and Dryad force programmers to write low-level programs that are tedious to compose but easy to optimize. Systems like Dryad/LINQ and Spark allow concise modeling of user programs but do not apply relational optimizations. Pig and Hive restrict the language to achieve relational optimizations, making complex programs hard to express without user extensions. However, these extensions are cumbersome to write and disallow program optimizations. We present a distributed batch data processing framework called Jet. Jet uses deep language embedding in Scala, multi-stage programming and explicit side-effect tracking to analyze the structure of user programs. The analysis is used to apply projection insertion, which eliminates unused data, as well as code motion and operation fusion to highly optimize the performance-critical path of the program. The language embedding and a high-level interface allow Jet programs to be both expressive, resembling regular Scala code, and optimized. Its modular design allows users to extend Jet with modules that produce well-performing code. Through a modular code generation scheme, Jet can generate programs for both Spark and Hadoop. Compared with naïve implementations, we achieve 143% speedups on Spark and 126% on Hadoop.
000181673 6531_ $$aDomain-specific Languages, Multi-stage Programming, MapReduce, Operation Fusion, Projection Insertion
000181673 700__ $$0(EPFLAUTH)222134$$aAckermann, Stefan$$g222134
000181673 700__ $$0243781$$aJovanovic, Vojin$$g202774
000181673 700__ $$0243345$$aRompf, Tiark$$g185682
000181673 700__ $$0241835$$aOdersky, Martin$$g126003
000181673 7112_ $$aInternational Workshop on End-to-end Management of Big Data (BigData 2012)
000181673 8564_ $$s362778$$uhttps://infoscience.epfl.ch/record/181673/files/paper.pdf$$yn/a$$zn/a
000181673 909C0 $$0252187$$pLAMP$$xU10409
000181673 909CO $$ooai:infoscience.tind.io:181673$$pconf$$pIC$$qGLOBAL_SET
000181673 917Z8 $$x185682
000181673 937__ $$aEPFL-CONF-181673
000181673 973__ $$aEPFL$$rREVIEWED$$sPUBLISHED
000181673 980__ $$aCONF