conference paper not in proceedings

Jet: An Embedded DSL for High Performance Big Data Processing

Ackermann, Stefan • Jovanovic, Vojin • Rompf, Tiark
2012
International Workshop on End-to-end Management of Big Data (BigData 2012)

Cluster computing systems today impose a trade-off between generality, performance and productivity. Hadoop and Dryad force programmers to write low-level programs that are tedious to compose but easy to optimize. Systems like Dryad/LINQ and Spark allow concise modeling of user programs but do not apply relational optimizations. Pig and Hive restrict the language to achieve relational optimizations, which makes complex programs hard to express without user extensions; these extensions, in turn, are cumbersome to write and prevent program optimizations. We present Jet, a distributed batch data processing framework. Jet uses deep language embedding in Scala, multi-stage programming and explicit side-effect tracking to analyze the structure of user programs. The analysis is used to apply projection insertion, which eliminates unused data, as well as code motion and operation fusion, to optimize the performance-critical path of the program. The language embedding and a high-level interface allow Jet programs to be both expressive, resembling regular Scala code, and optimized. Jet's modular design allows users to extend it with modules that produce well-performing code. Through a modular code generation scheme, Jet can generate programs for both Spark and Hadoop. Compared with naïve implementations, we achieve speedups of 143% on Spark and 126% on Hadoop.
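As a rough illustration of what operation fusion and projection insertion mean, the following sketch contrasts a naive pipeline with a hand-optimized equivalent on plain Scala collections. It is not Jet's API and not code that Jet generates; the names `LogEntry` and `FusionSketch`, and the record fields, are hypothetical and chosen only for this example.

```scala
// Illustrative only: plain Scala collections, not Jet's API.
// LogEntry and its fields are hypothetical.
final case class LogEntry(user: String, url: String, bytes: Long, agent: String)

object FusionSketch {
  // Naive pipeline: filter and map each build an intermediate collection,
  // and whole LogEntry values (including the unused `url` and `agent`)
  // are carried through the filter step.
  def naive(entries: Vector[LogEntry]): Map[String, Long] =
    entries
      .filter(e => e.bytes > 0)
      .map(e => (e.user, e.bytes))
      .groupBy(_._1)
      .map { case (user, pairs) => user -> pairs.map(_._2).sum }

  // Hand-optimized equivalent:
  //  - fusion: filter, map and aggregation happen in one pass,
  //    with no intermediate collections;
  //  - projection: only `user` and `bytes` are ever read, so a system
  //    that knows this could drop `url` and `agent` when loading data.
  def optimized(entries: Vector[LogEntry]): Map[String, Long] = {
    val totals = scala.collection.mutable.Map.empty[String, Long]
    for (e <- entries if e.bytes > 0) {
      totals(e.user) = totals.getOrElse(e.user, 0L) + e.bytes
    }
    totals.toMap
  }
}
```

Both functions compute the same per-user byte totals. According to the abstract, Jet derives this kind of rewrite automatically from its analysis of the program structure, so the user can write the concise form.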

Name: paper.pdf
Access type: openaccess
Size: 354.28 KB
Format: Adobe PDF
Checksum (MD5): 9e1056201d36ad82111323753af7c2a2
