Title: On Lock-Free Work-Stealing Iterators for Parallel Data Structures
Authors: Prokopec, Aleksandar; Petrashko, Dmitry; Odersky, Martin
Date: 2014-02-13
Type: report
URL: https://infoscience.epfl.ch/handle/20.500.14299/100603

Abstract: With the rise of multicores, there is a trend of supporting data-parallel collection operations in general-purpose programming languages. These operations are highly parametric and therefore incur abstraction performance penalties. Furthermore, data-parallel operations must scale when applied to irregular workloads. Work-stealing is a proven technique for load-balancing irregular workloads, but general-purpose work-stealing also suffers abstraction penalties. We present a generic data-parallel collections design based on work-stealing for shared-memory architectures that overcomes abstraction penalties through callsite specialization of data-parallel operation instances. Moreover, we introduce work-stealing iterators that allow fine-grained and efficient work-stealing for particular data structures. By eliminating abstraction penalties and making work-stealing data-structure-aware, we achieve up to 60x better performance compared to JVM-based approaches and 3x speedups compared to tools such as Intel TBB.

Keywords: data parallelism; conc-lists; work-stealing collections; callsite specialization; parallel hash-tables; parallel arrays; abstraction penalty; load balancing; domain-specific work-stealing
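The work-stealing-iterator idea from the abstract can be sketched in a few lines. The following Java sketch is illustrative only and is not the paper's API: the names (`ArrayStealer`, `tryNext`, `steal`) and the negative-marker encoding are assumptions. The owner thread advances a shared progress counter with a CAS to claim elements one at a time; a thief CASes the counter to a negative "frozen" marker, which atomically stops the owner and lets the thief split the remaining range into two halves.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a lock-free work-stealing iterator over an array
// range [from, until). Not the paper's implementation.
class ArrayStealer {
    private final int[] data;
    private final int until;
    // Advanced by the owner via CAS; a thief CASes it to the negative
    // marker (-p - 1) to freeze the remaining range before splitting it.
    private final AtomicInteger progress;

    ArrayStealer(int[] data, int from, int until) {
        this.data = data;
        this.until = until;
        this.progress = new AtomicInteger(from);
    }

    /** Owner: claim the next element, or Integer.MIN_VALUE if drained/stolen. */
    int tryNext() {
        while (true) {
            int p = progress.get();
            if (p < 0 || p >= until) return Integer.MIN_VALUE; // stolen or exhausted
            if (progress.compareAndSet(p, p + 1)) return data[p];
        }
    }

    /** Thief: atomically freeze the remaining range, then split it in half. */
    ArrayStealer[] steal() {
        while (true) {
            int p = progress.get();
            if (p < 0 || p >= until) return null;        // nothing left to steal
            if (progress.compareAndSet(p, -p - 1)) {     // freeze (handles p == 0)
                int mid = p + (until - p) / 2;
                return new ArrayStealer[] {
                    new ArrayStealer(data, p, mid),
                    new ArrayStealer(data, mid, until)
                };
            }
        }
    }
}
```

Because both the owner's advance and the thief's freeze go through a single CAS on the same counter, every element is claimed exactly once even when a steal races with the owner, with no locks involved.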