Tools and Frameworks for Big Learning in Scala: Leveraging the Language for High Productivity and Performance

Implementing machine learning algorithms for large data, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce, which hide much of the associated complexity. We present three ongoing efforts within our team, previously presented at venues in other fields, which aim to make it easier for machine learning researchers and practitioners alike to quickly implement and experiment with their algorithms in a parallel or distributed setting. Furthermore, in describing our frameworks, we hope to highlight some of the language features unique to the Scala programming language, to show how these features can be used to produce efficient and correct parallel systems more easily than ever before.

Presented at:
NIPS 2011 Workshop on Parallel and Large-Scale Machine Learning (BigLearn), Sierra Nevada, Spain, December 16-17, 2011

 Record created 2011-11-07, last modified 2019-03-16

