Parallelizing Machine Learning - Functionally: A Framework and Abstractions for Parallel Graph Processing

Haller, Philipp; Miller, Heather

conference paper not in proceedings

2011

2nd Annual Scala Workshop

Implementing machine learning algorithms for large data sets, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce, which hide much of the associated complexity. We present a framework for implementing parallel and distributed machine learning algorithms on large graphs in a flexible way, through the use of functional programming abstractions. Our aim is a system that allows researchers and practitioners to quickly and easily implement (and experiment with) their algorithms in a parallel or distributed setting. We introduce functional combinators for the flexible composition of parallel, aggregation, and sequential steps. To the best of our knowledge, our system is the first to avoid inversion of control in a (bulk) synchronous parallel model.
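To make the idea of composing parallel, aggregation, and sequential steps more concrete, here is a minimal sketch in Python. It is not the paper's framework or API: the names `parallel`, `aggregate`, and `then`, the dictionary-based graph representation, and the out-degree example are all assumptions chosen for illustration. The per-vertex step runs sequentially here; it only reads the previous superstep's values, which is what would allow it to run in parallel.

```python
from functools import reduce
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]   # vertex -> out-neighbours (hypothetical representation)
Values = Dict[str, float]      # vertex -> current value

# A step maps (graph, values) to new values; one call models one superstep.
Step = Callable[[Graph, Values], Values]


def parallel(update: Callable[[str, Graph, Values], float]) -> Step:
    """Per-vertex step (hypothetical combinator)."""
    def run(graph: Graph, values: Values) -> Values:
        # Every vertex reads only the old `values`, so the updates are
        # independent of each other; this sketch evaluates them one by one.
        return {v: update(v, graph, values) for v in graph}
    return run


def aggregate(combine: Callable[[float, float], float]) -> Step:
    """Aggregation step (hypothetical combinator): reduce all vertex values
    and hand the result to every vertex. Assumes a non-empty graph."""
    def run(graph: Graph, values: Values) -> Values:
        total = reduce(combine, values.values())
        return {v: total for v in graph}
    return run


def then(*steps: Step) -> Step:
    """Sequential composition (hypothetical combinator): run steps in order."""
    def run(graph: Graph, values: Values) -> Values:
        for step in steps:
            values = step(graph, values)
        return values
    return run


# Example: compute each vertex's out-degree, then the maximum over all vertices.
graph: Graph = {"a": ["b", "c"], "b": ["c"], "c": []}
out_degree = parallel(lambda v, g, vals: float(len(g[v])))
max_all = aggregate(max)
pipeline = then(out_degree, max_all)

result = pipeline(graph, {v: 0.0 for v in graph})
print(result)
```

In this sketch the order of steps is written directly as a composition in program order, rather than being spread across callbacks that a framework invokes, which is one loose way to read the abstract's remark about avoiding inversion of control.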

Name: scalawksp11.pdf

Type: Postprint

Access type: openaccess

Size: 170.64 KB

Format: Adobe PDF

Checksum (MD5): 48fdfbcda3abd3dd0e57706f708f3071