Pfimbi: Accelerating Big Data Jobs Through Flow-Controlled Data Replication

Dzinamarira, Simbarashe; Dinu, Florin; Ng, T. S Eugene

2016

Abstract

The performance of HDFS is critical to big data software stacks and has been at the forefront of recent efforts from the industry and the open source community. A key problem is the lack of flexibility in how data replication is performed. To address this problem, this paper presents Pfimbi, the first alternative to HDFS that supports both synchronous and flow-controlled asynchronous data replication. Pfimbi has numerous benefits: it accelerates jobs, exploits under-utilized storage I/O bandwidth, and supports hierarchical storage I/O bandwidth allocation policies. We demonstrate that for a job trace derived from a Facebook workload, Pfimbi improves the average job runtime by 18% and by up to 46% in the best case. We also demonstrate that flow control is crucial to fully exploiting the benefits of asynchronous replication; removing Pfimbi’s flow control mechanisms resulted in a 2.7x increase in job runtime.
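
As a rough illustration of using under-utilized storage I/O bandwidth for replication, the toy sketch below gives foreground writes first claim on a disk's bandwidth and splits whatever is left among replication flows in proportion to per-job weights. This is not Pfimbi's actual mechanism, which the abstract does not describe; the function name, parameters, job names, and numbers are all hypothetical.

```python
def allocate_bandwidth(capacity_mbps, foreground_demand_mbps, replication_weights):
    """Toy model (an assumption, not Pfimbi's algorithm).

    Foreground writes are served first, up to the disk capacity.
    Replication flows then share the leftover bandwidth in proportion
    to their weights.
    """
    foreground = min(capacity_mbps, foreground_demand_mbps)
    leftover = capacity_mbps - foreground
    total_weight = sum(replication_weights.values())
    if total_weight == 0:
        # No weighted replication flows: nothing to distribute.
        return foreground, {job: 0.0 for job in replication_weights}
    shares = {
        job: leftover * weight / total_weight
        for job, weight in replication_weights.items()
    }
    return foreground, shares


# Hypothetical disk with 100 MB/s capacity and 60 MB/s of foreground writes.
foreground, shares = allocate_bandwidth(100.0, 60.0, {"job_a": 3, "job_b": 1})
print(foreground, shares)
```

When foreground demand meets or exceeds capacity, the replication flows in this sketch receive nothing until bandwidth frees up, which is the intuition behind deferring replication rather than letting it compete with the primary write.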

Details

Title Pfimbi: Accelerating Big Data Jobs Through Flow-Controlled Data Replication

Author(s) Dzinamarira, Simbarashe ; Dinu, Florin ; Ng, T. S Eugene

Conference 32nd International Conference on Massive Storage Systems and Technology (MSST 2016), Santa Clara, May 2-6, 2016

Date 2016

Laboratories LABOS

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LABOS - Operating Systems Laboratory
Peer-reviewed publications
Work outside EPFL
Conference Papers
Published

Record creation date 2016-05-25
