Clouseau: Blockchain-based Data Integrity for HDFS Clusters
As the volume of produced data grows exponentially, companies increasingly rely on distributed systems to meet the surging demand for storage capacity. As business workflows become more complex, such systems often consist of, or are accessed by, multiple independent, untrusted entities that need to interact with shared data. In such scenarios, potential conflicts of interest incentivize malicious parties to act dishonestly and tamper with the data for their own benefit. The decentralized nature of these systems makes verifiable data integrity a strenuous but necessary task: the various parties should be able to audit changes and detect tampering when it happens.
In this work, we focus on HDFS, the most common storage substrate for Big Data analytics. HDFS is vulnerable to both malicious users and malicious participating nodes, and it does not provide a trustworthy lineage mechanism, thus jeopardizing the integrity of stored data and the credibility of extracted insights. As a remedy, we present Clouseau, a blockchain-based system that provides verifiable integrity over HDFS without incurring significant overhead on the critical path of read/write operations. During the demonstration, attendees will have the chance to interact with Clouseau, corrupt data themselves, and witness how Clouseau detects malicious actions.
WOS:000687830800306
2021-01-01
978-1-7281-9184-3
Los Alamitos
IEEE International Conference on Data Engineering
2725
2728
REVIEWED
Event name | Event place | Event date |
 | ELECTR NETWORK | Apr 19-22, 2021 |