Data stream statistics over sliding windows: How to summarize 150 Million updates per second on a single node
Traditional data management systems map information using centralized and static data structures. Modern applications need to process in real time datasets much larger than system memory. To achieve this, they use dynamic entities that are updated with streaming input data over a sliding window. For efficient and high performance processing, approximate sketch synopses of input streams have been proposed as effective means for the summarization of streaming data over large sliding windows with probabilistic accuracy guarantees. This work presents a system-level solution to accelerate the Exponential Count-Min (ECM) sketch algorithm on reconfigurable technology. Different reconfigurable architectures for the sketch structure that correspond to different cost and performance tradeoffs are presented. We map the proposed system-level ECM sketch architectures to a high-end modern HPC platform to achieve guaranteed and best-effort update rates up to 150 and 180 million tuples per second respectively. We compare the performance of the implemented system against the best optimized multi-thread software alternative and show that our scalable full-system accelerators outperform software solutions by 5-7.5x for Virtex6 devices and in excess of 10x for current Ultrascale devices.
fpl19.pdf
openaccess
473.86 KB
Adobe PDF
2e39a78e42f2021d8f44cb01c1744cd9