Sort vs. Hash Join Revisited for Near-Memory Execution
Data movement between memory and CPU is a well-known energy bottleneck for analytics. Near-Memory Processing (NMP) is a promising approach for eliminating this bottleneck by shifting the bulk of the computation toward memory arrays in emerging stacked DRAM chips. Recent work in this space has been limited to regular computations that can be localized to a single DRAM partition. This paper examines a Join workload, which is fundamental to analytics and is characterized by irregular memory access patterns. We consider several join algorithms and show that while near-data execution can improve both energy-efficiency and performance, effective NMP algorithms must consider locality, access granularity, and microarchitecture of the stacked memory devices.