Files

Abstract

Graph processing systems are used in a wide variety of fields, ranging from biology to social networks. Algorithms to mine graphs incur many random accesses, and the sparse nature of the graphs of interest, exacerbates this. As DRAM sustains high bandwidth in the presence of random accesses, traditionally, it was the chosen medium to store and process graphs from. Many in-memory systems were presented at top tier conferences, and every system compared to the previous in algorithm execution time. What is not clearly evaluated are the overheads of pre-processing the input in order to support particular optimisations. More critical, for a systems designer, it is very hard to reason about what are the fundamental techniques that lead to performance gains. We implement the proposed optimisations of state-of-the-art systems within one system to answer this question. We found that the pre-processing cost often dominates the cost of algorithm execution, calling into question the benefits of proposed algorithmic optimisations that rely on extensive pre-processing. The design of a system has to be carefully guided by the characteristics of the algorithms, graphs and hardware. Furthermore, in-memory systems are often limited by the amount of available memory on a single machine. When the graph cannot fit in the available DRAM, many systems scale up to secondary storage on one machine, or in the cloud. As storage is cheaper than memory, they trade off performance for more space, at a lower cost. Emerging non-volatile technologies such as 3D XPoint, offer a new opportunity for bridging this performance gap. Designing a system that fully utilises these storage devices is not straightforward. Traditional I/O optimisations, designed for SSDs, underutilise the device, and systems end up being CPU bound on fast storage. By removing the dedicated I/O layers, and combining pre-processing approaches for in-memory and out-of-core systems, we are able to switch graph representations to benefit a particular algorithm. This improves the end-to-end execution time up to 2x. However, in the presence of updates to the graph structure, the data structures used by static algorithms need to be re-created on every update, and the algorithm executed from scratch. The high latency of this process is not acceptable for critical applications such as fraud detection. We address this problem by developing two systems to compute on evolving graphs, using an update friendly graph representation. Each system targets a different group of applications: graph analytics and graph pattern mining (GPM). For graph analytics we trade sub-millisecond latencies for consistency. For GPM, we use re-computation and backtracking to handle millions of updates per second, outperforming state of the art by up to 5x.

Details

Actions

Preview