Acceleration of graph pattern mining and applications to financial crime
Various forms of real-world data, such as social, financial, and biological networks, can be
represented using graphs. An efficient method of analysing this type of data is to extract
subgraph patterns, such as cliques, cycles, and motifs, from graphs. For instance, finding
cycles in financial graphs enables the detection of financial crimes such as money laundering
and circular stock trading. In addition, extracting cliques from social network graphs enables
the detection of communities and could help predict the spread of epidemics. However,
extracting such patterns can be time-consuming, especially in larger graphs, because the
number of patterns can grow exponentially with the graph size. Therefore, fast and scalable
parallel algorithms are required to make the enumeration of these subgraph patterns tractable
for real-world graphs.
This thesis presents fast parallel algorithms for the enumeration of maximal cliques and simple
cycles. We focus on accelerating the asymptotically-optimal sequential algorithms for
enumerating the aforementioned patterns by parallelising them on manycore CPUs. To enable
scalable parallelisation of clique and cycle enumeration algorithms, the algorithms presented
in this thesis rely on fine-grained parallelisation, in which recursive calls are executed independently
of each other using several software threads. However, simply applying the fine-grained
parallelisation method to the aforementioned asymptotically-optimal algorithms leads to
suboptimal solutions. Parallelising maximal clique enumeration using this method results in
increased overhead caused by multithreaded memory management and task scheduling, as
well as increased dynamic memory usage. In addition, the asymptotically-optimal algorithms
for simple cycle enumeration rely on strict depth-first traversal of their recursion tree, making
the fine-grained parallelisation of these algorithms challenging. This thesis addresses these
problems and presents parallel algorithms that lead to an almost linear scaling of performance
with the number of CPU cores utilised. As a result, the parallel algorithms presented in this
thesis are able to achieve an order of magnitude speedup compared to the state-of-the-art
parallel algorithms when executed on manycore CPU systems.
To demonstrate the applicability of our accelerated algorithms, this thesis presents the Graph
Feature Preprocessor library, which can be used to detect financial crime. This library expands
the feature set of financial transactions by enumerating well-known money laundering and
financial fraud subgraph patterns, such as simple cycles, in financial transaction graphs. When
used in combination with gradient-boosting-based machine learning models, the expanded
feature set produced by the library enables significant improvements in prediction accuracy
for the problems of money laundering and phishing detection. Furthermore, the efficiency of
the subgraph pattern mining algorithms presented in this thesis enables this library to operate
in real time.
As financial fraud schemes become more complex, fast algorithms that can detect suspicious
behaviour are required. This thesis demonstrates that the parallel graph pattern mining algorithms
introduced here can be used to enable fast and accurate detection of such suspicious
behaviour.
EPFL_TH9938.pdf
n/a
openaccess
copyright
4.21 MB
Adobe PDF
3dd17499e2d333f664d75ee49cad8bc7