Permutation-based Sequential Pattern Hiding

Sequence data are increasingly shared to enable mining applications in domains such as marketing, telecommunications, and healthcare. Sharing, however, may expose sensitive sequential patterns, which can lead to intrusive inferences about individuals or leak confidential information about organizations. This paper presents the first permutation-based approach to prevent this threat. Our approach hides sensitive patterns by replacing them with carefully selected permutations that avoid changes in the set of frequent nonsensitive patterns (side-effects) and in the ordering information of sequences (distortion). By doing so, it retains data utility both in sequence mining and in tasks based on itemset properties, because permutation preserves the support of items, unlike deletion, which is used in existing works. To realize our approach, we develop an efficient and effective algorithm for generating permutations with minimal side-effects and distortion. This algorithm also avoids implausible symbol orderings that may exist in certain applications. In addition, we propose a method to hide sensitive patterns from a sequence dataset. Extensive experiments verify that our method allows significantly more accurate data analysis than the state-of-the-art approach.
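The idea summarized in the abstract, that reordering symbols can hide a pattern while leaving item supports unchanged, can be illustrated with a small sketch. This is not the paper's algorithm: it is a brute-force toy that assumes a single short sequence, treats a pattern as a (not necessarily contiguous) subsequence, and uses the number of changed positions as a stand-in distortion measure. The names `is_subsequence` and `hide_by_permutation` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    # `symbol in it` consumes the iterator up to the first match,
    # so later symbols must appear after earlier ones.
    return all(symbol in it for symbol in pattern)


def hide_by_permutation(sequence, sensitive):
    """Toy search: reorder `sequence` so no sensitive pattern remains.

    Assumption (not from the paper): distortion is the number of
    positions whose symbol changes. Exhaustive, so only usable on
    very short sequences. Returns None if no reordering works.
    """
    best, best_cost = None, None
    for candidate in permutations(sequence):
        if any(is_subsequence(p, candidate) for p in sensitive):
            continue  # still exposes a sensitive pattern
        cost = sum(a != b for a, b in zip(candidate, sequence))
        if best is None or cost < best_cost:
            best, best_cost = list(candidate), cost
    return best


sequence = ["a", "b", "c", "d"]
sensitive = [("a", "c")]

hidden = hide_by_permutation(sequence, sensitive)

# Deletion-based hiding, for contrast: drop one symbol of the pattern.
deleted = [s for s in sequence if s != "a"]

print(hidden)
print(Counter(hidden) == Counter(sequence))   # item counts preserved
print(Counter(deleted) == Counter(sequence))  # item counts changed
```

In this sketch the permuted sequence keeps every item's count, whereas the deletion-based variant loses an occurrence of `a`, which is the utility difference the abstract points to.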

Xiong, H.
Karypis, G.
Thuraisingham, B.
Cook, D.
Wu, X.
Published in:
2013 IEEE 13th International Conference on Data Mining (ICDM), 241-250
Presented at:
IEEE 13th International Conference on Data Mining (ICDM)
New York, IEEE

 Record created 2014-06-02, last modified 2018-01-28
