Manage the Tradeoff in Data Sanitization

Peng CHENG, Chun-Wei LIN, Jeng-Shyang PAN, Ivan LEE

IEICE TRANSACTIONS on Information and Systems, Vol.E98-D, No.10, pp.1856-1860
Publication Date: 2015/10/01
Publicized: 2015/07/14
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014EDL8250
Type of Manuscript: LETTER
Category: Artificial Intelligence, Data Mining
Keywords: data sanitization, privacy, frequent itemset mining, evolutionary multi-objective optimization, side effects

Sharing data may risk disclosing the sensitive knowledge it contains. To prevent this, the data owner can sanitize the data before sharing it, modifying some of its items so that the sensitive knowledge is hidden. This paper focuses on protecting sensitive knowledge in the form of frequent itemsets through data sanitization. Sanitization may cause side effects, namely distortion of the data and damage to the non-sensitive frequent itemsets. Minimizing these side effects is a challenging problem for the research community, because there is a trade-off between them: the two cannot, in general, both be minimized at once. In view of this, we propose a data sanitization method based on evolutionary multi-objective optimization (EMO). The method completely hides the specified sensitive itemsets while minimizing the accompanying side effects. Experiments on real datasets show that the proposed approach performs the hiding task effectively, with less damage to the original data and to the non-sensitive knowledge.
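The abstract does not specify how the authors encode solutions or define the two objectives, so the following Python sketch only illustrates the problem setting under stated assumptions. It assumes that transactions are item sets, that sanitization deletes individual items, and that a candidate is feasible only if every sensitive itemset drops below an absolute minimum support count. It measures the two side effects as the number of deleted item occurrences and the number of non-sensitive frequent itemsets that become infrequent. Rather than running an evolutionary search, it filters a hand-picked list of candidates down to the Pareto-optimal ones, which is the dominance comparison an EMO algorithm applies to each population. All function names and the toy data are hypothetical.

```python
from itertools import combinations

# Assumption: a transaction is a frozenset of item labels; support is an absolute count.

def support(db, itemset):
    """Count transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_count):
    """Brute-force enumeration of all frequent itemsets (toy data only)."""
    items = sorted(set().union(*db))
    result = set()
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(db, s) >= min_count:
                result.add(s)
                found = True
        if not found:  # downward closure: no frequent k-itemset means none larger
            break
    return result

def sanitize(db, deletions):
    """Apply a candidate solution: a set of (transaction index, item) deletions."""
    return [t - {item for (i, item) in deletions if i == idx} for idx, t in enumerate(db)]

def evaluate(db, deletions, sensitive, min_count, original_fi):
    """Return (hides_all, distortion, lost) for one candidate sanitization."""
    new_db = sanitize(db, deletions)
    hides_all = all(support(new_db, s) < min_count for s in sensitive)
    # Side effect 1: number of item occurrences actually removed.
    distortion = sum(len(t) - len(nt) for t, nt in zip(db, new_db))
    # Assumption: supersets of a sensitive itemset are not counted as non-sensitive.
    non_sensitive = {f for f in original_fi if not any(s <= f for s in sensitive)}
    # Side effect 2: non-sensitive frequent itemsets that became infrequent.
    lost = sum(1 for f in non_sensitive if support(new_db, f) < min_count)
    return hides_all, distortion, lost

def dominates(a, b):
    """Pareto dominance when every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(db, candidates, sensitive, min_count):
    """Keep feasible candidates whose (distortion, lost) no other feasible one dominates."""
    original_fi = frequent_itemsets(db, min_count)
    feasible = []
    for c in candidates:
        ok, d, l = evaluate(db, c, sensitive, min_count, original_fi)
        if ok:
            feasible.append(((d, l), c))
    return [(obj, c) for obj, c in feasible
            if not any(dominates(other, obj) for other, _ in feasible)]

# Hypothetical toy example: hide {a, b} and {a, c} with a minimum support count of 2.
db = [frozenset("abcd"), frozenset("ab"), frozenset("ac"),
      frozenset("ad"), frozenset("b"), frozenset("c")]
sensitive = [frozenset("ab"), frozenset("ac")]
candidates = [
    {(0, "a")},              # one deletion hides both, but {a, d} becomes infrequent
    {(1, "b"), (2, "c")},    # two deletions, no non-sensitive itemset lost
    {(0, "b"), (0, "c")},    # two deletions, no non-sensitive itemset lost
    {(3, "d")},              # does not hide the sensitive itemsets (infeasible)
    {(0, "a"), (1, "b")},    # dominated by the first candidate
]
front = pareto_front(db, candidates, sensitive, min_count=2)
print([obj for obj, _ in front])  # prints [(1, 1), (2, 0), (2, 0)]
```

On this toy data, deleting `a` from the first transaction hides both sensitive itemsets with a single change but makes {a, d} infrequent, whereas two targeted deletions preserve every non-sensitive frequent itemset. Neither solution dominates the other, which is the kind of trade-off a multi-objective search is meant to expose rather than collapse into a single weighted score.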