Manage the Tradeoff in Data Sanitization
Peng CHENG, Chun-Wei LIN, Jeng-Shyang PAN, Ivan LEE
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2015/10/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Artificial Intelligence, Data Mining
data sanitization, privacy, frequent itemset mining, evolutionary multi-objective optimization, side effects
Sharing data risks disclosing the sensitive knowledge it contains. To prevent this, a data owner may sanitize the data before sharing, modifying some items so that the sensitive knowledge is hidden. This paper focuses on protecting sensitive knowledge in the form of frequent itemsets by data sanitization. The sanitization process may cause side effects, namely distortion of the data and damage to the non-sensitive frequent itemsets. Minimizing these side effects is a challenging problem for the research community, because there is a trade-off when trying to minimize both simultaneously. In view of this, we propose a data sanitization method based on evolutionary multi-objective optimization (EMO). The method hides the specified sensitive itemsets completely while minimizing the accompanying side effects. Experiments on real datasets show that the proposed approach performs the hiding task effectively, with less damage to the original data and to the non-sensitive knowledge.
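The abstract does not give the algorithm itself, so the following is only a minimal sketch of how the side effects it names could be measured for a candidate sanitization and compared by Pareto dominance, as an EMO search would do. The function names, the toy database, the support threshold, and the restriction to item deletion are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations

def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_sup):
    """Brute-force enumeration of frequent itemsets (toy data only)."""
    items = sorted(set().union(*db))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(db, candidate) >= min_sup:
                found.add(candidate)
    return found

def side_effects(original, sanitized, sensitive, min_sup):
    """Return (hiding failures, data distortion, lost non-sensitive itemsets).

    Assumes sanitization only deletes items, so distortion is the
    number of deleted items. All three values are to be minimized.
    """
    before = frequent_itemsets(original, min_sup)
    after = frequent_itemsets(sanitized, min_sup)
    hiding_failures = sum(1 for s in sensitive if s in after)
    distortion = sum(len(o - s) for o, s in zip(original, sanitized))
    lost = len((before - set(sensitive)) - after)
    return hiding_failures, distortion, lost

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Hypothetical toy database: five transactions over items a, b, c.
original = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
            frozenset("ac"), frozenset("bc")]
sensitive = [frozenset("ab")]
min_sup = 3

# Two candidate sanitizations, each deleting one occurrence of item b.
cand_1 = [frozenset("ac")] + original[1:]                      # delete b from transaction 1
cand_2 = original[:2] + [frozenset("a")] + original[3:]        # delete b from transaction 3

score_1 = side_effects(original, cand_1, sensitive, min_sup)
score_2 = side_effects(original, cand_2, sensitive, min_sup)
print(score_1)                      # (0, 1, 1): {b, c} is lost as a side effect
print(score_2)                      # (0, 1, 0): nothing non-sensitive is lost
print(dominates(score_2, score_1))  # True
```

Both candidates hide the sensitive itemset {a, b} with a single deletion, but the first also pushes the non-sensitive itemset {b, c} below the threshold, so the second dominates it. On larger data the two side effects typically pull against each other, which is why a set of non-dominated solutions, rather than a single optimum, is the natural output of such a search.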