Anonymization Technique Based on SGD Matrix Factorization

Tomoaki MIMOTO  Seira HIDANO  Shinsaku KIYOMOTO  Atsuko MIYAJI  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.2   pp.299-308
Publication Date: 2020/02/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019INP0013
Type of Manuscript: Special Section PAPER (Special Section on Security, Privacy, Anonymity and Trust in Cyberspace Computing and Communications)
Category: Cryptographic Techniques
Keyword: time-sequence data, anonymization, matrix factorization, privacy and utility

Summary: 
Time-sequence data is high-dimensional and information-rich, and it can be utilized in various fields such as insurance, finance, and advertising. Personal data, including time-sequence data, is converted into anonymized datasets, which need to strike a balance between privacy and utility. In this paper, we consider low-rank matrix factorization as an anonymization method and evaluate its efficiency. We convert time-sequence datasets to matrices and evaluate both privacy and utility. The record IDs in time-sequence data are changed at regular intervals to reduce re-identification risk. However, since individuals tend to behave similarly over time, a risk of record linkage remains even if the record IDs differ. Hence, we evaluate both the re-identification risk and the linkage risk as privacy risks of time-sequence data. Our experimental results show that matrix factorization is a viable anonymization method and that it can achieve better utility than existing anonymization methods.
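
The idea of releasing a low-rank approximation in place of the original matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (sgd_factorize, reconstruct, mean_squared_error), the hyperparameters (rank, epochs, lr, reg), and the use of mean squared error as a stand-in utility measure are all assumptions made for illustration, and the summary does not state the paper's actual settings.

```python
import random


def sgd_factorize(matrix, rank=2, epochs=300, lr=0.01, reg=0.02, seed=0):
    """Approximate `matrix` (rows = records, columns = time slots) as U * V^T.

    Plain stochastic gradient descent on the squared error with L2
    regularization. All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small random initial factors.
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for _ in range(epochs):
        rng.shuffle(cells)  # visit the cells in a fresh random order each epoch
        for i, j in cells:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = matrix[i][j] - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def reconstruct(U, V):
    """Return the low-rank matrix U * V^T, i.e. the candidate anonymized data."""
    return [[sum(u * v for u, v in zip(row_u, row_v)) for row_v in V]
            for row_u in U]


def mean_squared_error(a, b):
    """Average squared difference between two equally sized matrices."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


if __name__ == "__main__":
    # Hypothetical toy data: 4 records observed over 3 time slots.
    original = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [1, 2, 3]]
    U, V = sgd_factorize(original, rank=1)
    anonymized = reconstruct(U, V)
    print("utility loss (MSE):", mean_squared_error(original, anonymized))
```

In a sketch like this, a smaller rank would be expected to discard more record-specific detail, which is the kind of privacy and utility trade-off the summary describes; measuring re-identification or linkage risk would require a separate matching experiment that is not shown here.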