Semi-Supervised Feature Selection with Universum Based on Linked Social Media Data

Junyang QIU, Yibing WANG, Zhisong PAN, Bo JIA

IEICE TRANSACTIONS on Information and Systems, Vol.E97-D, No.9, pp.2522-2525
Publication Date: 2014/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014EDL8033
Type of Manuscript: LETTER
Category: Pattern Recognition
Keywords: universum, feature selection, social media, semi-supervised learning


The independent and identically distributed (i.i.d.) assumption is commonly made in the machine learning community. However, social media data violate this assumption because the samples are linked to one another. Meanwhile, given the variety of such data, many samples, i.e., Universum samples, belong to neither class of interest. These characteristics pose great challenges for learning from social media data. In this letter, we take full advantage of Universum samples to make the model more discriminative. In addition, the links between samples are taken into account by means of social dimensions. To this end, we propose Semi-Supervised Linked samples Feature Selection with Universum (U-SSLFS), an algorithm that integrates linking information and Universum samples simultaneously to select robust features. Empirical studies show that U-SSLFS outperforms state-of-the-art algorithms on the Flickr and BlogCatalog datasets.
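The abstract does not say how the social dimensions are extracted from the links. As an illustration only, the sketch below assumes one common approach: treat the leading eigenvector of the modularity matrix B = A - k k^T / (2m) as a latent social dimension, where A is the adjacency matrix, k the degree vector, and m the number of edges. The function names `modularity_matrix` and `top_social_dimension` are hypothetical, and this is not claimed to be the procedure used by U-SSLFS.

```python
import math


def modularity_matrix(adj):
    """Hypothetical helper: B_ij = A_ij - k_i * k_j / (2m) for an undirected graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # sum of degrees equals twice the number of edges
    return [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)] for i in range(n)]


def top_social_dimension(adj, iters=500):
    """Hypothetical helper: leading eigenvector of B via shifted power iteration.

    Assumption: a single social dimension (the top modularity eigenvector);
    a social-dimension representation would normally keep several eigenvectors.
    """
    b = modularity_matrix(adj)
    n = len(b)
    # Gershgorin bound: every eigenvalue of B lies in [-shift, shift], so
    # B + shift*I is positive semidefinite and its dominant eigenvector is
    # the eigenvector of B with the largest (algebraic) eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-degenerate start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
    adj = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    dim = top_social_dimension(adj)
    # Nodes in the same triangle share a sign, so the sign pattern
    # recovers the two densely linked groups.
    print([round(x, 3) for x in dim])
```

In such a setup, each node's entries in the retained eigenvectors would serve as extra link-derived features alongside its content features. The pure-Python power iteration keeps the sketch dependency-free; for real graphs such as Flickr or BlogCatalog, a sparse eigensolver would be the practical choice.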