Rough-Mutual Feature Selection Based on Min-Uncertainty and Max-Certainty

Sombut FOITONG  Ouen PINNGERN  Boonwat ATTACHOO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E95-D   No.4   pp.970-981
Publication Date: 2012/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.970
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Knowledge-Based Software Engineering)
Category: 
Keyword: rough sets, mutual information, feature selection, boundary region, classification

Summary: 
Feature selection (FS) plays an important role in pattern recognition and machine learning. FS is used for dimensionality reduction: its purpose is to select the subset of a data set's original features that carries the most useful information. Most existing FS methods based on rough set theory focus on the dependency function, which uses the lower approximation to evaluate the goodness of a feature subset. However, by considering only information from the positive region and neglecting the boundary region, these methods can miss much relevant information. This paper presents the maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion, which covers feature selection methods based on rough sets and mutual information that use the difference between the information in the lower approximation and the information contained in the boundary region. This idea can yield higher predictive accuracy than the measure based on the positive region (certainty region) alone, which demonstrates that much valuable information can be extracted by also using the boundary region. Experimental results are reported for discrete, continuous, and microarray data and are compared with other FS methods in terms of subset size and classification accuracy.
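
To make the terms in the summary concrete, the following is a minimal sketch of the standard rough-set quantities it refers to: the positive region (the union of lower approximations of the decision classes), the boundary region, and the dependency degree. It is not the paper's Max-Certainty/Min-Uncertainty criterion, whose exact formula is not given here. The function names (`partition`, `regions`, `dependency`) and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def partition(rows, features):
    """Group row indices into equivalence classes by their values on `features`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[f] for f in features)].append(i)
    return list(blocks.values())


def regions(rows, labels, features):
    """Return (positive, boundary) sets of row indices for the decision `labels`."""
    positive, boundary = set(), set()
    for block in partition(rows, features):
        if len({labels[i] for i in block}) == 1:
            positive.update(block)  # block lies inside a single decision class
        else:
            boundary.update(block)  # block straddles several decision classes
    return positive, boundary


def dependency(rows, labels, features):
    """Dependency degree: fraction of objects in the positive region."""
    positive, _ = regions(rows, labels, features)
    return len(positive) / len(rows)


# Hypothetical toy data: two discrete features and a binary decision.
rows = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 0},
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
]
labels = ["no", "yes", "yes", "no", "yes"]

if __name__ == "__main__":
    for subset in (["a"], ["b"], ["a", "b"]):
        pos, bnd = regions(rows, labels, subset)
        print(subset, sorted(pos), sorted(bnd), dependency(rows, labels, subset))
```

In this sketch the dependency degree counts only the objects in the positive region, so it carries no information about how the decision values are distributed inside the boundary blocks. That is the gap the summary points to when it argues for using boundary-region information as well.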