
For FullText PDF, please login, if you are a member of IEICE,
or go to Pay Per View on menu list, if you are a nonmember of IEICE.

Feature Selection by Computing Mutual Information Based on Partitions
Chengxiang YIN Hongjun ZHANG Rui ZHANG Zilin ZENG Xiuli QI Yuntian FENG
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E101D
No.2
pp.437446 Publication Date: 2018/02/01
Online ISSN: 17451361 Type of Manuscript: PAPER Category: Artificial Intelligence, Data Mining Keyword: partition, feature selection, feature interaction, FSMIP,
Full Text: PDF(4.7MB) >>Buy this Article
Summary:
The main idea of filter methods in feature selection is constructing a featureassessing criterion and searching for feature subset that optimizes the criterion. The primary principle of designing such criterion is to capture the relevance between feature subset and the class as precisely as possible. It would be difficult to compute the relevance directly due to the computation complexity when the size of feature subset grows. As a result, researchers adopt approximate strategies to measure relevance. Though these strategies worked well in some applications, they suffer from three problems: parameter determination problem, the neglect of feature interaction information and overestimation of some features. We propose a new feature selection algorithm that could compute mutual information between feature subset and the class directly without deteriorating computation complexity based on the computation of partitions. In light of the specific properties of mutual information and partitions, we propose a pruning rule and a stopping criterion to accelerate the searching speed. To evaluate the effectiveness of the proposed algorithm, we compare our algorithm to the other five algorithms in terms of the number of selected features and the classification accuracies on three classifiers. The results on the six synthetic datasets show that our algorithm performs well in capturing interaction information. The results on the thirteen real world datasets show that our algorithm selects less yet better feature subset.

