Improved Majority Filtering Algorithm for Cleaning Class Label Noise in Supervised Learning
Muhammad Ammar MALIK Jae Young CHOI Moonsoo KANG Bumshik LEE
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Publication Date: 2019/11/01
Online ISSN: 1745-1337
Type of Manuscript: LETTER
Category: Digital Signal Processing
Keywords: label noise, support vectors, mislabeled examples, machine learning
In most supervised learning problems, the labelling quality of the dataset plays a paramount role in learning high-performance classifiers. The performance of a classifier can be significantly degraded if it is trained with mislabeled data, so identifying such examples in the dataset is of critical importance. In this study, we propose an improved majority filtering algorithm that exploits the ability of a support vector machine to capture potentially mislabeled examples as support vectors (SVs). The key technical contribution of our work is that the base (or component) classifiers that make up the ensemble are trained using only non-SV examples, whereas at test time they are evaluated on the examples captured as SVs. An example is tagged as mislabeled if the majority of the base classifiers misclassify it. Experimental results confirmed that our algorithm not only achieved high accuracy and higher F1 scores in identifying mislabeled examples, but was also significantly faster than previous methods.
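The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, an RBF-kernel `SVC` with default-like settings, and an arbitrary three-member ensemble (decision tree, Gaussian naive Bayes, k-nearest neighbours). The names `sv_majority_filter` and `make_noisy_blobs` are hypothetical helpers introduced only for this example, and the synthetic data is not taken from the paper.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def sv_majority_filter(X, y, base_classifiers, svm=None):
    """Flag support vectors that most base classifiers misclassify.

    Returns (flagged_indices, support_vector_indices), both indexing into X.
    """
    # Assumption: an RBF SVM; the abstract does not state kernel or parameters.
    svm = svm if svm is not None else SVC(kernel="rbf", C=1.0, gamma="scale")
    svm.fit(X, y)
    sv_idx = svm.support_  # indices of the examples captured as SVs

    non_sv_mask = np.ones(len(y), dtype=bool)
    non_sv_mask[sv_idx] = False
    if len(np.unique(y[non_sv_mask])) < 2:
        raise ValueError("non-SV examples must cover at least two classes")

    # Train every base classifier on non-SV examples only, test on the SVs.
    wrong_votes = np.zeros(len(sv_idx), dtype=int)
    for clf in base_classifiers:
        clf.fit(X[non_sv_mask], y[non_sv_mask])
        wrong_votes += clf.predict(X[sv_idx]) != y[sv_idx]

    # Majority rule: more than half of the ensemble disagrees with the label.
    flagged = sv_idx[wrong_votes > len(base_classifiers) / 2]
    return flagged, sv_idx


def make_noisy_blobs(n_per_class=100, n_flipped=20, seed=0):
    """Two Gaussian blobs with some labels flipped (synthetic demo data)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(-3.0, 1.0, size=(n_per_class, 2)),
        rng.normal(3.0, 1.0, size=(n_per_class, 2)),
    ])
    y_true = np.repeat([0, 1], n_per_class)
    flipped = rng.choice(len(y_true), size=n_flipped, replace=False)
    y_noisy = y_true.copy()
    y_noisy[flipped] = 1 - y_noisy[flipped]
    return X, y_noisy, flipped


X, y_noisy, flipped = make_noisy_blobs()
ensemble = [
    DecisionTreeClassifier(random_state=0),
    GaussianNB(),
    KNeighborsClassifier(n_neighbors=3),
]
flagged, sv_idx = sv_majority_filter(X, y_noisy, ensemble)
recovered = len(np.intersect1d(flagged, flipped))
print(f"{len(sv_idx)} support vectors, {len(flagged)} flagged, "
      f"{recovered} of {len(flipped)} injected flips recovered")
```

In this sketch only the support vectors are ever scored by the ensemble, which is one plausible reading of why the approach can be faster than filtering every example, and the base classifiers never see the suspect examples during training.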