Improved Majority Filtering Algorithm for Cleaning Class Label Noise in Supervised Learning

Muhammad Ammar MALIK, Jae Young CHOI, Moonsoo KANG, Bumshik LEE

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E102-A, No.11, pp.1556-1559
Publication Date: 2019/11/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E102.A.1556
Type of Manuscript: LETTER
Category: Digital Signal Processing
Keywords: label noise, support vectors, mislabeled examples, machine learning


In most supervised learning problems, the labelling quality of a dataset plays a paramount role in learning high-performance classifiers. A classifier's performance can be significantly degraded if it is trained on mislabeled data, so identifying such examples in the dataset is of critical importance. In this study, we propose an improved majority filtering algorithm that exploits the tendency of a support vector machine to capture potentially mislabeled examples as support vectors (SVs). The key technical contribution of our work is that the base (or component) classifiers forming the ensemble are trained using only non-SV examples, whereas at test time they are applied to the examples captured as SVs. An example is tagged as mislabeled if the majority of the base classifiers misclassify it. Experimental results confirmed that our algorithm not only identified mislabeled examples with high accuracy and higher F1 scores, but was also significantly faster than previous methods.
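The filtering scheme described above can be sketched in a few lines with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the RBF kernel, `C=1.0`, the choice of base classifiers (k-NN, decision tree, Gaussian naive Bayes), and the strict-majority threshold are all assumptions, and the function name `svm_majority_filter` is hypothetical. The sketch shows the core idea: SVs become the candidate set, the ensemble is trained only on non-SV examples, and a candidate is flagged when most ensemble members disagree with its given label.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB


def svm_majority_filter(X, y, base_classifiers=None, C=1.0):
    """Return sorted indices of examples flagged as likely mislabeled.

    Hypothetical sketch: the kernel, C, and base classifiers are
    illustrative assumptions, not the configuration from the paper.
    """
    X = np.asarray(X)
    y = np.asarray(y)

    # Step 1: fit an SVM on all data; its support vectors are the
    # candidates for being mislabeled.
    svm = SVC(kernel="rbf", C=C).fit(X, y)
    sv_idx = svm.support_
    non_sv_idx = np.setdiff1d(np.arange(len(y)), sv_idx)

    # Assumed default ensemble of heterogeneous base classifiers.
    if base_classifiers is None:
        base_classifiers = [
            KNeighborsClassifier(n_neighbors=3),
            DecisionTreeClassifier(random_state=0),
            GaussianNB(),
        ]

    # Step 2: train every base classifier on non-SV examples only.
    # Step 3: count how many of them misclassify each SV candidate.
    wrong_votes = np.zeros(len(sv_idx), dtype=int)
    for clf in base_classifiers:
        clf.fit(X[non_sv_idx], y[non_sv_idx])
        wrong_votes += clf.predict(X[sv_idx]) != y[sv_idx]

    # Step 4: flag a candidate when a strict majority disagrees with its label.
    flagged = sv_idx[wrong_votes > len(base_classifiers) / 2]
    return np.sort(flagged)
```

Because only SVs are evaluated, the ensemble never scores the (typically much larger) set of non-SV examples. This is consistent with the speed advantage the abstract reports, although the actual timings depend on the authors' setup.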