The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study

Qiao YU  Shujuan JIANG  Yanmei ZHANG  

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E100-D, No.2, pp.265-272
Publication Date: 2017/02/01
Online ISSN: 1745-1361
Type of Manuscript: PAPER
Category: Software Engineering
Keyword: 
class imbalance, software defect prediction, prediction models, performance stability, imbalance ratio

Summary: 
Class imbalance has drawn considerable attention from researchers in software defect prediction. In practice, the performance of defect prediction models may be affected by the class imbalance problem. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different imbalance ratios. Second, typical prediction models are selected to make predictions on these newly constructed datasets, and the Coefficient of Variation (C·V) is used to evaluate the performance stability of the different models. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, and that the performance of Naive Bayes and Random Forest is more stable than that of the other models.
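
The two mechanical steps in the summary, resampling to a target imbalance ratio and scoring stability with the coefficient of variation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the imbalance ratio is assumed to mean the number of non-defective modules divided by the number of defective ones, the majority class is assumed to be undersampled, and C·V is assumed to be the population standard deviation divided by the mean, expressed as a percentage. The summary does not confirm any of these details.

```python
import random
import statistics


def resample_to_ratio(defective, clean, ratio, rng):
    """Build a dataset whose clean:defective ratio is roughly `ratio`.

    Assumption: all defective (minority) modules are kept and the clean
    (majority) modules are randomly undersampled without replacement.
    """
    n_clean = min(len(clean), round(ratio * len(defective)))
    return defective + rng.sample(clean, n_clean)


def coefficient_of_variation(scores):
    """Return C.V as a percentage: population std / mean * 100.

    A lower value means the scores vary less across imbalance ratios,
    i.e. the model is more stable.
    """
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("C.V is undefined when the mean score is zero")
    return statistics.pstdev(scores) / mean * 100


def stability(evaluate, defective, clean, ratios, seed=0):
    """Score one model on a dataset resampled at each ratio.

    `evaluate` is a hypothetical callable that trains and tests a model
    on the given dataset and returns a single performance score.
    """
    rng = random.Random(seed)
    scores = [
        evaluate(resample_to_ratio(defective, clean, r, rng)) for r in ratios
    ]
    return coefficient_of_variation(scores)
```

To compare models, `stability` would be called once per model with the same list of ratios, and the model with the smallest C·V would be regarded as the most stable under this sketch's assumptions.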