The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study
Qiao YU Shujuan JIANG Yanmei ZHANG
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2017/02/01
Online ISSN: 1745-1361
Type of Manuscript: PAPER
Category: Software Engineering
Keyword: class imbalance, software defect prediction, prediction models, performance stability, imbalance ratio
Class imbalance has drawn much attention from researchers in software defect prediction. In practice, the performance of defect prediction models may be affected by the class imbalance problem. In this paper, we present an approach to evaluating the performance stability of defect prediction models on imbalanced datasets. First, random sampling is applied to convert the original imbalanced dataset into a set of new datasets with different imbalance ratios. Second, typical prediction models are selected to make predictions on these newly constructed datasets, and the Coefficient of Variation (C·V) is used to evaluate the performance stability of the different models. Finally, an empirical study is designed to evaluate the performance stability of six prediction models that are widely used in software defect prediction. The results show that the performance of C4.5 is unstable on imbalanced datasets, and that the performance of Naive Bayes and Random Forest is more stable than that of the other models.
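
The first two steps of the approach can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes that the imbalance ratio means the number of non-defective modules per defective module, that the new datasets are built by undersampling the non-defective class, and that C·V is the sample standard deviation divided by the mean. The abstract does not specify any of these details, and the function names are hypothetical.

```python
import random
import statistics


def resample_to_ratio(defective, clean, ratio, rng):
    """Build a dataset with roughly `ratio` clean modules per defective one.

    Assumption: the clean (majority) class is randomly undersampled and
    every defective module is kept. If fewer clean modules exist than the
    ratio requires, all of them are used.
    """
    n_clean = min(len(clean), int(round(ratio * len(defective))))
    return defective + rng.sample(clean, n_clean)


def coefficient_of_variation(scores):
    """C.V = standard deviation / mean of a model's scores.

    Assumption: the sample standard deviation is used. A lower value means
    the model's performance changes less as the imbalance ratio changes.
    """
    return statistics.stdev(scores) / statistics.mean(scores)


if __name__ == "__main__":
    rng = random.Random(0)                 # fixed seed for repeatability
    defective = list(range(10))            # placeholder module ids
    clean = list(range(100, 200))          # placeholder module ids
    for ratio in (1, 3, 5):
        dataset = resample_to_ratio(defective, clean, ratio, rng)
        print(ratio, len(dataset))
```

In a full experiment, each model would be trained and scored on every resampled dataset, and `coefficient_of_variation` would be applied to that model's list of scores, one score per imbalance ratio.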