Evaluation of Software Fault Prediction Models Considering Faultless Cases

Yukasa MURAKAMI  Masateru TSUNODA  Koji TODA  

IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.6   pp.1319-1327
Publication Date: 2020/06/01
Publicized: 2020/03/09
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019KBP0019
Type of Manuscript: Special Section PAPER (Special Section on Knowledge-Based Software Engineering)
Keyword: defect prediction, Tobit model, Poisson regression, ensemble method

To improve the accuracy of predicting the number of faults, many studies have proposed various prediction models. A model is built from a dataset collected in past projects, and the number of faults is predicted by applying the model to data from the current project. Datasets sometimes contain many data points where the dependent variable, i.e., the number of faults, is zero. When a multiple linear regression model is built from such a dataset, the model may not be fitted properly. To avoid this problem, the Tobit model is considered effective for predicting software faults. The Tobit model assumes that the range of the dependent variable is limited, and it is built on that assumption. Similar to the Tobit model, the Poisson regression model assumes that there are many data points whose dependent variable is zero. Log-transformation is also sometimes applied to enhance the accuracy of a model, and ensemble methods are effective for enhancing prediction accuracy. We evaluated the prediction accuracy of these methods separately for cases where the number of faults is zero and cases where it is not. In the experiment, our proposed ensemble method showed the highest accuracy: Pred25 was 21% when the number of faults was not zero and 45% when it was zero.
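The split evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition of Pred25 (the share of cases whose relative error is at most 25%) for modules with at least one fault. Relative error is undefined when the actual count is zero, and the abstract does not state which criterion the authors used for faultless cases, so the function `zero_case_hit_rate` and its tolerance `tol` are hypothetical stand-ins, not the paper's method. The data values are invented for illustration only.

```python
from typing import Dict, Sequence


def pred25(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of cases with |actual - predicted| / actual <= 0.25.

    Only meaningful when every actual value is greater than zero.
    """
    if not actual:
        return float("nan")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= 0.25)
    return hits / len(actual)


def zero_case_hit_rate(predicted: Sequence[float], tol: float = 0.5) -> float:
    """ASSUMPTION (not from the paper): a faultless case counts as a hit
    when the predicted number of faults is below `tol`."""
    if not predicted:
        return float("nan")
    hits = sum(1 for p in predicted if abs(p) < tol)
    return hits / len(predicted)


def evaluate_split(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Score faulty and faultless cases separately."""
    pairs = list(zip(actual, predicted))
    nonzero_actual = [a for a, _ in pairs if a > 0]
    nonzero_pred = [p for a, p in pairs if a > 0]
    zero_pred = [p for a, p in pairs if a == 0]
    return {
        "nonzero": pred25(nonzero_actual, nonzero_pred),
        "zero": zero_case_hit_rate(zero_pred),
    }


# Invented illustration data: actual fault counts and model predictions.
actual_faults = [0, 0, 4, 10, 0, 8]
predicted_faults = [0.2, 1.5, 5.0, 9.0, 0.4, 3.0]
result = evaluate_split(actual_faults, predicted_faults)
print(result)
```

Reporting the two rates separately, rather than as one pooled figure, keeps a model that simply predicts near zero everywhere from looking accurate on a dataset dominated by faultless cases.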