A UML Approximation of Three Chidamber-Kemerer Metrics and Their Ability to Predict Faulty Code across Software Projects


IEICE TRANSACTIONS on Information and Systems   Vol.E93-D   No.11   pp.3038-3050
Publication Date: 2010/11/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Software Engineering
UML metrics, CK metrics, fault-proneness of code, logistic regression

Design-complexity metrics, although measured from the code, have been shown to be good predictors of fault-prone object-oriented programs. Among the most widely used are the Chidamber and Kemerer (CK) metrics. This paper discusses how to make early predictions of fault-prone object-oriented classes using a UML approximation of three CK metrics. First, we present a simple approach to approximating the Weighted Methods per Class (WMC), Response For Class (RFC) and Coupling Between Objects (CBO) CK metrics from UML collaboration diagrams. Then, we study the application of two data normalization techniques. This study has a twofold purpose: to decrease the approximation error incurred when measuring these CK metrics from UML diagrams, and to make the distributions of these metrics more similar across software projects, so that the same prediction model yields better results when applied to different projects. Finally, we construct three prediction models from the source code of one package of an open source software project (Mylyn from Eclipse), and we test them on several other packages and on three different small-sized software projects, using both their UML and code metrics for comparison. The results of our empirical study lead us to conclude that the proposed UML RFC and UML CBO metrics can predict fault-proneness of code with almost the same accuracy as their respective code metrics. The elimination of outliers and the normalization procedure proved highly useful, not only for enabling our UML metrics to predict fault-proneness of code with a code-based prediction model, but also for improving the prediction results of our models across different software packages and projects.
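
The pipeline summarized above (outlier elimination, normalization, then a logistic model applied across projects) can be illustrated with a minimal sketch. The abstract does not name the two normalization techniques or the outlier rule, so everything here is an assumption made for illustration: a k-standard-deviation outlier filter, z-score normalization, and a single-metric logistic function with hypothetical coefficients. The function names and all numeric values are invented and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def remove_outliers(values, k=3.0):
    """Drop values more than k standard deviations from the mean.

    Assumed rule: the abstract does not say how outliers were identified.
    """
    m, s = mean(values), pstdev(values)
    if s == 0:
        return list(values)
    return [v for v in values if abs(v - m) <= k * s]


def z_normalize(values):
    """Rescale values to zero mean and unit variance.

    Assumed technique: stands in for the unnamed normalization methods.
    """
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def fault_probability(x, intercept, slope):
    """Logistic model on one normalized metric; coefficients are hypothetical."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Made-up RFC values for the same classes, measured from code and from UML.
# The UML values sit on a smaller scale, as an approximation might.
code_rfc = [12, 30, 45, 18, 60]
uml_rfc = [4, 10, 15, 6, 20]

# After normalization the two series are comparable, so one model
# (here with invented coefficients) can score either of them.
for c, u in zip(z_normalize(code_rfc), z_normalize(uml_rfc)):
    p_code = fault_probability(c, intercept=-0.5, slope=1.2)
    p_uml = fault_probability(u, intercept=-0.5, slope=1.2)
    print(f"code: {p_code:.3f}  uml: {p_uml:.3f}")
```

The point of the sketch is only the design idea: normalizing each project's metric values separately puts them on a common scale, which is what would let a model fitted on code metrics from one project be applied to UML metrics from another.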