A Novel Technique for Duplicate Detection and Classification of Bug Reports

Tao ZHANG  Byungjeong LEE  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E97-D   No.7   pp.1756-1768
Publication Date: 2014/07/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.1756
Type of Manuscript: PAPER
Category: Software Engineering
Keyword: 
bug report classification, concept profile, duplicate detection, support vector machine, software maintenance

Summary: 
Software products are increasingly complex, so finding and correcting bugs in large programs is becoming more difficult. Software developers rely on bug reports to fix bugs; thus, bug-tracking tools have been introduced to allow developers to upload, manage, and comment on bug reports to guide corrective software maintenance. However, duplicate bug reports are so frequent that triagers, who help software developers eliminate bugs, must devote large amounts of time and effort to identifying and analyzing them. In addition, classifying bug reports helps triagers sort bugs into categories and route them to fixers who have more experience resolving historical bugs in the same category. Unfortunately, because a large number of bug reports are submitted every day, classifying them manually increases the triagers' workload. To resolve these problems, we develop a novel technique for automatic duplicate detection and classification of bug reports, which reduces the time and effort triagers spend on bug fixing. Our technique uses a support vector machine to check whether a new bug report is a duplicate, and a concept profile to classify bug reports into related categories in a taxonomic tree. Finally, we conduct experiments on bug reports extracted from the large-scale open source project Mozilla that demonstrate the feasibility of the proposed approach.
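
To make the duplicate-detection idea concrete, the following is a minimal sketch of how a support vector machine could label a pair of bug reports as duplicate or not. It is not the method of the paper: the summary does not state which features or kernel are used, so the two similarity features (cosine and Jaccard over summary tokens), the plain linear SVM trained by subgradient descent, the function names, and the toy report texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase a report summary and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(a, b):
    """Assumed features for a report pair: cosine and Jaccard token similarity."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    cosine = dot / norm if norm else 0.0
    union = set(ca) | set(cb)
    jaccard = len(set(ca) & set(cb)) / len(union) if union else 0.0
    return [cosine, jaccard]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Fit weights w and bias b by subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [(1 - eta * lam) * wi for wi in w]  # shrink step from the L2 penalty
            if margin < 1:  # point is inside the margin: follow the hinge subgradient
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
                b += eta * label
    return w, b


def is_duplicate(model, new_report, existing_report):
    """Return True when the pair falls on the 'duplicate' side of the hyperplane."""
    w, b = model
    x = pair_features(new_report, existing_report)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Toy labelled pairs, invented for illustration (not taken from Mozilla data).
# Label +1 marks a duplicate pair, -1 a non-duplicate pair.
R1 = "Browser crashes when opening bookmarks menu"
R2 = "Crash when opening the bookmarks menu in browser"
R3 = "Page fails to print with custom margins"
R4 = "Printing a page with custom margins fails"
PAIRS = [(R1, R2, 1), (R3, R4, 1), (R1, R4, -1), (R3, R2, -1)]

MODEL = train_linear_svm(
    [pair_features(a, b) for a, b, _ in PAIRS],
    [label for _, _, label in PAIRS],
)
```

In such a setup, a newly submitted report would be paired with each stored report and passed to `is_duplicate(MODEL, new_report, stored_report)`; any pair classified as positive would be surfaced to the triager. A realistic system would need far richer features and training data than this toy example.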