Software Development Effort Estimation from Unstructured Software Project Description by Sequence Models

Tachanun KANGWANTRAKOOL  Kobkrit VIRIYAYUDHAKORN  Thanaruk THEERAMUNKONG  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E103-D   No.4   pp.739-747
Publication Date: 2020/04/01
Publicized: 2020/01/14
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019IIP0014
Type of Manuscript: Special Section PAPER (Special Section on Intelligent Information and Communication Technology and its Applications to Creative Activity Support)
Keyword: software effort estimation, regression, deep learning, sequence model, recurrent neural network (RNN), gated recurrent units (GRU), long short-term memory network (LSTM)

Summary: 
Most existing methods of effort estimation in software development are manual, labor-intensive, and subjective, leading to overestimation that causes failed bids and underestimation that causes financial loss. This paper investigates the effectiveness of sequence models for estimating development effort, in man-months, from software project data. Four architectures are compared in terms of man-month difference: (1) average word vector with Multi-layer Perceptron (MLP), (2) average word vector with Support Vector Regression (SVR), (3) Gated Recurrent Unit (GRU) sequence model, and (4) Long Short-Term Memory (LSTM) sequence model. The approach is evaluated on two datasets: ISEM (1,573 English software project descriptions) and ISBSG (data on 9,100 software projects), where the former is raw text and the latter is a structured data table describing the characteristics of each software project. The LSTM sequence model achieves the lowest mean absolute error on ISEM (0.705 man-months) and the second lowest on ISBSG (14.077 man-months). The MLP model achieves the lowest mean absolute error on ISBSG, 14.069 man-months.
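To make two terms in the summary concrete, the following is a minimal sketch of an averaged word-vector feature (the input representation named for the MLP and SVR baselines) and of the mean absolute error used to compare models. It is an illustration only, not the authors' implementation: the function names, the two-dimensional toy embeddings, and the effort values are all hypothetical.

```python
from typing import Dict, List, Sequence


def average_word_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the embeddings of the tokens that have one.

    Tokens missing from the embedding table are skipped; if no token is
    known, a zero vector of length `dim` is returned.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def mean_absolute_error(
    actual: Sequence[float], predicted: Sequence[float]
) -> float:
    """Mean of |actual - predicted| over all projects (here, in man-months)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data, not taken from ISEM or ISBSG.
toy_embeddings = {"web": [1.0, 0.0], "portal": [0.0, 1.0]}
features = average_word_vector(
    "web portal for payroll".split(), toy_embeddings, dim=2
)

actual_effort = [3.0, 12.0, 6.0]      # hypothetical man-months
predicted_effort = [2.5, 14.0, 6.5]   # hypothetical model outputs
mae = mean_absolute_error(actual_effort, predicted_effort)
```

In this sketch the averaged vector discards word order, which is the property that distinguishes the two baselines from the GRU and LSTM sequence models; any regressor could consume `features` in place of the MLP or SVR.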