Accurate Library Recommendation Using Combining Collaborative Filtering and Topic Model for Mobile Development

Xiaoqiong ZHAO  Shanping LI  Huan YU  Ye WANG  Weiwei QIU  

IEICE TRANSACTIONS on Information and Systems   Vol.E102-D   No.3   pp.522-536
Publication Date: 2019/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018EDP7227
Type of Manuscript: PAPER
Category: Software Engineering
library recommendation,  text mining,  latent dirichlet allocation,  software engineering,  

Full Text: PDF(7.4MB)
>>Buy this Article

Background: The applying of third-party libraries is an integral part of many applications. But the libraries choosing is time-consuming even for experienced developers. The automated recommendation system for libraries recommendation is widely researched to help developers to choose libraries. Aim: from software engineering aspect, our research aims to give developers a reliable recommended list of third-party libraries at the early phase of software development lifecycle to help them build their development environment faster; and from technical aspect, our research aims to build a generalizable recommendation system framework which combines collaborative filtering and topic modeling techniques, in order to improve the performance of libraries recommendation significantly. Our works on this research: 1) we design a hybrid methodology to combine collaborative filtering and LDA text mining technology; 2) we build a recommendation system framework successfully based on the above hybrid methodology; 3) we make a well-designed experiment to validate the methodology and framework which use the data of 1,013 mobile application projects; 4) we do the evaluation for the result of the experiment. Conclusions: 1) hybrid methodology with collaborative filtering and LDA can improve the performance of libraries recommendation significantly; 2) based on the hybrid methodology, the framework works very well on the libraries recommendation for helping developers' libraries choosing. Further research is necessary to improve the performance of the libraries recommendation including: 1) use more accurate NLP technologies improve the correlation analysis; 2) try other similarity calculation methodology for collaborative filtering to rise the accuracy; 3) on this research, we just bring the time-series approach to the framework and make an experiment as comparative trial, the result shows that the performance improves continuously, so in further research we plan to use time-series data-mining as the basic methodology to update the framework.