Investigating and Projecting Population Structures in Open Source Software Projects: A Case Study of Projects in GitHub

Saya ONOUE  Hideaki HATA  Akito MONDEN  Kenichi MATSUMOTO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E99-D   No.5   pp.1304-1315
Publication Date: 2016/05/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDP7363
Type of Manuscript: PAPER
Category: Software Engineering
Keyword: OSS, software development communities, software population pyramids, demography

Errata: uploaded on July 1, 2016


Summary: 
GitHub is a social networking service for developers that hosts a great number of open source software (OSS) projects. Although some of the hosted projects are growing and have many developers, most projects are run by a few developers and face difficulties in terms of sustainability. OSS projects depend mainly on volunteer developers, and attracting and retaining these volunteers is a major concern of project stakeholders. To investigate the population structures of OSS development communities in detail and to conduct software analytics that yield actionable information, we apply a demographic approach. Demography is the scientific study of populations; it seeks to identify the levels and trends in the size and components of a population. This paper presents a case study that investigates the characteristics of the population structures of OSS projects on GitHub and shows population projections generated with the well-known cohort component method. We found that there are four types of population structures in OSS development communities in terms of experience and contributions. In addition, we projected the future population accurately using the cohort component population projection method, which predicts the population of the next period using survival rates calculated from past populations. To the best of our knowledge, this is the first study to apply demography to the field of OSS research. Because the study of populations is novel in OSS research, our demography-based approach to OSS-related problems will bring new insights. Understanding the current and future structures of OSS projects can help practitioners monitor a project, gain awareness of what is happening, manage risks, and evaluate past decisions.
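The cohort component idea mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that cohorts are experience levels (the number of periods a developer has been active), that the survival rate of a cohort is the share of its members who appear in the next older cohort one period later, and that the number of newcomers stays at its most recent value. The function names and the developer counts are hypothetical, and the oldest cohort is simply dropped after one more period to keep the example short.

```python
from typing import List


def survival_rates(prev: List[int], curr: List[int]) -> List[float]:
    """Share of cohort k in `prev` that shows up as cohort k+1 in `curr`."""
    rates = []
    for k in range(len(prev) - 1):
        # Guard against empty cohorts to avoid division by zero.
        rates.append(curr[k + 1] / prev[k] if prev[k] > 0 else 0.0)
    return rates


def project_next(curr: List[int], rates: List[float], newcomers: float) -> List[float]:
    """Project the next period: cohort 0 is newcomers, cohort k+1 = cohort k * rate k."""
    projected = [float(newcomers)]
    for k, rate in enumerate(rates):
        projected.append(curr[k] * rate)
    return projected


# Hypothetical counts of active developers per experience cohort
# (index 0 = newcomers, higher index = more periods of experience).
prev_period = [100, 40, 20, 10]
curr_period = [120, 50, 30, 15]

rates = survival_rates(prev_period, curr_period)
# Assumption: the next period attracts as many newcomers as the current one.
projection = project_next(curr_period, rates, newcomers=curr_period[0])

print("survival rates:", rates)
print("projected next period:", projection)
```

With real data, the per-cohort counts would be derived from project activity logs, and the newcomer assumption would be replaced by whatever entry model suits the project.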