On the effectiveness of network metrics on key class prediction: An empirical study

Zhou, Shiyuan; Wu, Wei; Wang, Jiale; Liu, Hongbing; Yuan, Chenxiang

PLOS ONE, 2025, vol. 20, issue 10, 1-25

Abstract: Key classes are the most important classes in a software system, and they provide an excellent foundation for developers, especially those new to the field, to understand unfamiliar software systems. In the past decade, several key class prediction (KCP) approaches have been proposed. They use design metrics extracted from source code and unweighted network metrics computed on class coupling networks as features, and build machine-learning models to predict whether a class is a key class. However, previous studies mainly focused on improving the performance of KCP models in the within-project context (i.e., KCP in the same project), and the network metrics they used are inaccurate because they are computed on unweighted and incomplete class coupling networks. As a result, the effectiveness of network metrics for KCP has not been thoroughly evaluated, especially in the cross-project context (i.e., KCP across diverse projects), which in turn leaves it unclear how to choose suitable metrics as features when building KCP models. To fill this gap, we thoroughly evaluate the effectiveness of network metrics for KCP. Specifically, we build weighted and more complete class coupling networks for software and introduce a set of weighted network metrics to characterize class complexity. Then, for each of the two KCP contexts (within-project and cross-project), we build KCP models using the Random Forest and Naive Bayes learners, with design metrics, unweighted network metrics, weighted network metrics, and their combinations as features. Finally, through an empirical study on 18 open-source Java projects, we investigate the relative effectiveness of network metrics over design metrics across the two KCP contexts.
Our results suggest that, to achieve better performance when building KCP models, researchers and practitioners should consider: (i) in the within-project KCP context, using unweighted (or weighted) network metrics alone or along with design metrics; (ii) in the cross-project KCP context, using design metrics alone or along with unweighted (or weighted) network metrics; and (iii) across the two KCP contexts, using unweighted (or weighted) network metrics along with design metrics.
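
To make the idea of weighted network metrics on a class coupling network concrete, the following is a minimal sketch under stated assumptions. It is not the authors' metric set or tooling: the class names, the edge weights, and the choice of weighted in/out-degree (strength) and a weighted PageRank are all hypothetical and used only for illustration. An edge weight here stands for some measure of coupling intensity between two classes.

```python
from collections import defaultdict

# Hypothetical weighted class coupling network: (source, target) -> coupling weight.
# Class names and weights are invented for illustration, not taken from the paper.
EDGES = {
    ("OrderService", "Order"): 5.0,
    ("OrderService", "Repository"): 3.0,
    ("Controller", "OrderService"): 4.0,
    ("Controller", "Order"): 1.0,
    ("Repository", "Order"): 2.0,
}


def strengths(edges):
    """Return weighted in-degree and out-degree (strength) per class."""
    in_s, out_s = defaultdict(float), defaultdict(float)
    for (src, dst), w in edges.items():
        out_s[src] += w
        in_s[dst] += w
    return dict(in_s), dict(out_s)


def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank where rank flows in proportion to edge weight."""
    nodes = sorted({n for edge in edges for n in edge})
    _, out_s = strengths(edges)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by classes with no outgoing coupling is spread uniformly.
        dangling = sum(rank[n] for n in nodes if n not in out_s)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_s[src]
        rank = new_rank
    return rank


in_strength, out_strength = strengths(EDGES)
ranks = weighted_pagerank(EDGES)

# Print classes from highest to lowest weighted PageRank.
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.3f}")
```

In a KCP setting, values like these would be collected per class into a feature vector, optionally alongside design metrics, and passed to a classifier such as Random Forest or Naive Bayes. An unweighted variant corresponds to setting every edge weight to 1.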

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0334408 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 34408&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0334408

DOI: 10.1371/journal.pone.0334408
