 

Data labeling through the centralities of co-reference networks improves the classification accuracy of scientific papers

Zheng Xie, Yiqin Lv, Yiping Song and Qi Wang

Journal of Informetrics, 2024, vol. 18, issue 2

Abstract: Classification models are trained on labeled data so that they can learn to classify unlabeled data. A large number of papers are linked through their citations to a small set of influential papers, far fewer than the total; if those influential papers were labeled, label information would spread to most of the papers. We use the co-reference relationship between papers, because the references cited by the papers in a dataset are usually not all contained in that dataset. We formulate optimal paper labeling as the problem of picking a given fraction of nodes from a co-reference network so as to maximize the number of their neighbors. This is a submodular maximization problem with a cardinality constraint and is NP-hard for general networks. We solve it approximately by picking nodes in order of their rank under specific network centralities. We further prove that labeling papers by degree rank, the lowest-complexity centrality, gives a near-optimal solution under specific constraints on the maximum degree of the co-reference network and on the labeling proportion. Experimental results show that our method brings a significant improvement in classification accuracy.
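The selection problem described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy citation data, and the tie-breaking rule are all hypothetical. It also assumes that two papers are linked in the co-reference network when they share at least one reference, and that the objective counts the distinct neighbors of the labeled set. The greedy routine is the standard baseline for cardinality-constrained monotone submodular maximization and is included only for comparison with degree ranking.

```python
from collections import defaultdict
from itertools import combinations


def co_reference_network(refs):
    """Build an adjacency map; two papers are linked if they share a reference.

    Assumption: 'refs' maps each paper id to the set of references it cites.
    """
    cited_by = defaultdict(set)
    for paper, cited in refs.items():
        for ref in cited:
            cited_by[ref].add(paper)
    adj = {paper: set() for paper in refs}
    for papers in cited_by.values():
        for a, b in combinations(sorted(papers), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def coverage(adj, labeled):
    """Number of distinct neighbors of the labeled set (assumed objective)."""
    covered = set()
    for paper in labeled:
        covered |= adj[paper]
    return len(covered)


def label_by_degree(adj, k):
    """Pick the k papers of highest degree; ties are broken by paper id."""
    return sorted(adj, key=lambda p: (-len(adj[p]), p))[:k]


def label_greedy(adj, k):
    """Greedy baseline: repeatedly add the paper with the largest marginal gain."""
    chosen, covered = [], set()
    for _ in range(min(k, len(adj))):
        candidates = [p for p in sorted(adj) if p not in chosen]
        best = max(candidates, key=lambda p: len(adj[p] - covered))
        chosen.append(best)
        covered |= adj[best]
    return chosen


# Hypothetical toy dataset: six papers citing four outside references.
toy_refs = {
    "A": {"r1", "r2"},
    "B": {"r1"},
    "C": {"r2"},
    "D": {"r2", "r3"},
    "E": {"r3"},
    "F": {"r4"},
}
toy_adj = co_reference_network(toy_refs)
by_degree = label_by_degree(toy_adj, 2)
by_greedy = label_greedy(toy_adj, 2)
print(by_degree, coverage(toy_adj, by_degree))
print(by_greedy, coverage(toy_adj, by_greedy))
```

On this toy network the two strategies happen to select the same papers. Degree ranking needs only one sort, whereas the greedy baseline recomputes marginal gains at every step, which is the complexity trade-off the abstract alludes to.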

Keywords: Paper classification; Citation networks; Labeling strategy
Date: 2024

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S1751157724000117
Full text for ScienceDirect subscribers only



Persistent link: https://EconPapers.repec.org/RePEc:eee:infome:v:18:y:2024:i:2:s1751157724000117

DOI: 10.1016/j.joi.2024.101498


Journal of Informetrics is currently edited by Leo Egghe

More articles in Journal of Informetrics from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:infome:v:18:y:2024:i:2:s1751157724000117