Automatic identification of cited text spans: a multi-classifier approach over imbalanced dataset

Shutian Ma, Jin Xu and Chengzhi Zhang
Additional contact information
Shutian Ma: Nanjing University of Science and Technology
Jin Xu: Nanjing University of Science and Technology
Chengzhi Zhang: Nanjing University of Science and Technology

Scientometrics, 2018, vol. 116, issue 2, No 33, 1303-1330

Abstract: Recently, a new form of structured summary of scientific papers has been explored, built by grouping cited text spans from the reference paper. Its primary goal is to generate summaries based on the cited paper itself. Traditional scientific summarization has focused on citation-based methods that aggregate all citances citing a given paper without content-based citation analysis, even though citations can differ between researchers or time slots. By examining the original text spans that scholars cite, the new method can reflect the exact contributions of reference papers more closely. Identifying cited text spans accurately therefore becomes the first important problem to solve. In general, the problem can be recast as finding the sentences in the reference paper that are more similar to the citation sentences. Treating it as a classification task, we investigate the potential of four actions to improve identification performance. First, feature selection is conducted carefully according to the multiple classifiers. Second, we apply sampling-based algorithms to preprocess the class-imbalanced datasets. Third, since we integrate results via a weighted voting system, we tune parameters such as the voting weights for multi-classifier integration, or the running settings, to see whether performance can be improved further. Evaluation results show the effectiveness of each action and demonstrate that researchers can take these actions to identify cited text spans more accurately when doing scientific summarization.
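
The steps named in the abstract (sentence similarity as a feature, resampling of an imbalanced training set, and weighted voting over several classifiers) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Jaccard word-overlap feature, plain random oversampling, the function names `jaccard`, `random_oversample` and `weighted_vote`, and the three threshold "classifiers" with their weights are all hypothetical stand-ins for whatever features, samplers and learners the authors actually used.

```python
import random
from collections import Counter


def jaccard(a, b):
    """Word-overlap similarity between two sentences (illustrative feature only)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes at random until every class
    has as many rows as the largest class (one simple sampling-based fix)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def weighted_vote(votes, weights, threshold=0.5):
    """Combine binary votes (0 or 1) from several classifiers.
    Returns 1 when the weighted share of positive votes reaches the threshold."""
    score = sum(w * v for v, w in zip(votes, weights)) / sum(weights)
    return 1 if score >= threshold else 0


# Hypothetical citance and candidate sentences from a reference paper.
citance = "the parser uses a maximum entropy model"
reference_sentences = [
    "we train a maximum entropy model for the parser",
    "the corpus was collected from newswire text",
]

# Hypothetical "classifiers": similarity cut-offs standing in for trained models.
cutoffs = [0.3, 0.5, 0.7]
weights = [0.5, 0.3, 0.2]  # assumed voting weights, the kind of parameter one would tune

predicted = []
for sentence in reference_sentences:
    sim = jaccard(citance, sentence)
    votes = [1 if sim >= c else 0 for c in cutoffs]
    predicted.append(weighted_vote(votes, weights))

# Balancing a toy training set in which cited spans (label 1) are rare.
balanced_rows, balanced_labels = random_oversample([0.6, 0.1, 0.05, 0.2], [1, 0, 0, 0])
```

In this toy setup only the first candidate sentence is labeled as a cited text span. In practice the voting weights and the sampling strategy would be chosen on held-out data rather than fixed by hand.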

Keywords: Natural language processing; Cited text span; Reference span identification; Multiclassifier; Voting system
Date: 2018

Downloads: (external link)
http://link.springer.com/10.1007/s11192-018-2754-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:116:y:2018:i:2:d:10.1007_s11192-018-2754-2

DOI: 10.1007/s11192-018-2754-2

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:116:y:2018:i:2:d:10.1007_s11192-018-2754-2