A study of BERT-based methods for formal citation identification of scientific data
Ning Yang (),
Zhiqiang Zhang and
Feihu Huang
Additional contact information
Ning Yang: Chinese Academy of Sciences
Zhiqiang Zhang: Chinese Academy of Sciences
Feihu Huang: Sichuan University
Scientometrics, 2023, vol. 128, issue 11, No 1, 5865-5881
Abstract:
Abstract A study on scientific data citation is crucial to promote data sharing and is the basis for the examination of scientific data measurement and analysis. To this end, it is necessary to identify and label data reference information. Currently, there are many supervised methods for entity recognition and relationship extraction of diseases, drugs, proteins, symptoms, etc., but they have not discussed the effectiveness of scientific data recognition. To fill this gap, the effectiveness of the classical machine learning model and the deep learning model on recognizing scientific data citation are discussed in this study. In experiments, this study took the full text of scientific and technical papers as the research object, conducted annotated citation classification based on rules and manual recognition of their references to form a dataset. The results of the empirical study showed that: (1) the methods used in this paper can achieve automatic identification and extraction of data citations and can address the problem of automating the construction of citation relationships between scientific and technical literature and scientific data; (2) the BERT-based models have the optimal effectiveness in the recognition task of scientific data citation, especially the BioBERT and SciBERT; (3) the full-text information has a crucial impact on the recognition results.
Keywords: BERT; Research data; Formal data citation; Identification methods (search for similar items in EconPapers)
Date: 2023
References: Add references at CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s11192-023-04833-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:128:y:2023:i:11:d:10.1007_s11192-023-04833-z
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-023-04833-z
Access Statistics for this article
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().