EconPapers    
Economics at your fingertips  
 

unarXive: a large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata

Tarek Saier () and Michael Färber ()
Additional contact information
Tarek Saier: Karlsruhe Institute of Technology (KIT)
Michael Färber: Karlsruhe Institute of Technology (KIT)

Scientometrics, 2020, vol. 125, issue 3, No 53, 3085-3108

Abstract: Abstract In recent years, scholarly data sets have been used for various purposes, such as paper recommendation, citation recommendation, citation context analysis, and citation context-based document summarization. The evaluation of approaches to such tasks and their applicability in real-world scenarios heavily depend on the used data set. However, existing scholarly data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific disciplines available on arXiv.org. Apart from providing the papers’ plain text, in-text citations were annotated via global identifiers. Furthermore, citing and cited publications were linked to the Microsoft Academic Graph, providing access to rich metadata. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, not only can enhance the future evaluation of research paper-based and citation context-based approaches, but also serve as a basis for new ways to analyze in-text citations, as we show prototypically in this article.

Keywords: Scholarly data; Citations; arXiv.org; Digital libraries; Data set (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (3)

Downloads: (external link)
http://link.springer.com/10.1007/s11192-020-03382-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03382-z

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-020-03382-z

Access Statistics for this article

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03382-z