EconPapers    
Economics at your fingertips  
 

Softcite dataset: A dataset of software mentions in biomedical and economic research publications

Caifan Du, Johanna Cohoon, Patrice Lopez and James Howison

Journal of the Association for Information Science & Technology, 2021, vol. 72, issue 7, 870-884

Abstract: Software contributions to academic research are relatively invisible, especially to the formalized scholarly reputation system based on bibliometrics. In this article, we introduce a gold‐standard dataset of software mentions from the manual annotation of 4,971 academic PDFs in biomedicine and economics. The dataset is intended to be used for automatic extraction of software mentions from PDF format research publications by supervised learning at scale. We provide a description of the dataset and an extended discussion of its creation process, including improved text conversion of academic PDFs. Finally, we reflect on our challenges and lessons learned during the dataset creation, in hope of encouraging more discussion about creating datasets for machine learning use.

Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (3)

Downloads: (external link)
https://doi.org/10.1002/asi.24454

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bla:jinfst:v:72:y:2021:i:7:p:870-884

Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=2330-1635

Access Statistics for this article

More articles in Journal of the Association for Information Science & Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().

 
Page updated 2025-03-19
Handle: RePEc:bla:jinfst:v:72:y:2021:i:7:p:870-884