Automatic Identification of Research Articles Containing Data Usage Statements
Qiuzi Zhang,
Wei Lu,
Yunhan Yang,
Haihua Chen and
Jiangping Chen
Chapter 4 in Knowledge Discovery and Data Design Innovation: Proceedings of the International Conference on Knowledge Management (ICKM 2017), 2017, pp 67-87 from World Scientific Publishing Co. Pte. Ltd.
Abstract:
Modern scientific research is characterized by the sharing of datasets and the reuse of data for developing new models and theories. This paper describes a study to identify research articles that contain data use and reuse information. Applying a bootstrapping-based unsupervised training strategy, we were able to derive text patterns automatically from a large training collection of research articles. These patterns were then used to distinguish articles with data use and reuse from those without data usage. Our experiments on Computer Science literature showed that the identification could achieve more than 85% pattern extensibility. We also demonstrate how the identification results could be used to gain insights into data sharing and reuse in a scientific field.
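The abstract does not give the details of the bootstrapping procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single seed term ("dataset"), a pattern defined as the two words preceding a known term, and a frequency threshold of 2; the function names, the toy sentences, and all parameters are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_patterns(sentences, terms, width=2):
    """Count the `width` words that precede each known term."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok in terms and i >= width:
                counts[tuple(tokens[i - width:i])] += 1
    return counts

def extract_terms(sentences, patterns, width=2):
    """Count the words that directly follow any known pattern."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(width, len(tokens)):
            if tuple(tokens[i - width:i]) in patterns:
                counts[tokens[i]] += 1
    return counts

def bootstrap(sentences, seed_terms, rounds=2, min_count=2):
    """Alternate between learning patterns from terms and terms from patterns."""
    terms, patterns = set(seed_terms), set()
    for _ in range(rounds):
        found = extract_patterns(sentences, terms)
        patterns |= {p for p, c in found.items() if c >= min_count}
        found = extract_terms(sentences, patterns)
        terms |= {t for t, c in found.items() if c >= min_count}
    return patterns, terms

def has_data_usage(text, patterns, width=2):
    """Flag a text if any learned pattern occurs in it."""
    tokens = tokenize(text)
    return any(
        tuple(tokens[i:i + width]) in patterns
        for i in range(len(tokens) - width + 1)
    )

# Toy training collection (invented for illustration only).
sentences = [
    "We used the dataset from the shared task.",
    "In our experiments we used the dataset of tweets.",
    "For training we used the corpus of news articles.",
    "We used the corpus released last year.",
    "This section reviews related work.",
]

patterns, terms = bootstrap(sentences, {"dataset"})
print(patterns)
print(terms)
print(has_data_usage("We used the benchmark.", patterns))
```

A pattern this short will over-match in real text; a practical system would likely need longer contexts and a scoring step to filter out generic patterns.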
Keywords: Knowledge Discovery; Big Data; Data Science; Data Analytics; Innovation
JEL-codes: O30
Date: 2017
Downloads:
https://www.worldscientific.com/doi/pdf/10.1142/9789813234482_0004 (application/pdf)
https://www.worldscientific.com/doi/abs/10.1142/9789813234482_0004 (text/html)
Ebook Access is available upon purchase.
Persistent link: https://EconPapers.repec.org/RePEc:wsi:wschap:9789813234482_0004
Bibliographic data for series maintained by Tai Tone Lim.