EconPapers    
 

Functional structure identification of scientific documents in computer science

Wei Lu, Yong Huang, Yi Bu and Qikai Cheng
Affiliations:
Wei Lu: Wuhan University
Yong Huang: Wuhan University
Yi Bu: Indiana University
Qikai Cheng: Wuhan University

Scientometrics, 2018, vol. 115, issue 1, No 23, 463-486

Abstract: The increasing number of open-access full-text scientific documents promotes a shift from metadata-based to content-based studies, which are more detailed and semantic. Along with the benefits of ample data, the unclear internal structure of these documents makes data organization and analysis much more difficult. Each unit in a scientific document has its own function in expressing the authors' research ideas, such as introducing motivations, describing methods, stating related work, and drawing conclusions; these functions can be used to identify the functional structure of scientific documents. This paper first proposes a clustering method that generates domain-specific structures from the high-frequency section headers in the scientific documents of a domain. To identify the structure of scientific documents automatically, we categorize them into three types: (1) strong-structure documents; (2) weak-structure documents; and (3) no-structure documents. We further divide identification into three levels (section header-based, section content-based, and paragraph-based identification), corresponding to the three document types. Our experiments on documents in the field of computer science show that: (1) section header-based identification is the most direct and simplest method, but its accuracy is limited by unknown words in section headers; (2) section content-based identification is more stable and performs well; and (3) paragraph-based identification is promising for identifying the functions of no-structure documents. Additionally, we apply our methods to two tasks, academic search and keyword extraction, and both tasks demonstrate the effectiveness of functional structure.
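The first identification level, and the routing of documents into the three structure types, can be illustrated with a toy Python sketch. This is not the authors' implementation: the paper derives its domain-specific categories by clustering high-frequency section headers, whereas the keyword table `HEADER_KEYWORDS`, the functions `normalize`, `classify_header` and `structure_type`, and the rule that maps a list of headers to a document type are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import List, Optional

# HYPOTHETICAL keyword table: the paper learns its domain-specific structure by
# clustering high-frequency section headers; this hand-written mapping is only
# a stand-in for illustration. Order matters: the first matching function wins.
HEADER_KEYWORDS = {
    "introduction": ["introduction", "motivation", "background"],
    "related work": ["related work", "literature review", "prior work"],
    "method": ["method", "approach", "model", "algorithm"],
    "experiment": ["experiment", "evaluation", "results", "discussion"],
    "conclusion": ["conclusion", "future work", "summary"],
}


def normalize(header: str) -> str:
    """Strip a leading section number such as '3.' or '2.1' and lowercase."""
    return re.sub(r"^\s*\d+(\.\d+)*\.?\s*", "", header).strip().lower()


def classify_header(header: str) -> Optional[str]:
    """Map one section header to a functional category, or None if unknown."""
    text = normalize(header)
    for function, keywords in HEADER_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return function
    return None  # unknown words: the header alone cannot decide the function


def structure_type(headers: List[str]) -> str:
    """HYPOTHETICAL routing rule onto the abstract's three document types."""
    if not headers:
        return "no-structure"  # would fall back to paragraph-based identification
    if all(classify_header(h) is not None for h in headers):
        return "strong-structure"  # header-based identification is enough
    return "weak-structure"  # unlabeled sections need content-based identification


if __name__ == "__main__":
    headers = ["1. Introduction", "2 Related Work", "3. Proposed Approach",
               "4. Experimental Results", "5. Conclusion and Future Work"]
    for h in headers:
        print(f"{h!r} -> {classify_header(h)}")
    print(structure_type(headers))
    print(structure_type(["1. Introduction", "2. Topic Drift in Citation Graphs"]))
    print(structure_type([]))
```

A header that matches no keyword returns `None`, which mirrors the abstract's observation that unknown words in section headers limit header-based identification; in this sketch such a document is routed to content-based identification, and a document with no headers at all to paragraph-based identification.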

Keywords: Functional structure; Text categorization; Academic retrieval; Keyword extraction
Date: 2018
Citations in EconPapers: 8

Downloads: http://link.springer.com/10.1007/s11192-018-2640-y (abstract, text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:115:y:2018:i:1:d:10.1007_s11192-018-2640-y

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-018-2640-y


Scientometrics is currently edited by Wolfgang Glänzel

Scientometrics is published by Springer and Akadémiai Kiadó.
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:115:y:2018:i:1:d:10.1007_s11192-018-2640-y