Affiliation disambiguation for constructing semantic digital libraries
Yong Jiang, Hai‐Tao Zheng, Xinmin Wang, Binggan Lu and Kaihua Wu
Journal of the American Society for Information Science and Technology, 2011, vol. 62, issue 6, 1029-1041
Abstract:
With the increasing availability of digital information, semantic web technologies have been employed to construct semantic digital libraries in order to ease information comprehension. The use of the semantic web enables users to search or visualize resources in a semantic fashion. Semantic web generation, which converts the metadata of digital resources into semantic web data, is a key process in semantic digital library construction. Many text mining technologies, such as keyword extraction and clustering, have been proposed to generate semantic web data. However, one important type of metadata in publications, the affiliation, is hard to convert into semantic web data precisely, because different authors who have the same affiliation often express it in different ways. To address this issue, this paper proposes a clustering method based on normalized compression distance for the purpose of affiliation disambiguation. The experimental results show that our method is able to identify different affiliations that denote the same institute. The clustering results outperform the well‐known k‐means clustering method in terms of average precision, F‐measure, entropy, and purity.
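For readers unfamiliar with normalized compression distance (NCD), the following is a minimal sketch of how it can be computed for two affiliation strings. It is not the authors' implementation: the choice of zlib as the compressor, the function names `compressed_size`, `ncd`, and `closest`, and the sample affiliation strings are all assumptions made for illustration. The abstract does not say which compressor or clustering procedure the paper uses.

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. zlib is an assumed compressor here.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def closest(query: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest NCD to query."""
    return min(candidates, key=lambda c: ncd(query, c))


# Hypothetical affiliation strings, not data from the paper.
full_name = "Department of Computer Science, Tsinghua University, Beijing, China"
short_name = "Dept. of Comp. Sci., Tsinghua Univ., Beijing"
other = "Massachusetts Institute of Technology, Cambridge, MA, USA"

if __name__ == "__main__":
    print("same institute:     ", round(ncd(full_name, short_name), 3))
    print("different institute:", round(ncd(full_name, other), 3))
    print("closest to short form:", closest(short_name, [full_name, other]))
```

A smaller NCD means the two strings share more compressible structure, so variant spellings of one institute tend to score lower than unrelated institutes. A clustering step, such as the one the paper proposes, would then group strings using these pairwise distances. On strings this short the compressor's fixed overhead distorts the absolute values, so only relative comparisons are meaningful in this sketch.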
Date: 2011
Downloads: https://doi.org/10.1002/asi.21538
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:62:y:2011:i:6:p:1029-1041
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890