Robust hybrid name disambiguation framework for large databases

Zhu, Jia; Yang, Yi; Xie, Qing; Wang, Liwei; Hassan, Saeed-Ul

Robust hybrid name disambiguation framework for large databases

Jia Zhu (), Yi Yang, Qing Xie, Liwei Wang and Saeed-Ul Hassan
Additional contact information
Jia Zhu: South China Normal University
Yi Yang: Carnegie Mellon University
Qing Xie: King Abdullah University of Science and Technology
Liwei Wang: Wuhan University
Saeed-Ul Hassan: COMSATS Institute of Information Technology

Scientometrics, 2014, vol. 98, issue 3, No 42, 2255-2274

Abstract: Abstract In many databases, science bibliography database for example, name attribute is the most commonly chosen identifier to identify entities. However, names are often ambiguous and not always unique which cause problems in many fields. Name disambiguation is a non-trivial task in data management that aims to properly distinguish different entities which share the same name, particularly for large databases like digital libraries, as only limited information can be used to identify authors’ name. In digital libraries, ambiguous author names occur due to the existence of multiple authors with the same name or different name variations for the same person. Also known as name disambiguation, most of the previous works to solve this issue often employ hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we focus on proposing a robust hybrid name disambiguation framework that is not only applicable for digital libraries but also can be easily extended to other application based on different data sources. We propose a web pages genre identification component to identify the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve the performance of name disambiguation. We evaluated our approach on known corpora, and the favorable experiment results indicated that our proposed framework is feasible.

Keywords: Name disambiguation; Multidimensional scaling; Genre identification; Clustering (search for similar items in EconPapers)
Date: 2014
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (4)

Downloads: (external link)
http://link.springer.com/10.1007/s11192-013-1151-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:98:y:2014:i:3:d:10.1007_s11192-013-1151-0

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-013-1151-0

Access Statistics for this article

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().