EconPapers    
 

Efficient supervised and semi-supervised approaches for affiliations disambiguation

Pascal Cuxac, Jean-Charles Lamirel and Valerie Bonvallot
Affiliations:
Pascal Cuxac: INIST-CNRS
Jean-Charles Lamirel: LORIA-Synalp
Valerie Bonvallot: INIST-CNRS

Scientometrics, 2013, vol. 97, issue 1, No 6, 47-58

Abstract: The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, omissions and similar problems. Searching for the names of persons or organizations is therefore difficult whenever a single name can appear in many different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training dataset is available and uses a Naive Bayes model. The second assumes that no training resource is available and uses a semi-supervised approach that combines soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also show that the approach has some limitations; in particular, it cannot efficiently process highly unbalanced data. Alternative solutions are possible for future development, in particular using a recent clustering algorithm that relies on feature maximization.
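The supervised approach summarized above relies on a Naive Bayes model. A minimal sketch of that general idea, assuming each affiliation string is reduced to lowercase word tokens and each training string is tagged with a canonical institution label, is a multinomial Naive Bayes classifier with add-one smoothing. The class name `AffiliationNB`, the tokenizer, the labels and the toy affiliation strings are all hypothetical illustrations; this is not the authors' implementation, and their actual features and preprocessing may differ.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(affiliation):
    """Lowercase an affiliation string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", affiliation.lower())


class AffiliationNB:
    """Hypothetical multinomial Naive Bayes over affiliation tokens (add-alpha smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # training strings per canonical label
        self.word_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()                       # all tokens seen during training

    def fit(self, affiliations, labels):
        for text, label in zip(affiliations, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, affiliation):
        """Unnormalized log P(label) + sum of log P(token | label) for each label."""
        tokens = tokenize(affiliation)
        n_strings = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_strings)  # class prior
            for tok in tokens:
                count = self.word_counts[label][tok]  # 0 for unseen tokens
                score += math.log((count + self.alpha) / (total + self.alpha * v))
            scores[label] = score
        return scores

    def predict(self, affiliation):
        """Return the canonical label with the highest posterior score."""
        scores = self.log_scores(affiliation)
        return max(scores, key=scores.get)


# Hypothetical toy data: variant spellings of two made-up institutions.
train_x = [
    "Dept of Chemistry, University of Nowhere",
    "Univ. Nowhere, Chemistry Department",
    "Nowhere Univ, Dept. Chem.",
    "Institute of Applied Optics, Somewhere",
    "Inst. Appl. Optics, Somewhere",
    "Applied Optics Inst, Somewhere City",
]
train_y = ["UNOWHERE_CHEM"] * 3 + ["IAO_SOMEWHERE"] * 3

model = AffiliationNB().fit(train_x, train_y)
print(model.predict("Chemistry Dept., Nowhere University"))  # UNOWHERE_CHEM
print(model.predict("Inst Applied Optics"))                  # IAO_SOMEWHERE
```

Abbreviated and reordered variants such as "Univ. Nowhere" and "Nowhere Univ" share tokens with other training strings for the same institution, which is what lets the classifier map an unseen spelling to a canonical label. The prior log P(label) is estimated from label frequencies, so with highly unbalanced data it tends to favor frequent institutions; this sketch does nothing to counter that.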

Keywords: Affiliation; Disambiguation; Data cleaning; Classification; Clustering; Semi-supervised; Bibliographic databases; K-means; Naive Bayes
Date: 2013
Citations: 8 (as counted by EconPapers)

Downloads:
http://link.springer.com/10.1007/s11192-013-1025-5 (abstract, text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:97:y:2013:i:1:d:10.1007_s11192-013-1025-5

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-013-1025-5


Scientometrics is currently edited by Wolfgang Glänzel

Scientometrics is published by Springer and Akadémiai Kiadó.
Bibliographic data for the series are maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:scient:v:97:y:2013:i:1:d:10.1007_s11192-013-1025-5