Disambiguation by namesake risk assessment
Thorsten Doherr
No 21-021, ZEW Discussion Papers from ZEW - Leibniz Centre for European Economic Research
Abstract:
Most bibliometric databases provide only names as the handle to individuals' careers, which leads to the problem of namesakes. We introduce a universal method to assess the risk of linking documents of different individuals sharing the same name, with the goal of collecting the documents into personalized clusters. A theoretical setup for the probability of drawing a namesake, depending on the number of namesakes in the population and the size of the observed unit, replaces the need for training datasets, thereby avoiding a namesake bias caused by the inherent underestimation of namesakes in training/benchmark data. A Poisson model based on a master sample of unambiguously identified individuals estimates the main component, the number of namesakes for any given name. To implement the algorithm, we reduce the complexity in the data by resolving similarity in properties. At the core of the implementation is a mechanism returning the unit size of the intersected mutual properties linking two documents. Because this mechanism is computationally demanding, we also discuss means to optimize the procedure.
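The idea of a namesake risk that depends on the number of namesakes and on the size of the observed unit can be illustrated with a small sketch. This is not the formula from the paper, which the abstract does not state. It assumes a simple hypergeometric setup: a unit (for example, everyone sharing some property) contains one bearer of a name, and its remaining members are drawn at random without replacement from the rest of the population. The function name `namesake_risk` and its parameters are hypothetical.

```python
from math import comb


def namesake_risk(population: int, namesakes: int, unit_size: int) -> float:
    """Illustrative risk that a unit holding one bearer of a name also
    holds at least one other bearer of the same name.

    Assumption (not taken from the paper): the other unit members are a
    uniform random draw, without replacement, from the remaining population.
    """
    if not (1 <= namesakes <= population):
        raise ValueError("namesakes must be between 1 and population")
    if not (1 <= unit_size <= population):
        raise ValueError("unit_size must be between 1 and population")

    others = population - 1          # everyone except the focal individual
    other_namesakes = namesakes - 1  # namesakes other than the focal individual
    draws = unit_size - 1            # remaining slots in the unit

    # Probability that none of the remaining slots is filled by a namesake.
    # math.comb returns 0 when there are too few non-namesakes to fill the slots.
    p_none = comb(others - other_namesakes, draws) / comb(others, draws)
    return 1.0 - p_none


if __name__ == "__main__":
    # The risk grows with the unit size and with the number of namesakes.
    for size in (2, 10, 100):
        print(size, namesake_risk(population=10_000, namesakes=5, unit_size=size))
```

Under this assumption a unique name carries no risk in any unit, while a large unit makes even a rare name ambiguous. This is why the number of namesakes for a name, which the paper estimates with a Poisson model, is the main input.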
Keywords: homonymy; namesakes; disambiguation; scientific careers; inventors; patents; publications
JEL-codes: C18 C36
Date: 2021
New Economics Papers: this item is included in nep-cmp and nep-ore
Citations: View citations in EconPapers (4)
Downloads: (external link)
https://www.econstor.eu/bitstream/10419/231411/1/1750558505.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:zewdip:21021
Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.