Use of ResearchGate and Google CSE for author name disambiguation
Mehmet Ali Abdulhayoglu and Bart Thijs
Additional contact information
Mehmet Ali Abdulhayoglu: KU Leuven
Bart Thijs: KU Leuven
Scientometrics, 2017, vol. 111, issue 3, No 42, 1965-1985
Abstract:
Author name disambiguation plays a very important role in individual-based bibliometric analysis and has suffered from a lack of information. Therefore, some have successfully leveraged external web sources to obtain additional evidence. However, the main problem is generally the high cost of extracting data from web pages due to their diverse designs. Considering this challenge, we employed ResearchGate (RG), a social network platform for scholars that presents their publication lists in a structured way. Even though the platform might be imperfect, it can be valuable when used along with traditional approaches for the purpose of confirmation. To this end, in our first (retrieval) stage we applied a graph-based machine learning approach, connected components (CC), and formed clusters. Then, in stage 2, the data crawled from RG for the same authors were combined with the CC results. We observed that 76.40% of the clusters formed by CC were confirmed by the RG data and that they accounted for 68.33% of all citations. Second, to examine the details, a subset was drawn from the dataset by retaining those clusters having at least 10 members. This time we additionally employed the Google Custom Search Engine (CSE) API to access authors’ web pages as a complementary tool to RG. We observed an F score of 0.95 when CC results were confirmed by RG&CSE. Almost the same success was observed when only the CC approach was applied. In addition, we observed that the publications identified and confirmed through the external sources were cited to a greater extent than those publications not found in the related external sources. Even though promising, the use of external sources still has issues. We have seen that many authors present only a few selected papers on the web. This hampers our procedure, making it unable to obtain the entire publication list. Missing publications affect bibliometric analysis adversely since all citation data is required. That is, if only the data confirmed via external sources is used, bibliometric indicators will be overestimated. On the other hand, our suggested methodology can potentially decrease the manual work required for individual-based bibliometric analysis. The procedure may also present more reliable results by confirming cluster members derived from unsupervised grouping methods. This approach might be especially beneficial for large datasets where extensive manual work would otherwise be required.
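The two-stage idea in the abstract (cluster publications with connected components, then check each cluster against an externally obtained publication list) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how links between publications are derived, so the `links` pairs, the toy paper identifiers, the `rg_list` set, and the helper names `connected_components` and `confirmed_share` are all hypothetical and used only for illustration.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def confirmed_share(cluster, external_list):
    """Fraction of cluster members that also appear in an external list."""
    if not cluster:
        return 0.0
    return len(set(cluster) & set(external_list)) / len(cluster)


# Toy data (assumed): five publications and pairwise links that some
# similarity rule has judged to belong to the same author.
papers = ["p1", "p2", "p3", "p4", "p5"]
links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]

# Stage 1: form clusters.
clusters = connected_components(papers, links)

# Stage 2: compare each cluster with a publication list taken from an
# external profile (here a hypothetical set of identifiers).
rg_list = {"p1", "p2", "p3"}
shares = [confirmed_share(c, rg_list) for c in clusters]

print(clusters)
print(shares)
```

In this toy case the first cluster is fully covered by the external list and the second is not covered at all. A cluster with a low share is not necessarily wrong: as the abstract notes, many authors list only a few selected papers online.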
Keywords: Author name disambiguation; Researchgate; Google CSE; Information retrieval
Date: 2017
Downloads: http://link.springer.com/10.1007/s11192-017-2341-y (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:111:y:2017:i:3:d:10.1007_s11192-017-2341-y
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-017-2341-y
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.