Improving fair name-based prediction of gender in scientific communities
Maria Guariglia Migliore,
Gregorio D’Agostino,
Tatiana Patriarca and
Antonio De Nicola
Additional contact information
Maria Guariglia Migliore: Guglielmo Marconi University
Gregorio D’Agostino: Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), Casaccia Research Centre
Tatiana Patriarca: Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), Casaccia Research Centre
Antonio De Nicola: Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), Casaccia Research Centre
Scientometrics, 2025, vol. 130, issue 9, No 4, 4849-4877
Abstract:
The role of women in modern society is a central issue in several developed countries. Despite encouraging policies, women’s participation in STEM fields remains significantly lower than men’s. To develop solutions that mitigate this disparity, a deeper understanding of the underlying causes is crucial, and a proper quantification of the phenomenon is a first step toward any analysis. Although the gender gap in scientific communities has long been debated, information on authors’ genders is often unavailable (see, for instance, ResearchGate and Scopus). Additionally, the lack of open-source software for automated name-based gender prediction forces time-consuming manual effort, hence the need for novel, effective algorithms. As a further challenge, the desired software should guarantee gender fairness by providing the same performance when recognising male and female names. In this paper, we propose gender-fair software that automatically predicts authors’ gender from their given names. The code leverages most of the existing information sources, i.e., Scopus, Semantic Scholar, and the Harvard dataset. We applied the software experimentally to two datasets of publications, which provided insights into them. Finally, we evaluated the software’s performance in terms of accuracy, precision, recall, F1-score, and gender fairness by means of two distinct case studies. The proposed solution can enable fairer gender prediction by combining open data with carefully calibrated criteria, matching the performance of commercial tools while offering a transparent and accessible solution.
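As an illustration of the evaluation criteria named in the abstract, the following is a minimal Python sketch of how accuracy, per-gender precision, recall, and F1-score, and a simple fairness gap could be computed. It is not the authors' code: the function names, the "F"/"M" labels, the toy data, and the definition of the fairness gap as the absolute difference between the two genders' scores are all assumptions made for this example.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(y_true, y_pred, labels=("F", "M")):
    """Overall accuracy, per-gender metrics, and the gap between genders.

    The gap (absolute difference per metric) is a hypothetical fairness
    measure chosen for this sketch; 0.0 means equal performance.
    """
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_label = {lab: per_class_metrics(y_true, y_pred, lab) for lab in labels}
    first, second = labels
    gap = {
        metric: abs(per_label[first][metric] - per_label[second][metric])
        for metric in ("precision", "recall", "f1")
    }
    return {"accuracy": accuracy, "per_label": per_label, "fairness_gap": gap}


# Toy data (invented): true genders and predictions for eight given names.
y_true = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_pred = ["F", "F", "M", "M", "M", "M", "M", "M"]

result = evaluate(y_true, y_pred)
print(result["accuracy"])
print(result["per_label"])
print(result["fairness_gap"])
```

In this toy case the predictor recovers every male name but only half of the female names, so overall accuracy looks acceptable while the recall gap between genders is large. That is the kind of imbalance a gender-fairness criterion is meant to expose.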
Keywords: Bibliometric analysis; Prediction; Gender; Gender fairness
Date: 2025
Downloads:
http://link.springer.com/10.1007/s11192-025-05384-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:130:y:2025:i:9:d:10.1007_s11192-025-05384-1
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-025-05384-1
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.