CFMf topic-model: comparison with LDA and Top2Vec
Jean-Charles Lamirel,
Francis Lareau and
Christophe Malaterre
Additional contact information
Jean-Charles Lamirel: Université de Strasbourg
Francis Lareau: Université du Québec à Montréal
Christophe Malaterre: Université du Québec à Montréal
Scientometrics, 2024, vol. 129, issue 10, No 28, 6387-6405
Abstract:
Mining the content of scientific publications is increasingly used to investigate the practice of science and the evolution of research domains. Topic models, among which LDA (statistical bag-of-words approach) and Top2Vec (embeddings approach), have notably been shown to provide rich insights into the thematic content of disciplinary fields, their structure and evolution through time. However, improving topic modeling methods remains a major concern. Here we propose an alternative topic-modeling approach based on neural clustering and feature maximization with F1-measure (in short: CFMf). We compare the performance of this approach to LDA and Top2Vec by applying the methods to a reference corpus of full-text philosophy of science articles (N = 16,917). The results reveal significant improvements in terms of coherence measures, independently of the number of topics. Qualitative comparisons show an overall consistency in terms of topical coverage across all three methods, yet with differences: in particular, CFMf appears affected by the presence of a large class while Top2Vec generates some sets of top-words highly difficult to interpret. We discuss these results and highlight upcoming research work.
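The abstract names feature maximization with F1-measure but does not spell out the computation. As a rough illustration only, the sketch below assumes the usual feature F-measure formulation (harmonic mean of a feature's recall and precision within a cluster) and takes cluster labels as given, so the neural clustering step is left out entirely. The function names, the weight matrix, and the toy vocabulary are hypothetical and are not taken from the article.

```python
import numpy as np


def feature_f_measure(weights, labels, n_clusters):
    """Return an (n_clusters, n_features) array of feature F-measures.

    weights: (n_docs, n_features) non-negative term weights (assumed, e.g. TF-IDF).
    labels:  (n_docs,) cluster index of each document (assumed precomputed).
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)

    # Total weight of every feature inside every cluster.
    cluster_sums = np.zeros((n_clusters, weights.shape[1]))
    for c in range(n_clusters):
        cluster_sums[c] = weights[labels == c].sum(axis=0)

    # Recall: share of a feature's overall weight that falls in the cluster.
    feature_totals = cluster_sums.sum(axis=0, keepdims=True)
    # Precision: share of a cluster's overall weight carried by the feature.
    cluster_totals = cluster_sums.sum(axis=1, keepdims=True)

    with np.errstate(divide="ignore", invalid="ignore"):
        recall = np.where(feature_totals > 0, cluster_sums / feature_totals, 0.0)
        precision = np.where(cluster_totals > 0, cluster_sums / cluster_totals, 0.0)
        denom = recall + precision
        f_measure = np.where(denom > 0, 2 * recall * precision / denom, 0.0)
    return f_measure


def top_words(f_measure, vocabulary, k):
    """List the k highest-scoring words of each cluster, best first."""
    order = np.argsort(-f_measure, axis=1)[:, :k]
    return [[vocabulary[j] for j in row] for row in order]


# Toy data: two documents, two terms, one document per cluster.
toy_weights = [[1.0, 1.0], [0.0, 2.0]]
toy_labels = [0, 1]
toy_scores = feature_f_measure(toy_weights, toy_labels, n_clusters=2)
toy_top = top_words(toy_scores, ["model", "theory"], k=1)
```

In this toy case the term "theory" scores higher in the second cluster than in the first because most of its weight sits there, which is the kind of contrast a top-word ranking based on this measure would exploit.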
Keywords: Topic model; CFMf; LDA; Top2Vec; Clustering; Topic coherence; Topic diversity
Date: 2024
Downloads: (external link)
http://link.springer.com/10.1007/s11192-024-05017-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:10:d:10.1007_s11192-024-05017-z
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-024-05017-z
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.