Comparing representations of a discipline derived through LDA vs. intellectual content analysis: the case of information science
Kaisa Ylikruuvi,
Kalervo Järvelin,
Pertti Vakkari and
Martti Juhola
Additional contact information
Kaisa Ylikruuvi: Tampere University
Kalervo Järvelin: Tampere University
Pertti Vakkari: Tampere University
Martti Juhola: Tampere University
Scientometrics, 2025, vol. 130, issue 8, No 6, 4309-4337
Abstract:
The paper examines the methodology of empirical analyses of the content and structure of Information Science (IS). The traditional approach is intellectual content analysis (ICA) of a representative data set, but its high labor cost prohibits the analysis of massive data sets. A recent alternative, based on data mining and machine learning, can analyze massive data sets efficiently; however, the quality of the resulting content analysis is a significant issue. The paper compares statistical analysis based on latent Dirichlet allocation/topic modeling (LDA/TM) with ICA using the same data set: 1514 scholarly articles from the 2015 volumes of 30 IS journals. The intellectual analysis serves as the reference point for assessing the TM results. LDA/TM is strong in identifying new directions of a discipline and in processing masses of text. Its weaknesses include semantic haziness of topics due to bag-of-words article representation, text pre-processing, tuning of parameters, and being unanalytic in composing topics from words belonging to different categories.
Keywords: Information science; Content analysis; Latent Dirichlet allocation; Comparative study
Date: 2025
Downloads:
http://link.springer.com/10.1007/s11192-025-05376-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:130:y:2025:i:8:d:10.1007_s11192-025-05376-1
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-025-05376-1
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.