Topic classification of economic newspaper articles in a highly inflectional language – the case of Serbia
Mirko Djukic
Additional contact information
Mirko Djukic: National Bank of Serbia
Working Papers Bulletin from National Bank of Serbia
Abstract:
The frequency of certain topics in newspaper articles can be a good indicator of some economic developments. Applying topic modelling with the LDA model to Serbian is hampered by the fact that Serbian is a highly inflectional language: words take a large number of forms, which the model treats as distinct words with different meanings. In this paper, we tried to turn that drawback into an advantage by reducing only the economic words to their base form. This gave them greater relevance than non-economic words, which remained spread across a large number of forms, each with a lower frequency of occurrence. As the topics classified in this manner were based mostly on economic expressions, they were expected to be more applicable in further economic analyses.
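The selective reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the lookup table `ECONOMIC_LEMMAS`, the function `normalise_economic_terms` and the toy token list are all hypothetical, chosen only to show how collapsing the forms of economic words concentrates their counts while non-economic words stay scattered across their inflected forms.

```python
from collections import Counter

# Hypothetical lookup table (an assumption for illustration only):
# inflected forms of economic terms mapped to their base form.
ECONOMIC_LEMMAS = {
    "inflacije": "inflacija",
    "inflaciji": "inflacija",
    "inflaciju": "inflacija",
    "kamate": "kamata",
    "kamatu": "kamata",
}


def normalise_economic_terms(tokens, lemmas=ECONOMIC_LEMMAS):
    """Reduce only known economic word forms to their base form.

    Tokens that are not in the lookup table are returned unchanged,
    so non-economic words keep their many inflected forms.
    """
    return [lemmas.get(token.lower(), token.lower()) for token in tokens]


# Toy token list: three forms of "inflacija", two of "kamata",
# and three forms of the non-economic word "grad".
tokens = ["inflacije", "gradu", "kamatu", "inflaciju",
          "grada", "kamate", "inflaciji", "grad"]

counts = Counter(normalise_economic_terms(tokens))

# Economic terms now share one count each; "grad" stays split by form.
for word, count in counts.most_common():
    print(word, count)
```

In a full pipeline the normalised tokens would then be passed to a topic model such as LDA, where the higher counts of the economic base forms would give them more influence on the estimated topics.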
Keywords: textual analysis; topic modelling; Latent Dirichlet Allocation; LASSO model
JEL-codes: C13 C55 E31 E37 E52
Pages: 23
Date: 2024-03
New Economics Papers: this item is included in nep-big, nep-his and nep-tra
Downloads:
https://www.nbs.rs/documents-eng/publikacije/wp_bulletin/wp_bulletin_03_24_2.pdf Full text (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:nsb:bilten:21