Topic classification of economic newspaper articles in a highly inflectional language – the case of Serbia

Mirko Djukic
Mirko Djukic: National Bank of Serbia

Working Papers Bulletin from National Bank of Serbia

Abstract: The frequency of certain topics in newspaper articles can be a good indicator of economic developments. Applying topic modelling with the LDA model to Serbian is hampered by the fact that Serbian is a highly inflectional language: each word has a large number of forms, which the model treats as distinct words with different meanings. In this paper, we tried to turn this obstacle into an advantage by reducing only the economic words to their base form. This gave them greater relevance than non-economic words, which remained spread across many forms, each with a lower frequency of occurrence. Because the topics classified in this manner were based mostly on economic expressions, they were expected to be more applicable in further economic analyses.
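
The selective normalisation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the lexicon `ECON_LEMMAS`, the helper names and the three sample sentences are hypothetical, and the sketch stops at term counts, which is the input a topic model such as LDA would work from.

```python
from collections import Counter

# Hypothetical mini-lexicon (an assumption, not taken from the paper):
# inflected form -> base form, covering economic terms only.
ECON_LEMMAS = {
    "inflacije": "inflacija",
    "inflaciji": "inflacija",
    "inflaciju": "inflacija",
    "inflacijom": "inflacija",
    "kursa": "kurs",
    "kursu": "kurs",
    "kursom": "kurs",
}


def normalize_economic_terms(tokens):
    """Lowercase every token and map economic terms to their base form.

    Tokens that are not in the lexicon keep their inflected form.
    """
    return [ECON_LEMMAS.get(tok.lower(), tok.lower()) for tok in tokens]


def term_counts(documents):
    """Count terms over whitespace-tokenised documents after normalisation."""
    counts = Counter()
    for doc in documents:
        counts.update(normalize_economic_terms(doc.split()))
    return counts


# Hypothetical sample sentences, used only to show the effect.
docs = [
    "rast inflacije u gradu",
    "banka o inflaciji i kursu",
    "inflaciju prati kurs u gradovima",
]
counts = term_counts(docs)
print(counts.most_common(3))
```

In this toy corpus the three inflected forms of "inflacija" collapse into one term with a count of 3, while the non-economic forms "gradu" and "gradovima" stay separate with a count of 1 each. This is the frequency imbalance the abstract relies on to give economic words more weight.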

Keywords: textual analysis; topic modelling; Latent Dirichlet Allocation; LASSO model
JEL-codes: C13 C55 E31 E37 E52
Pages: 23
Date: 2024-03
New Economics Papers: this item is included in nep-big, nep-his and nep-tra

Downloads: (external link)
https://www.nbs.rs/documents-eng/publikacije/wp_bulletin/wp_bulletin_03_24_2.pdf Full text (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:nsb:bilten:21

Bibliographic data for series maintained by Marko Miseljic.

 
Page updated 2025-03-31
Handle: RePEc:nsb:bilten:21