A Comparison of Algorithms for Text Classification of Albanian News Articles
Arbana Kadriu and Lejla Abazi
A chapter in Proceedings of the ENTRENOVA - ENTerprise REsearch InNOVAtion Conference, Dubrovnik, Croatia, 7-9 September 2017, 2017, pp 1-7 from IRENET - Society for Advancing Innovation and Research in Economy, Zagreb
Abstract:
Text classification is an essential task in text mining and information retrieval. Many algorithms have been developed to classify computational data, and most of them have been extended to classify textual data. We use some of these algorithms to train classifiers on one part of our crawled Albanian news articles and to classify the other part with the trained classifiers. The categories used are: latest news, economy, sport, showbiz, technology, culture, and world. First, we remove all stop words from the collected articles; the output of this step is a separate text file for each category. These files are then split into sentences, and each sentence is assigned the appropriate category. The sentences are then collected into a single list of sentence/category tuples. This list is shuffled to randomize the order of the categories, and it is used to train (80% of the list) and to test (the remaining 20%) different classifiers. We train and then test each classifier on our articles, measuring its accuracy separately. We also analyse the training and testing time.
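The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the English toy corpus, the function names, and the small Naive Bayes classifier are all assumptions chosen so the example runs with the standard library alone. The abstract does not name the classifiers compared, so Naive Bayes stands in here only as one plausible example.

```python
import math
import random
import re
import time
from collections import Counter, defaultdict

# Hypothetical stop-word list and toy corpus (assumptions for illustration);
# the paper works on crawled Albanian news articles in seven categories.
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}
CORPUS = {
    "sport": "The team won the match. The striker scored a goal in the final. "
             "The coach praised the players. The league starts in August. "
             "The goalkeeper saved a penalty.",
    "economy": "The bank raised interest rates. Inflation slowed in March. "
               "The market closed higher. Exports of steel grew. "
               "The budget deficit widened.",
}

def remove_stop_words(text):
    """Drop stop words but keep punctuation, so sentences can still be split."""
    kept = re.sub(r"\w+",
                  lambda m: "" if m.group().lower() in STOP_WORDS else m.group(),
                  text)
    return re.sub(r"\s+", " ", kept).strip()

def build_pairs(corpus):
    """Clean each category's text, split it into sentences, and label them."""
    pairs = []
    for category, text in corpus.items():
        cleaned = remove_stop_words(text)
        for sentence in re.split(r"(?<=[.!?])\s+", cleaned):
            if sentence:
                pairs.append((sentence, category))
    return pairs

def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())

class NaiveBayes:
    """Small multinomial Naive Bayes with add-one smoothing (stand-in classifier)."""

    def train(self, pairs):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        for sentence, category in pairs:
            self.class_counts[category] += 1
            self.word_counts[category].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def classify(self, sentence):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for category, n in self.class_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(sentence):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best

def evaluate(pairs, seed=0, train_fraction=0.8):
    """Shuffle, split 80/20, then time training and testing separately."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_fraction)
    train_set, test_set = pairs[:cut], pairs[cut:]

    clf = NaiveBayes()
    start = time.perf_counter()
    clf.train(train_set)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    correct = sum(clf.classify(s) == c for s, c in test_set)
    test_time = time.perf_counter() - start

    return correct / len(test_set), train_time, test_time, len(train_set), len(test_set)
```

Calling `evaluate(build_pairs(CORPUS))` returns the accuracy, the training time, the testing time, and the sizes of the two splits. Comparing several classifiers would amount to running the same shuffled split through each one and recording these figures per classifier.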
Keywords: data mining; text classification; news articles; machine learning
JEL-codes: C00 C30
Date: 2017
Downloads: (external link)
https://www.econstor.eu/bitstream/10419/183756/1/0 ... Kadriu-paper-1-7.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:entr17:183756
Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.