GepH: Entity Predictor for Hindi News
Prafulla B. Bafna
Additional contact information
Prafulla B. Bafna: Symbiosis Institute of Computer Studies and Research, Symbiosis International (Deemed) University, Pune, India
Journal of Information & Knowledge Management (JIKM), 2023, vol. 22, issue 04, 1-14
Abstract:
News is now generated continuously and at high speed, and its volume keeps growing across web sources such as talent hunts and news agencies. To predict the class of a news item from its topic, GepH (Grouped entity predictor for Hindi) is proposed, based on entity extraction and grouping. Entity extraction is well established for English corpora. Hindi, although a national language, has been explored far less by researchers because of its resource scarcity. More than 1,270 news items are processed, applying entity extraction, clustering, and classification with three document representations: the vector space model for Hindi (VSMH), the synset vector space model for Hindi (SVSMH), and the grouped entity document matrix for Hindi (GEDMH). Synset-based dimension reduction is used to improve accuracy. Evaluating hierarchical agglomerative clustering (HAC) on the three matrices shows that GEDMH performs best across varied datasets. The labelled corpus obtained by applying HAC to GEDMH is therefore used as the training dataset, and predictions are made using random forest and Naïve Bayes classifiers. The Naïve Bayes classifier built on the proposed GEDMH performs best. GepH achieves a purity of 0.8, an entropy of 0.4, and an error rate of 0.3 on 1,273 Hindi news items.
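The abstract reports cluster quality as purity and entropy but does not give their formulas. The sketch below shows one common way to compute both from cluster assignments and reference topic labels. It is an illustration under assumptions, not the paper's implementation: the function names, the toy data, the use of base-2 logarithms, and the size-weighted averaging over clusters are all choices made here.

```python
from collections import Counter
from math import log2


def _class_counts(clusters, labels):
    """Group reference labels by cluster id and count each class."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = _class_counts(clusters, labels)
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy inside each cluster.

    Assumption: base-2 logarithm, no normalisation by the number of classes.
    """
    by_cluster = _class_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        within = -sum((k / size) * log2(k / size) for k in counts.values())
        total += (size / n) * within
    return total


# Hypothetical toy data: eight news items, two clusters, two topics.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["sports", "sports", "sports", "politics",
          "politics", "politics", "politics", "politics"]

print(round(purity(clusters, labels), 3))   # 0.875
print(round(entropy(clusters, labels), 3))  # 0.406
```

Higher purity and lower entropy both indicate clusters that align more closely with a single topic, which is the sense in which the abstract compares the three matrices.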
Keywords: Conditional random field; dendrogram; entropy; named entity recognition; synset; Hindi
Date: 2023
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649223500168
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:22:y:2023:i:04:n:s0219649223500168
DOI: 10.1142/S0219649223500168
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.