EconPapers    
 

GepH: Entity Predictor for Hindi News

Prafulla B. Bafna
Prafulla B. Bafna: Symbiosis Institute of Computer Studies and Research, Symbiosis International (Deemed) University, Pune, India

Journal of Information & Knowledge Management (JIKM), 2023, vol. 22, issue 04, 1-14

Abstract: News is now generated continuously and at high speed, and its volume keeps growing across web sources such as talent hunts, news agencies, and others. To predict the class of a news item from its topic, GepH (Grouped entity predictor for Hindi) is proposed, based on entity extraction and grouping. Entity extraction is widely applied to English corpora, but Hindi, although a national language, has been explored comparatively little by researchers because of its scarce resources. More than 1,270 news articles are processed with entity extraction, clustering, and classification using three representations: the vector space model for Hindi (VSMH), the synset vector space model for Hindi (SVSMH), and the grouped entity document matrix for Hindi (GEDMH). Synset-based dimension reduction is used to improve accuracy. Evaluating hierarchical agglomerative clustering (HAC) on the three matrices shows that GEDMH performs best across varied datasets. The labelled corpus obtained by applying HAC to GEDMH is then used as a training dataset, and predictions are made with random forest and Naïve Bayes classifiers. The Naïve Bayes classifier built on the proposed GEDMH performs best. GepH achieves a purity of 0.8, an entropy of 0.4, and an error rate of 0.3 on 1,273 Hindi news articles.
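The abstract reports clustering quality as purity and entropy, then trains classifiers on the labels produced by HAC. A minimal stdlib-only Python sketch of those two steps follows. It assumes one common definition of each metric (size-weighted, base-2 cluster entropy; the abstract does not state the paper's exact normalization) and swaps in a plain multinomial Naïve Bayes with Laplace smoothing trained on cluster IDs as pseudo-labels. Documents are simplified to bags of tokens rather than GEDMH rows, and all function names, tokens, and labels are hypothetical.

```python
import math
from collections import Counter


def purity(clusters, gold):
    """Fraction of documents whose gold class is the majority class of their cluster."""
    n = len(gold)
    hits = 0
    for c in set(clusters):
        members = [g for k, g in zip(clusters, gold) if k == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / n


def entropy(clusters, gold, base=2):
    """Size-weighted average of per-cluster class entropy (0 means every cluster is pure)."""
    n = len(gold)
    total = 0.0
    for c in set(clusters):
        members = [g for k, g in zip(clusters, gold) if k == c]
        size = len(members)
        h = -sum((m / size) * math.log(m / size, base) for m in Counter(members).values())
        total += (size / n) * h
    return total


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing; labels may be HAC cluster IDs."""
    vocab = {t for d in docs for t in d}
    log_prior, log_cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_cond[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_prior, log_cond


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    log_prior, log_cond = model
    scores = {
        c: log_prior[c] + sum(log_cond[c][t] for t in doc if t in log_cond[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical HAC output compared with hypothetical gold topics for six articles.
    clusters = [0, 0, 0, 1, 1, 1]
    gold = ["sports", "sports", "politics", "politics", "politics", "business"]
    print(round(purity(clusters, gold), 3))   # 0.667
    print(round(entropy(clusters, gold), 3))  # 0.918

    # Hypothetical tokenised Hindi news, labelled only by cluster ID.
    docs = [["क्रिकेट", "मैच"], ["क्रिकेट", "टीम"], ["चुनाव", "सरकार"], ["सरकार", "मंत्री"]]
    model = train_nb(docs, [0, 0, 1, 1])
    print(predict_nb(model, ["मैच", "क्रिकेट"]))  # 0
```

Higher purity and lower entropy indicate cleaner clusters, which is the direction of the 0.8 purity and 0.4 entropy reported for GepH; a cleaner clustering yields less noisy pseudo-labels for the classifier trained on them.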

Keywords: Conditional random field; dendrogram; entropy; named entity recognition; synset; Hindi
Date: 2023

Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649223500168
Access to full text is restricted to subscribers



Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:22:y:2023:i:04:n:s0219649223500168


DOI: 10.1142/S0219649223500168


Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:22:y:2023:i:04:n:s0219649223500168