Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach

Bharti, Santosh Kumar; Gupta, Rajeev Kumar; Patel, Samir; Shah, Manan

Santosh Kumar Bharti: Pandit Deendayal Energy University
Rajeev Kumar Gupta: Pandit Deendayal Energy University
Samir Patel: Pandit Deendayal Energy University
Manan Shah: Pandit Deendayal Energy University

Annals of Data Science, 2024, vol. 11, issue 1, No 15, 347-378

Abstract: In the domain of natural language processing, part-of-speech (POS) tagging is the most important task. It plays a vital role in applications such as sentiment analysis, text summarization, and opinion mining. POS tagging is the process of assigning POS information (noun, pronoun, verb, etc.) to each word in a given text, taking into account the word's relationship with the surrounding words. Hindi is a widely used language in countries such as India, Nepal, the United States, and Mauritius. The majority of Indians are accustomed to reading and writing in Hindi, and they also use it when writing on social media platforms such as Twitter, Facebook, and WhatsApp. POS tagging is the most important phase in analyzing such Hindi text from social media. Text written in Hindi is ambiguous in nature and rich in morphology, which makes identifying POS information challenging. In this article, a heuristic-based approach is proposed for identifying POS information. The proposed method deploys a context-based bigram model that creates bigram sequences based on each word's relationship with its adjacent words. It then selects the most likely POS tag for a word based on both the forward and reverse bigram sequences. The experimental results of the proposed heuristic approach are compared with existing state-of-the-art techniques such as the hidden Markov model, decision trees, conditional random fields, support vector machines, neural networks, and recurrent neural networks. Finally, it is observed that the proposed heuristic approach for POS tagging in Hindi outperforms the existing techniques and attains an accuracy of 94.3%.
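The abstract describes the method only at a high level: tag bigrams are collected in both directions, and each word receives the tag best supported by its left neighbor (forward bigram) and its right neighbor (reverse bigram). The Python sketch below is a minimal illustration under our own assumptions, not the authors' algorithm. The scoring rule (word-tag likelihood times forward bigram probability times best reverse bigram probability, with add-one smoothing), the greedy left-to-right decoding, the function names `train` and `tag`, the tag labels, and the romanized toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary tags


def train(tagged_sentences):
    """Count word->tag pairs and tag bigrams in both directions (illustrative sketch)."""
    emit = defaultdict(Counter)  # emit[word][tag]: how often a word carries a tag
    fwd = defaultdict(Counter)   # fwd[prev_tag][tag]: forward bigram counts
    rev = defaultdict(Counter)   # rev[next_tag][tag]: reverse bigram counts
    for sent in tagged_sentences:
        for word, tag in sent:
            emit[word][tag] += 1
        tags = [START] + [tag for _, tag in sent] + [END]
        for left, right in zip(tags, tags[1:]):
            fwd[left][right] += 1
            rev[right][left] += 1
    tagset = sorted({t for counts in emit.values() for t in counts})
    return emit, fwd, rev, tagset


def smoothed(counter, key, vocab_size, alpha=1.0):
    """Add-alpha smoothed relative frequency of `key` in `counter`."""
    total = sum(counter.values())
    return (counter[key] + alpha) / (total + alpha * vocab_size)


def tag(words, model):
    """Greedy left-to-right tagging using forward and reverse bigram context."""
    emit, fwd, rev, tagset = model
    n = len(tagset)

    def candidates(word):
        # Known words keep their observed tags; unknown words may take any tag.
        return sorted(emit[word]) if word in emit else tagset

    result, prev = [], START
    for i, word in enumerate(words):
        next_tags = candidates(words[i + 1]) if i + 1 < len(words) else [END]
        best_tag, best_score = None, -1.0
        for t in candidates(word):
            lexical = smoothed(emit.get(word, Counter()), t, n)     # P(t | word)
            forward = smoothed(fwd.get(prev, Counter()), t, n + 1)  # P(t | previous tag)
            reverse = max(smoothed(rev.get(nt, Counter()), t, n + 1)  # P(t | next tag)
                          for nt in next_tags)
            score = lexical * forward * reverse
            if score > best_score:
                best_tag, best_score = t, score
        result.append(best_tag)
        prev = best_tag
    return result


# Hypothetical romanized toy corpus; "sona" is ambiguous (NN "gold" / VM "to sleep").
TOY_CORPUS = [
    [("vah", "PRP"), ("sona", "NN"), ("kharidta", "VM"), ("hai", "VAUX")],
    [("vah", "PRP"), ("sona", "VM"), ("chahta", "VAUX"), ("hai", "VAUX")],
    [("vah", "PRP"), ("ghar", "NN"), ("kharidta", "VM"), ("hai", "VAUX")],
]

if __name__ == "__main__":
    model = train(TOY_CORPUS)
    print(tag("vah sona chahta hai".split(), model))
    print(tag("vah sona kharidta hai".split(), model))
```

On this toy data the forward context (PRP) alone favors NN for *sona*, so the reverse bigram is what flips it to VM before *chahta* while leaving NN before *kharidta*. A Viterbi search over the full sequence would be the usual alternative to this greedy choice; the greedy version is shown only because it is short.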

Keywords: Machine learning; Deep learning; Hidden Markov model; Bigram; Context; Greedy approach; Hindi; Natural language processing; POS tagging
Date: 2024

Downloads: http://link.springer.com/10.1007/s40745-022-00434-4 (abstract, text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:11:y:2024:i:1:d:10.1007_s40745-022-00434-4

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-022-00434-4


Annals of Data Science is currently edited by Yong Shi

More articles in Annals of Data Science from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.