A review of recent advances in text mining of Indian languages

Panigrahi, Prabin Kumar; Bele, Nishikant


International Journal of Business Information Systems, 2016, vol. 23, issue 2, 175-193

Abstract: Text mining in English has been researched extensively, and a significant body of resources, tools and techniques has been developed. India is a country of high linguistic diversity, and a large amount of textual data is available in Indian languages. Knowledge can be discovered from this text by applying text-mining techniques. However, because of the characteristics of Indian languages, the tools, techniques and resources available for mining English text cannot be applied directly to text in Indian languages. We could not find any comprehensive literature describing research on mining text written in Indian languages. In this paper, we review the research done so far, the availability of language resources, and the various challenges of text mining tasks in Indian languages.

Keywords: text mining; Indian languages; language corpora; feature extraction; language resources; classification; sentiment analysis; natural language processing; NLP; Hindi; India; Indian texts.
Date: 2016

Downloads: (external link)
http://www.inderscience.com/link.php?id=78905 (text/html)
Access to full text is restricted to subscribers.



Persistent link: https://EconPapers.repec.org/RePEc:ids:ijbisy:v:23:y:2016:i:2:p:175-193


Publisher: Inderscience Enterprises Ltd