EconPapers    

Text mining by Tsallis entropy

Maryam Jamaati and Ali Mehri

Physica A: Statistical Mechanics and its Applications, 2018, vol. 490, issue C, 1368-1376

Abstract: Long-range correlations between the elements of natural languages enable them to convey highly complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks terms by their relevance to the document's subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word-ranking metric to extract the keywords of a single document. An experimental evaluation demonstrates the capability of the presented method for keyword extraction: Tsallis entropy shows reliable word-ranking performance, on a par with the best previous ranking methods.
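The following is a minimal Python sketch of the general idea. It assumes a simple segment-based scheme that is not necessarily the authors' procedure. The text is split into `n_segments` nearly equal parts. The per-part counts of each word are normalized into a distribution p_i, and that distribution is scored with the Tsallis entropy S_q = (1 − Σ p_i^q)/(q − 1), which reduces to the Shannon entropy as q → 1. Words whose occurrences cluster in a few parts (low S_q) are treated as keyword candidates. The function names `tsallis_entropy` and `rank_words`, and the parameters `n_segments`, `q` and `min_count`, are hypothetical choices for illustration.

```python
import math
import re
from collections import defaultdict


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum_i p_i**q) / (q - 1).

    At q == 1 it returns the Shannon entropy -sum_i p_i ln p_i, the q -> 1 limit.
    Zero probabilities are skipped; they contribute nothing for q > 0.
    """
    probs = [p for p in probs if p > 0]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def rank_words(text, n_segments=8, q=1.5, min_count=4):
    """Rank words by how concentrated their occurrences are (hypothetical scheme).

    The token sequence is cut into n_segments nearly equal parts. Each word's
    per-part counts are normalized into a distribution p_i and scored with
    Tsallis entropy: lower entropy means more clustered, i.e. a stronger candidate.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return []
    seg_counts = defaultdict(lambda: [0] * n_segments)
    for i, tok in enumerate(tokens):
        # i * n_segments // n maps positions 0..n-1 onto segments 0..n_segments-1
        seg_counts[tok][i * n_segments // n] += 1
    scored = []
    for word, counts in seg_counts.items():
        total = sum(counts)
        if total < min_count:  # rare words look clustered by chance; skip them
            continue
        probs = [c / total for c in counts]
        scored.append((tsallis_entropy(probs, q), word))
    scored.sort()  # ascending entropy; ties broken alphabetically
    return [(word, s) for s, word in scored]


if __name__ == "__main__":
    sample = "tsallis " * 4 + "the model of words " * 8
    for word, s in rank_words(sample):
        print(f"{word:8s} {s:.3f}")
```

In the sample, all four occurrences of `tsallis` fall in the first segment, so it ranks first with S_q = 0. The word `the` occurs once in every segment, so it has the maximal entropy and ranks last. A plain segment count is sensitive to word frequency. A more careful variant would compare each word's entropy with its value in a randomly shuffled text. The sketch also does not model the spatial correlation length that the abstract says the method exploits.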

Keywords: Text mining; Keyword extraction; Tsallis entropy; Long-range correlation
Date: 2018
Citations: 4 (EconPapers)

Downloads:
http://www.sciencedirect.com/science/article/pii/S0378437117309123
Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:490:y:2018:i:c:p:1368-1376

DOI: 10.1016/j.physa.2017.09.020

Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H. E. Stanley and C. Tsallis

More articles in Physica A: Statistical Mechanics and its Applications from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:phsmap:v:490:y:2018:i:c:p:1368-1376