Assessing the Performance of Compression Based Clustering for Text Mining
Alexandra Cernian,
Dorin Carstoiu,
Adriana Olteanu and
Valentin Sgarciu
Additional contact information
Alexandra Cernian: “Politehnica” University of Bucharest
Dorin Carstoiu: “Politehnica” University of Bucharest
Adriana Olteanu: “Politehnica” University of Bucharest
Valentin Sgarciu: “Politehnica” University of Bucharest
ECONOMIC COMPUTATION AND ECONOMIC CYBERNETICS STUDIES AND RESEARCH, 2016, vol. 50, issue 2, 197-210
Abstract:
The human brain naturally looks for patterns in whatever surrounds us, so each of us develops a model of our personal universe. On a larger scale, modeling the universe has been a constant preoccupation of philosophers. Clustering is one of the most useful tools in the data mining process for discovering groups and identifying patterns in the underlying data. This paper addresses the compression-based clustering approach and focuses on validating this method in the context of text mining. The idea is supported by evidence that compression algorithms provide a good estimate of informational content. In this context, we developed an integrated clustering platform, called EasyClustering, which incorporates three compressors, four distance metrics and three clustering algorithms. The experimental validation presented in this paper focuses on clustering text documents based on their informational content.
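As a rough illustration of the compression-based idea described in the abstract, the sketch below computes a compression distance between short text documents and finds each document's nearest neighbour. The abstract does not name the compressors, distance metrics or clustering algorithms used in EasyClustering, so everything here is an assumption made for illustration: `zlib` stands in for a compressor, the Normalized Compression Distance (NCD) stands in for a distance metric, and the sample documents and the helper names `compressed_size`, `ncd` and `nearest_neighbor` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (assumed metric, not confirmed by the paper).

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Smaller values mean the two
    inputs share more information.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_neighbor(name: str, docs: dict) -> str:
    """Name of the document closest to `name` under NCD."""
    others = [other for other in docs if other != name]
    return min(others, key=lambda other: ncd(docs[name], docs[other]))


# Hypothetical toy corpus: two finance texts and one biology text.
docs = {
    "finance_up": b"the stock market rose sharply as investors bought shares " * 20,
    "finance_down": b"the stock market fell sharply as investors sold shares " * 20,
    "biology": b"photosynthesis converts sunlight water and carbon dioxide into glucose " * 20,
}

if __name__ == "__main__":
    names = list(docs)
    # Print the pairwise distance for every unordered pair of documents.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(f"NCD({a}, {b}) = {ncd(docs[a], docs[b]):.3f}")
    print("nearest to finance_up:", nearest_neighbor("finance_up", docs))
```

In a fuller pipeline, the pairwise distances would typically be collected into a matrix and passed to a clustering algorithm of choice; that step is omitted here because the abstract does not say which algorithms the platform uses.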
Keywords: clustering; compression; text mining; EasyClustering; FScore
JEL-codes: O30
Date: 2016
Downloads:
ftp://www.eadr.ro/RePEc/cys/ecocyb_pdf/ecocyb2_2016p197-210.pdf
Persistent link: https://EconPapers.repec.org/RePEc:cys:ecocyb:v:50:y:2016:i:2:p:197-210
ECONOMIC COMPUTATION AND ECONOMIC CYBERNETICS STUDIES AND RESEARCH is currently edited by Gheorghe RUXANDA
More articles in ECONOMIC COMPUTATION AND ECONOMIC CYBERNETICS STUDIES AND RESEARCH from Faculty of Economic Cybernetics, Statistics and Informatics
Bibliographic data for series maintained by Corina Saman.