Text data mining: a proposed framework and future perspectives
Sana'a A. Alwidian,
Hani A. Bani-Salameh and
Ala'a N. Alslaity
International Journal of Business Information Systems, 2015, vol. 18, issue 2, 127-140
Abstract:
With advances in technology and the emergence of many kinds of applications, the amount of available data has become enormous. There is therefore a pressing need for techniques that interact with data and extract useful information and patterns from it. Text data mining (TDM) is the process of extracting desired information from large volumes of inherently unstructured textual data without having to read all of it. In this paper, we shed light on the state of the art in text mining as an interdisciplinary field spanning several related areas. To facilitate the understanding of text data mining, this paper proposes a framework that visualises the field in a step-wise manner, taking into consideration the semantics of the extracted text. In addition, this paper surveys a number of useful applications and proposes a new approach for spam detection based on the proposed TDM framework.
Keywords: text mining; clustering; categorisation; spam filtering; semantics; information retrieval; text data mining; TDM; natural language processing; NLP; knowledge discovery from databases; KDD; knowledge discovery from text; KDT; semantic analysis
Date: 2015
Downloads: (external link)
http://www.inderscience.com/link.php?id=67261 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijbisy:v:18:y:2015:i:2:p:127-140
More articles in International Journal of Business Information Systems from Inderscience Enterprises Ltd