Efficiency Considerations for Vertical kNN Text Categorisation
Imad Rahal, Hassan Najadat and William Perrizo
Additional contact information
Imad Rahal: 211, Peter Engel Science Center, Computer Science Department, College of St. Benedict and St. John's University, Collegeville, MN 56321, USA
Hassan Najadat: Computer Information Systems Department, Jordan University of Science and Technology, P.O. Box 3030 Irbid, 22110, Jordan
William Perrizo: IACC 258 A15, Computer Science Department, North Dakota State University, Fargo, ND 58105, USA
Journal of Information & Knowledge Management (JIKM), 2006, vol. 05, issue 03, 211-222
Abstract:
The importance of text mining stems from the availability of huge volumes of text databases holding a wealth of valuable information that needs to be mined. Text mining is a coarse area encompassing many finer branches, one of which is text categorisation or text classification. Text categorisation is the process of assigning class labels to documents based entirely on their textual contents, where we are given a document d and asked to find its subject matter or class label, Ci. In this paper, an optimised k-Nearest Neighbours classifier that uses discretisation, the P-tree technology, and dimensionality reduction to achieve a high degree of accuracy, space utilisation and time efficiency is proposed. One of the fundamental contributions of this work is that as new samples arrive, the proposed classifier can find the k nearest neighbours to the new sample from the training space without a single database scan.
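As a point of reference for the task the abstract describes, the following is a minimal sketch of a plain k-Nearest Neighbours text classifier. It is not the authors' method: it uses no discretisation, P-trees or dimensionality reduction, and it scans every training document for each query, which is exactly the cost the paper says its classifier avoids. The function names (tokenize, cosine, knn_classify), the choice of cosine similarity over term counts, and the toy training set are all assumptions made for illustration.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(training, document, k=3):
    """Return the majority class label among the k most similar training documents.

    training: list of (text, label) pairs. Every pair is scanned (brute force),
    unlike the scan-free approach claimed in the abstract.
    """
    query = Counter(tokenize(document))
    scored = [(cosine(query, Counter(tokenize(text))), label)
              for text, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (document text, class label).
training = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("market investors buy shares", "finance"),
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
    ("coach praises team after match", "sport"),
]

label = knn_classify(training, "investors sell shares as market drops", k=3)
print(label)
```

Here the document d is the query string and the returned label plays the role of Ci. The brute-force loop over the training set is the part a vertical, P-tree-based representation would aim to replace.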
Keywords: Text categorisation; text classification; text information management; P-trees; data mining
Date: 2006
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S021964920600144X
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:05:y:2006:i:03:n:s021964920600144x
DOI: 10.1142/S021964920600144X
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
Bibliographic data for series maintained by Tai Tone Lim.