
Efficiency Considerations for Vertical kNN Text Categorisation

Imad Rahal, Hassan Najadat and William Perrizo
Additional contact information
Imad Rahal: 211, Peter Engel Science Center, Computer Science Department, College of St. Benedict and St. John's University, Collegeville, MN 56321, USA
Hassan Najadat: Computer Information Systems Department, Jordan University of Science and Technology, P.O. Box 3030 Irbid, 22110, Jordan
William Perrizo: IACC 258 A15, Computer Science Department, North Dakota State University, Fargo, ND 58105, USA

Journal of Information & Knowledge Management (JIKM), 2006, vol. 05, issue 03, 211-222

Abstract: The importance of text mining stems from the availability of huge volumes of text databases holding a wealth of valuable information that needs to be mined. Text mining is a coarse area encompassing many finer branches, one of which is text categorisation or text classification. Text categorisation is the process of assigning class labels to documents based entirely on their textual contents, where we are given a document d and asked to find its subject matter or class label, Ci. In this paper, an optimised k-Nearest Neighbours classifier that uses discretisation, the P-tree technology, and dimensionality reduction to achieve a high degree of accuracy, space utilisation and time efficiency is proposed. One of the fundamental contributions of this work is that as new samples arrive, the proposed classifier can find the k nearest neighbours to the new sample from the training space without a single database scan.
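
For orientation, the task described in the abstract can be illustrated with a generic k-Nearest Neighbours text classifier. The sketch below is a plain baseline, not the authors' method: it does not use discretisation, P-trees, or dimensionality reduction, and it scans every training document for each query, which is the cost the paper reports avoiding. The function names (`tokenize`, `cosine`, `knn_classify`), the whitespace tokenisation, the cosine similarity measure, and the toy training data are all assumptions made for illustration.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenisation (a deliberately naive assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(training, document, k=3):
    """Return the majority class label among the k most similar training documents.

    `training` is a list of (text, label) pairs. This baseline compares the
    query against every training document (a full scan per query).
    """
    query = Counter(tokenize(document))
    scored = [(cosine(query, Counter(tokenize(text))), label)
              for text, label in training]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (document text, class label).
training = [
    ("team won match late goal", "sport"),
    ("striker scored twice team", "sport"),
    ("goalkeeper saved penalty goal", "sport"),
    ("bank raised interest rates", "finance"),
    ("shares fell market closed", "finance"),
    ("central bank cut rates", "finance"),
]

if __name__ == "__main__":
    print(knn_classify(training, "team scored late goal", k=3))
    print(knn_classify(training, "bank cut interest rates", k=3))
```

Here the document d is the query string and the returned label plays the role of Ci. The per-query scan over `training` is what makes this baseline expensive on large collections.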

Keywords: Text categorisation; text classification; text information management; P-trees; data mining
Date: 2006

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S021964920600144X
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:05:y:2006:i:03:n:s021964920600144x

DOI: 10.1142/S021964920600144X

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:05:y:2006:i:03:n:s021964920600144x