How Many Keywords are Enough? Determining the Optimal Top-K for Educational Website Classification
Mohd Nazrien Zaraini and
Noorrezam Yusop
Affiliations:
Mohd Nazrien Zaraini: Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka
Noorrezam Yusop: Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka
International Journal of Research and Innovation in Social Science, 2025, vol. 9, issue 6, 1574-1586
Abstract:
The classification of educational websites has become increasingly challenging, as traditional indicators such as domain extensions no longer reliably reflect a site’s purpose. This study investigates the optimal number of TF-IDF-ranked keywords (K) required to balance classification accuracy and computational efficiency in a one-class setting. Using a curated dataset sourced from the DMOZ directory and verified educational websites, multiple Top-K keyword subsets (K = 10–200) were evaluated. A One-Class Support Vector Machine (SVM) was employed, with performance assessed through cross-validation and separate positive/negative test sets. Results indicate that classification accuracy peaks within the range of K = 30–100, with diminished performance beyond this range due to the inclusion of irrelevant or noisy terms. These findings offer a practical and scalable framework for content-based educational website classification, particularly for applications in low-resource environments, and challenge the default reliance on exhaustive keyword feature sets.
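The Top-K keyword selection step described in the abstract can be illustrated with a short sketch. It assumes that terms are ranked by TF-IDF scores summed over the educational (positive) training pages, that IDF uses the smoothed formula scikit-learn applies by default, and that a small hypothetical stop-word list is filtered out. The paper's exact tokenisation and ranking scheme are not given on this page, so the names `tfidf_top_k`, `vectorize` and `STOP_WORDS` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list (an assumption, not taken from the paper)
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_top_k(docs, k):
    """Rank terms by TF-IDF summed across the positive corpus; return the top k."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        tf = Counter(tokens)
        for term, count in tf.items():
            # Smoothed IDF, matching scikit-learn's default: ln((1 + n) / (1 + df)) + 1
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            # Length-normalised term frequency times IDF
            scores[term] += (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically so the result is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(text, vocabulary):
    """Represent a page as relative frequencies of the Top-K vocabulary terms."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[term] / total for term in vocabulary]


# Toy "educational" pages (made-up examples, not data from the study)
pages = [
    "university course syllabus lecture students",
    "online course for students with lecture videos",
    "school curriculum and students enrolment",
]

for k in (2, 4):
    print(k, tfidf_top_k(pages, k))
```

For these toy pages, K = 4 selects `students`, `course`, `lecture` and `curriculum`. In a real experiment, the resulting fixed-length vectors could be passed to a one-class learner such as scikit-learn's `OneClassSVM`, trained on educational pages only, and the procedure repeated for each K in the 10–200 range to see where accuracy on held-out positive and negative sets stops improving.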
Date: 2025
Downloads: (external link)
https://www.rsisinternational.org/journals/ijriss/ ... ssue-6/1574-1586.pdf (application/pdf)
https://rsisinternational.org/journals/ijriss/arti ... site-classification/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:bcp:journl:v:9:y:2025:issue-6:p:1574-1586
International Journal of Research and Innovation in Social Science is currently edited by Dr. Nidhi Malhan
More articles in International Journal of Research and Innovation in Social Science from International Journal of Research and Innovation in Social Science (IJRISS)
Bibliographic data for the series are maintained by Dr. Pawan Verma.