Benchmarking lexicon-based ensemble web data classification against traditional classification methods

T, Yogesha; N, Thimmaraju S

Yogesha T and Thimmaraju S N

International Journal of Innovative Research and Scientific Studies, 2025, vol. 8, issue 2, 1736-1745

Abstract: A lexicon-based ensemble approach to web data classification is designed and compared with classic machine learning techniques, with an emphasis on the accuracy and efficiency of classifying textual data from the web. As the volume of internet material expands dramatically, effective and scalable techniques for classifying it are crucial. Traditional classifiers such as Support Vector Machines (SVM), Naive Bayes (NB), and Decision Trees (DT) rely on statistical learning from labeled datasets, which requires large quantities of training data and processing resources. Lexicon-based techniques, on the other hand, employ predefined collections of words (lexicons) linked with particular classes or sentiments, eliminating the need for extensive training but frequently lacking generalizability. To overcome the drawbacks of both lexicon-based and traditional classifiers, this paper proposes a lexicon-based ensemble classification system that incorporates several lexicons, each optimized for particular features of web data, and compares it to conventional approaches in terms of accuracy, scalability, and performance. By combining the strengths of many lexicons, the ensemble technique reduces individual biases and boosts robustness. Additionally, the ensemble adds a layer of decision-making that enhances classification accuracy, especially when dealing with noisy and unstructured online data such as news articles, blogs, and social media posts. Through a series of tests, the paper compares the ensemble lexicon-based approach to SVM, NB, DT, and Random Forests (RF) on a number of benchmark datasets. Performance is evaluated using metrics including accuracy, recall, F1 score, and computational efficiency. The findings demonstrate that the lexicon-based ensemble approach achieves higher precision in sentiment and topic classification tasks and performs better than conventional classifiers in situations with sparse or noisy labeled data. However, when large, high-quality labeled datasets are available, classical classifiers perform better, showing stronger recall and generalization ability. By showing that lexicon-based models, when appropriately tuned and combined, can compete with or even surpass traditional classifiers in particular situations, this study adds to the growing body of research on hybrid and ensemble learning approaches and establishes such models as a useful tool in the larger field of web data analysis.
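
The idea of combining several lexicons through an added decision layer can be illustrated with a minimal sketch. The abstract does not specify the lexicons, their weights, or the combination rule, so everything below is an assumption made for illustration: the three toy lexicons, the function names, and the use of simple majority voting (with neutral votes treated as abstentions and ties resolved to "neutral") are hypothetical and are not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (word -> polarity weight). The paper's real
# lexicons are not given in the abstract; these are illustrative only.
LEXICONS = {
    "general": {"good": 1, "great": 1, "bad": -1, "poor": -1},
    "social": {"love": 1, "awesome": 1, "hate": -1, "awful": -1},
    "news": {"gain": 1, "growth": 1, "loss": -1, "crisis": -1},
}


def score_with_lexicon(tokens, lexicon):
    """Sum the polarity weights of the tokens found in one lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def label_from_score(score):
    """Map a numeric score to a class label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify(text, lexicons=LEXICONS):
    """Majority vote over per-lexicon labels (assumed decision layer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    votes = [
        label_from_score(score_with_lexicon(tokens, lex))
        for lex in lexicons.values()
    ]
    # A lexicon with no matching words votes "neutral"; treat that as an
    # abstention so it cannot outvote lexicons that did find evidence.
    decided = [v for v in votes if v != "neutral"]
    if not decided:
        return "neutral"
    ranked = Counter(decided).most_common()
    # A tie between the top two labels is resolved to "neutral".
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


if __name__ == "__main__":
    for sample in ["Great growth, but awful service.", "I hate this crisis."]:
        print(sample, "->", classify(sample))
```

No labeled training data is needed here, which matches the trade-off the abstract describes: the vote depends only on the coverage of the lexicons, so words outside all of them contribute nothing.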

Keywords: Benchmarking; Decision trees; Ensemble; Lexicon; Naive Bayes; Sentiment analysis; Support vector machines.
Date: 2025

Downloads:
https://ijirss.com/index.php/ijirss/article/view/5537/973 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:aac:ijirss:v:8:y:2025:i:2:p:1736-1745:id:5537

International Journal of Innovative Research and Scientific Studies is currently edited by Natalie Jean

More articles in International Journal of Innovative Research and Scientific Studies from Innovative Research Publishing
Bibliographic data for series maintained by Natalie Jean.