Adaptive TreeHive: Ensemble of trees for enhancing imbalanced intrusion classification
Mahbub E Sobhani, Anika Tasnim Rodela and Dewan Md Farid
PLOS ONE, 2025, vol. 20, issue 9, 1-38
Abstract:
Imbalanced intrusion classification is a complex and challenging task because the minority classes in imbalanced intrusion datasets contain very few instances. Data sampling methods such as over-sampling and under-sampling are commonly applied to deal with imbalanced intrusion data. Over-sampling methods generate synthetic minority instances, e.g. SMOTE (Synthetic Minority Over-sampling Technique), whereas under-sampling methods remove majority-class instances to create balanced data, e.g. random under-sampling. Both approaches have drawbacks: over-sampling can cause overfitting, while under-sampling discards a large portion of the data. Ensemble learning in supervised machine learning is also a common technique for handling imbalanced data; Random Forest and Bagging address the overfitting problem, and Boosting (AdaBoost) gives more attention to minority-class instances across its iterations. In this paper, we propose a method for selecting the most informative instances that represent the overall dataset. We apply both over-sampling and under-sampling to balance the data using the majority and minority informative instances. We use Random Forest, Bagging, and Boosting (AdaBoost) algorithms and compare their performances, with decision tree (C4.5) as the base classifier of the Random Forest and AdaBoost classifiers and naïve Bayes as the base classifier of the Bagging model. The proposed method, Adaptive TreeHive, addresses both class imbalance and high dimensionality, reducing the required computational power and execution time. We have evaluated the proposed Adaptive TreeHive method using five large-scale public benchmark datasets.
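The two sampling strategies the abstract contrasts can be sketched in plain Python. This is a minimal illustration of SMOTE-style interpolation and random under-sampling, not the paper's implementation; the function names and parameters are hypothetical.

```python
import random

def smote_like(minority, k=3, n_new=4, seed=0):
    """Generate synthetic minority points SMOTE-style: pick a minority
    instance, pick one of its k nearest neighbours, and interpolate a
    new point at a random position on the line segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        # k nearest neighbours of p by squared Euclidean distance (p excluded)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # position along the p->q segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic

def random_undersample(majority, target_size, seed=0):
    """Randomly keep only target_size majority-class instances."""
    return random.Random(seed).sample(majority, target_size)
```

Because each synthetic point is a convex combination of two real minority points, it stays inside the minority region, which is also why over-sampling can overfit: the new points add no genuinely new information.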
Compared to data balancing methods such as under-sampling and over-sampling, the experimental results show superior performance for Adaptive TreeHive, with accuracy rates of 99.96%, 85.65%, 99.83%, 99.77%, and 95.54% on the NSL-KDD, UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and CICDDoS2019 datasets, respectively, outperforming the traditional ensemble classifiers.
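The bagging idea underlying the compared ensembles can also be sketched: fit each base model on a bootstrap replicate of the training set and combine predictions by majority vote. The paper's base learners are C4.5 trees and naïve Bayes; the nearest-centroid learner below is only a tiny stand-in, and all names here are hypothetical.

```python
import random
from collections import Counter

def nearest_centroid_fit(train):
    """Stand-in base learner: predict the class whose centroid is closest.
    (Replaces C4.5 / naive Bayes purely for illustration.)"""
    by_cls = {}
    for x, y in train:
        by_cls.setdefault(y, []).append(x)
    centroids = {y: tuple(sum(col) / len(col) for col in zip(*xs))
                 for y, xs in by_cls.items()}
    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict

def bagging(train, n_models=5, seed=0):
    """Bagging: each model sees a bootstrap replicate (sampling with
    replacement); the ensemble predicts by plurality vote."""
    rng = random.Random(seed)
    models = [nearest_centroid_fit([rng.choice(train) for _ in train])
              for _ in range(n_models)]
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict
```

Averaging over bootstrap replicates is what gives Bagging (and Random Forest) its resistance to overfitting; Boosting instead reweights hard, often minority-class, instances between rounds.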
Date: 2025
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0331307 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 31307&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0331307
DOI: 10.1371/journal.pone.0331307