Improving Machine Learning Algorithms with CoClust-Based Feature Selection on Big Data: A Comparative Analysis
Zeynep Ilhan Taskin and
Kasirga Yildirak
Additional contact information
Zeynep Ilhan Taskin: Eskisehir Osmangazi University
Kasirga Yildirak: Hacettepe University
A chapter in Directional and Multivariate Statistics, 2025, pp 411-439 from Springer
Abstract:
Adding a feature selection stage when building machine learning algorithms can lead to better outcomes. The dependency structure between the variables is regarded as the most crucial factor in this stage. The Copula-Based Clustering technique (CoClust), which relies on non-linear dependency and groups only related variables, makes a difference in identifying that dependency structure. In this study, we demonstrate that combining the Random Forest, AdaBoost, and XGBoost approaches with a CoClust-based feature selection step yields a notable improvement in CPU time and accuracy. To assess its contribution to these algorithms, we compare CoClust with K-means and hierarchical clustering techniques on two different big data sets. The results are compared using CPU time, accuracy, and the receiver operating characteristic (ROC) curve.
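The workflow described in the abstract (cluster the variables, keep a reduced set of features, then compare a classifier on CPU time, accuracy, and ROC) can be illustrated with a minimal Python sketch. It is not the chapter's method: CoClust itself is not implemented here, and hierarchical clustering on Spearman rank correlation (one of the baseline families named in the abstract) is swapped in as a stand-in. The synthetic data, the helper names `select_by_feature_clusters` and `fit_and_score`, the choice of 10 clusters, and the rule for picking one representative per cluster are all assumptions made for illustration.

```python
import time

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def select_by_feature_clusters(X, n_clusters):
    """Cluster features (columns) and keep one representative per cluster.

    Stand-in for CoClust: average-linkage hierarchical clustering on
    1 - |Spearman correlation| between features (an assumption).
    """
    corr, _ = spearmanr(X)              # feature-by-feature rank correlation
    dist = 1.0 - np.abs(corr)           # strongly related features are close
    dist = (dist + dist.T) / 2.0        # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    selected = []
    for cluster_id in np.unique(labels):
        members = np.where(labels == cluster_id)[0]
        # Representative: the member most correlated, on average, with its cluster.
        within = np.abs(corr[np.ix_(members, members)])
        selected.append(members[np.argmax(within.mean(axis=1))])
    return np.sort(np.array(selected))


def fit_and_score(X_train, X_test, y_train, y_test):
    """Fit a Random Forest and report training CPU time, accuracy, and ROC AUC."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    start = time.process_time()
    model.fit(X_train, y_train)
    cpu_seconds = time.process_time() - start
    scores = model.predict_proba(X_test)[:, 1]
    return {
        "cpu_seconds": cpu_seconds,
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, scores),
    }


# Synthetic data with deliberately redundant features (illustrative only).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=6, n_redundant=12, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

cols = select_by_feature_clusters(X_train, n_clusters=10)
full = fit_and_score(X_train, X_test, y_train, y_test)
reduced = fit_and_score(X_train[:, cols], X_test[:, cols], y_train, y_test)

print("selected feature indices:", cols.tolist())
print("all features:     ", full)
print("selected features:", reduced)
```

The same comparison could be repeated with AdaBoost or XGBoost in place of the Random Forest. Whether the reduced model is faster or more accurate depends on the data, so the sketch prints both results rather than assuming an outcome.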
Keywords: Random forest; AdaBoost; XGBoost; CoClust; Feature selection
Date: 2025
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-981-96-2004-3_21
Ordering information: This item can be ordered from
http://www.springer.com/9789819620043
DOI: 10.1007/978-981-96-2004-3_21
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.