EconPapers    
 

Dynamic weighted cluster-sampling: An optimized cohesive method for improving data quality in the context of big data

Benabderrahmane Moutassem, Laouni Djafri and Abdelkader Gafour

International Journal of Innovative Research and Scientific Studies, 2025, vol. 8, issue 3, 1703-1720

Abstract: In data mining, imbalanced big data has emerged as a critical challenge: large datasets in which the classes are disproportionately distributed. This imbalance often yields biased models that underperform on minority classes, compromising the overall effectiveness of predictive analytics. Standard machine learning algorithms may struggle to classify underrepresented instances accurately, producing predictions that reflect majority-class tendencies rather than the true underlying patterns. Addressing these challenges requires more advanced methods. This work presents a novel hybrid approach to imbalanced big data classification that combines clustering and sampling methods. The proposed approach aims to reduce data volume, enhance veracity (i.e., improve performance metrics), and shorten execution time, while preserving essential attributes and ensuring data reliability. The results show that our approach achieves higher accuracy, AUC, F1-score, and G-mean than scenarios without a data-balancing strategy. We also compare the proposed method with current methods in the field on large imbalanced datasets. Notably, our method achieves an accuracy approaching 100%, with improvements ranging from 17% to 22% across all assessed performance metrics, underscoring its effectiveness for imbalanced big data classification.
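The abstract reports accuracy, F1-score and G-mean alongside each other. As a reference point, the sketch below computes these three metrics from binary confusion-matrix counts using their standard definitions (G-mean as the geometric mean of sensitivity and specificity). It is not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, F1-score and G-mean from binary confusion-matrix counts.

    Assumption: the minority class is treated as the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity = recall on the minority class; specificity = recall on the majority class
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall, defined as 0 when both are 0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # G-mean is the geometric mean of the two per-class recalls
    g_mean = math.sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Hypothetical test set: 990 majority-class and 10 minority-class samples,
# scored by a classifier that always predicts the majority class.
naive = binary_metrics(tp=0, fp=0, tn=990, fn=10)
print(naive)  # {'accuracy': 0.99, 'f1': 0.0, 'g_mean': 0.0}
```

Here accuracy is 99% even though no minority sample is recognized, while F1 and G-mean are both 0. This is why studies of imbalanced classification report F1 and G-mean next to accuracy.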

Keywords: Big data mining; clustering; cross-validation; imbalanced data; machine learning; sampling.
Date: 2025

Downloads: https://ijirss.com/index.php/ijirss/article/view/6878/1376 (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:aac:ijirss:v:8:y:2025:i:3:p:1703-1720:id:6878


International Journal of Innovative Research and Scientific Studies is currently edited by Natalie Jean

More articles in International Journal of Innovative Research and Scientific Studies from Innovative Research Publishing
Bibliographic data for series maintained by Natalie Jean.

 
Page updated 2025-05-11
Handle: RePEc:aac:ijirss:v:8:y:2025:i:3:p:1703-1720:id:6878