Comparing Balancing Techniques for Malware Classification

Ranjit John and Fabio Di Troia
Additional contact information
Ranjit John: San Jose State University
Fabio Di Troia: San Jose State University

A chapter in Machine Learning, Deep Learning and AI for Cybersecurity, 2025, pp 61-92 from Springer

Abstract: Imbalanced datasets often disproportionately represent certain types of malware, which can negatively impact the performance of machine learning classifiers. This imbalance can result in insufficient data for rarer but highly dangerous malware, leading to potential detection failures with serious consequences. To address this, data balancing techniques have proven effective in improving the representation of minority classes and mitigating bias toward the majority class. Recent studies have also shown that generative models can successfully create synthetic data that closely mirrors real datasets. In this paper, we explore various balancing techniques and generate synthetic opcode sequence data to enhance the training of machine learning models for improved malware classification. Our approach includes oversampling, undersampling, hybrid sampling, and the use of Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP) to generate synthetic samples. We assess the effectiveness of these methods in tackling the class imbalance problem in multi-class malware classification.

Date: 2025

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-031-83157-7_3

Ordering information: This item can be ordered from
http://www.springer.com/9783031831577

DOI: 10.1007/978-3-031-83157-7_3

Page updated 2026-05-12
Handle: RePEc:spr:sprchp:978-3-031-83157-7_3