Improving the Performance of Static Malware Classification Using Deep Learning Models and Feature Reduction Strategies

Lai, Tai-Hung; Tsai, Yun-Jyun; Liu, Chiang-Lung

Improving the Performance of Static Malware Classification Using Deep Learning Models and Feature Reduction Strategies

Tai-Hung Lai, Yun-Jyun Tsai and Chiang-Lung Liu ()
Additional contact information
Tai-Hung Lai: Department of Computer Science and Information Engineering, Chung Cheng Institute of Technology, National Defense University, Taoyuan 335009, Taiwan
Yun-Jyun Tsai: National Chung-Shan Institute of Science and Technology, Taoyuan 325204, Taiwan
Chiang-Lung Liu: Department of Electrical and Electronic Engineering, Chung Cheng Institute of Technology, National Defense University, Taoyuan 335009, Taiwan

Mathematics, 2025, vol. 13, issue 23, 1-19

Abstract: The rapid evolution of malware continues to pose severe challenges to cybersecurity, highlighting the need for accurate and efficient detection systems. Traditional signature- and heuristic-based methods are increasingly inadequate against sophisticated threats, which has motivated the use of machine learning and deep learning for static malware classification. In this study, we propose three deep neural network (DNN) architectures tailored for the binary classification of Portable Executable (PE) files. The models were trained and validated on the EMBER 2017 dataset and further tested on the independent REWEMA dataset to evaluate their cross-dataset generalization capabilities. To address the computational burden of high-dimensional feature vectors, two feature reduction strategies were examined: the Kumar method, which selected 276 features, and the LightGBM-based intersection method, which identified 206 shared features. Experimental results showed that the proposed Model III consistently achieved the best overall performance, outperforming LightGBM (v3.3.5) and the other DNN models in terms of accuracy, recall, and F1-score. Notably, its recall exceeded that of LightGBM by 0.73%, highlighting its superiority in reducing false negative rates. Feature reduction further demonstrated that significant dimensionality reduction could be achieved without compromising classification quality, with the Kumar method achieving the best balance between accuracy and efficiency. Cross-dataset validation revealed performance degradation across all models due to distributional shifts, but the decline was less significant for the DNNs, confirming its greater adaptability compared with LightGBM. These findings demonstrate that architectural optimization and appropriate feature selection can significantly improve the performance of static malware classification. This study also provides empirical benchmarks and methodological guidance for developing accurate, efficient, and resilient malware detection systems that are resilient to evolving threats.

Keywords: static malware classification; deep learning; EMBER; REWEMA (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/23/3753/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/23/3753/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:23:p:3753-:d:1800987

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().