FastText-Based Local Feature Visualization Algorithm for Merged Image-Based Malware Classification Framework for Cyber Security and Cyber Defense

Jang, Sejun; Li, Shuyu; Sung, Yunsick

FastText-Based Local Feature Visualization Algorithm for Merged Image-Based Malware Classification Framework for Cyber Security and Cyber Defense

Sejun Jang, Shuyu Li and Yunsick Sung
Additional contact information
Sejun Jang: Department of Multimedia Engineering, Dongguk University-Seoul, Seoul 04620, Korea
Shuyu Li: Department of Multimedia Engineering, Dongguk University-Seoul, Seoul 04620, Korea
Yunsick Sung: Department of Multimedia Engineering, Dongguk University-Seoul, Seoul 04620, Korea

Mathematics, 2020, vol. 8, issue 3, 1-13

Abstract: The importance of cybersecurity has recently been increasing. A malware coder writes malware into normal executable files. A computer is more likely to be infected by malware when users have easy access to various executables. Malware is considered as the starting point for cyber-attacks; thus, the timely detection, classification and blocking of malware are important. Malware visualization is a method for detecting or classifying malware. A global image is visualized through binaries extracted from malware. The overall structure and behavior of malware are considered when global images are utilized. However, the visualization of obfuscated malware is tough, owing to the difficulties encountered when extracting local features. This paper proposes a merged image-based malware classification framework that includes local feature visualization, global image-based local feature visualization, and global and local image merging methods. This study introduces a fastText-based local feature visualization method: First, local features such as opcodes and API function names are extracted from the malware; second, important local features in each malware family are selected via the term frequency inverse document frequency algorithm; third, the fastText model embeds the selected local features; finally, the embedded local features are visualized through a normalization process. Malware classification based on the proposed method using the Microsoft Malware Classification Challenge dataset was experimentally verified. The accuracy of the proposed method was approximately 99.65%, which is 2.18% higher than that of another contemporary global image-based approach.

Keywords: cyber security; deep learning; malware classification; malware visualization (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.mdpi.com/2227-7390/8/3/460/pdf (application/pdf)
https://www.mdpi.com/2227-7390/8/3/460/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:8:y:2020:i:3:p:460-:d:336585

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().