Applying Word Embeddings and Graph Neural Networks for Effective Malware Classification
Manasa Mananjaya and Fabio Di Troia
Additional contact information
Manasa Mananjaya: San Jose State University
Fabio Di Troia: San Jose State University
A chapter in Machine Learning, Deep Learning and AI for Cybersecurity, 2025, pp 143-167 from Springer
Abstract:
Word embeddings are widely used in natural language processing to capture semantic relationships between words. This study explores how effective word embedding techniques are for classifying malware. Specifically, we evaluate Graph Neural Networks (GNNs) applied to weighted graphs built from word embeddings of the opcode sequences in malware files. In the initial experiments, we apply the Graph Convolution Network (GCN) to weighted graphs generated with different word embedding techniques, including Bag-of-words, TF-IDF, and Word2Vec. The results indicate that Word2Vec provides the most effective word embeddings, so it serves as the baseline for comparing three GNN models: Graph Convolution Network, Graph Attention Network (GAT), and GraphSAGE Network. We then conduct further experiments, generating vector embeddings of varying lengths with Word2Vec and using them as node features for constructing weighted graphs. By comparing the performance of the GNN models, we demonstrate that larger vector embeddings significantly enhance the models’ ability to classify malware files into their respective families. Furthermore, we compare the results achieved using Word2Vec embeddings against those obtained with contextualized embeddings from BERT. Overall, our experiments show the potential of word embeddings as node features for GNN classification, with accuracy increasing from 71.6% to 91.91% when Word2Vec embeddings are used in combination with GCN.
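The pipeline outlined in the abstract (opcode sequence, per-opcode embedding, weighted graph, graph convolution) can be illustrated with a minimal sketch. The abstract does not specify how the chapter builds or weights its graphs, so everything below is an assumption made for illustration: edges are weighted by how often two opcodes appear next to each other, random vectors stand in for the Word2Vec node features, and the hypothetical helpers `build_opcode_graph` and `gcn_layer` implement the commonly used symmetric-normalized graph-convolution step rather than the authors' actual model.

```python
import numpy as np


def build_opcode_graph(opcodes):
    """Return (vocab, adj); adj[i, j] counts how often opcodes i and j are adjacent.

    Assumption: adjacency-count weighting is illustrative, not the chapter's scheme.
    """
    vocab = sorted(set(opcodes))
    index = {op: i for i, op in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(opcodes, opcodes[1:]):
        i, j = index[a], index[b]
        adj[i, j] += 1.0
        if i != j:
            adj[j, i] += 1.0  # mirror the edge so the graph stays undirected
    return vocab, adj


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # weighted degree, always > 0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weights, 0.0)


rng = np.random.default_rng(0)

# Toy opcode sequence standing in for one disassembled file (hypothetical data).
opcodes = ["mov", "push", "mov", "call", "pop", "mov", "push", "call"]
vocab, adj = build_opcode_graph(opcodes)

# Placeholder node features; in the chapter these would be Word2Vec vectors.
features = rng.normal(size=(len(vocab), 8))
weights = rng.normal(size=(8, 4))           # untrained layer weights

hidden = gcn_layer(adj, features, weights)
graph_vector = hidden.mean(axis=0)          # mean-pool nodes into one file-level vector
```

In a real experiment the placeholder features would be replaced by trained embeddings, the layer weights would be learned, and `graph_vector` would feed a classifier over malware families; a graph learning library would normally supply the GCN, GAT, and GraphSAGE layers.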
Date: 2025
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-031-83157-7_6
Ordering information: This item can be ordered from
http://www.springer.com/9783031831577
DOI: 10.1007/978-3-031-83157-7_6