Incorporating known malware signatures to classify new malware variants in network traffic
Ismahani Ismail, Muhammad Nadzir Marsono, Ban Mohammed Khammas and Sulaiman Mohd Nor
International Journal of Network Management, 2015, vol. 25, issue 6, 471-489
Abstract:
Content‐based malware classification techniques using n‐gram features require high computational overhead because of the size of the feature space. This paper proposes augmenting machine learning techniques with domain knowledge, in the form of known Snort malware signatures, to reduce resource usage (in terms of the time to generate the machine learning model and the memory used to store the generative model). Although current malware can be encrypted or mutated, it still exhibits the same prevalent contents or payloads as its predecessors. Using a dataset of traffic captured from a campus network, our approach is able to reduce the million n‐gram features initially generated to only around 90,000 features, which reduces the processing time to generate a naive Bayes model by 95%. The model trained on the most descriptive features (4‐gram Snort signatures with high information gain) produces a lower false negative rate, about 2%, than the other models. Moreover, the proposed method is capable of detecting 10 new malware variants with 0% false negatives. The findings from this paper can be the basis for improving content‐based malware classification to detect known and new malware. Copyright © 2015 John Wiley & Sons, Ltd.
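The feature reduction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that signatures are available as plain byte strings (standing in for content strings taken from Snort rules), that payloads are raw bytes with binary labels, and that a feature is the presence or absence of a 4‐gram. All function names (ngrams, signature_vocabulary, information_gain, select_features) are hypothetical.

```python
import math
from collections import Counter


def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct byte n-grams in data (empty if data is shorter than n)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def signature_vocabulary(signatures, n: int = 4) -> set:
    """Union of n-grams drawn from known signature strings (assumed input format)."""
    vocab = set()
    for sig in signatures:
        vocab |= ngrams(sig, n)
    return vocab


def entropy(labels) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_sets, labels, gram) -> float:
    """Reduction in label entropy from knowing whether gram is present in a sample."""
    present = [y for f, y in zip(feature_sets, labels) if gram in f]
    absent = [y for f, y in zip(feature_sets, labels) if gram not in f]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


def select_features(payloads, labels, signatures, n: int = 4, top_k: int = 10) -> list:
    """Keep only n-grams that occur in signatures, then rank them by information gain."""
    vocab = signature_vocabulary(signatures, n)
    # Restricting each payload's n-grams to the signature vocabulary is what
    # shrinks the feature space before any model is trained.
    feature_sets = [ngrams(p, n) & vocab for p in payloads]
    ranked = sorted(
        vocab,
        key=lambda g: (-information_gain(feature_sets, labels, g), g),  # ties broken by byte order
    )
    return ranked[:top_k]
```

The selected n‐grams would then serve as the binary features for a classifier such as naive Bayes. The sketch covers only the selection step and makes no claim about the thresholds or rule parsing used in the paper.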
Date: 2015
Downloads: https://doi.org/10.1002/nem.1913
Persistent link: https://EconPapers.repec.org/RePEc:wly:intnem:v:25:y:2015:i:6:p:471-489