Online Clustering of Known and Emerging Malware Families

Olha Jurečková, Martin Jureček and Mark Stamp
Additional contact information
Olha Jurečková: Czech Technical University in Prague, Faculty of Information Technology
Martin Jureček: Czech Technical University in Prague, Faculty of Information Technology
Mark Stamp: San Jose State University

A chapter in Machine Learning, Deep Learning and AI for Cybersecurity, 2025, pp 37-59 from Springer

Abstract: Malware attacks have become significantly more frequent and sophisticated in recent years, making malware detection and classification critical components of information security. Given the large number of malware samples available, it is essential to categorize them according to their malicious characteristics. Clustering algorithms are therefore becoming more widely used in computer security to analyze the behavior of malware variants and to discover new malware families. Online clustering algorithms help analysts understand malware behavior and respond more quickly to new threats. This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families. A clustering decision rule divides the streaming data into samples from known families and samples from new, emerging families. Samples assigned to known families are classified using the weighted k-nearest neighbor classifier, while the online k-means algorithm clusters the remaining streaming data, achieving cluster purity ranging from 90.20% for four clusters to 93.34% for ten clusters. This work is based on static analysis of portable executable files for the Windows operating system. Experimental results indicate that the proposed online clustering model can create high-purity clusters corresponding to malware families. This allows malware analysts to receive groups of similar malware samples, speeding up their analysis.
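
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering decision rule, so the sketch assumes a simple distance threshold to the nearest known sample, and the inverse-distance weighting, the sequential centroid update, and all names (`weighted_knn`, `OnlineKMeans`, `process_stream`, `threshold`) are hypothetical choices made for illustration. Purity is computed in the usual way, as the fraction of samples that belong to the majority family of their cluster.

```python
import math
from collections import Counter, defaultdict


def weighted_knn(x, known, k=3):
    """Inverse-distance weighted k-NN vote over (vector, family) pairs.

    Returns the winning family and the distance to the nearest known sample.
    The inverse-distance weighting is an assumption of this sketch.
    """
    nearest = sorted((math.dist(x, v), fam) for v, fam in known)[:k]
    votes = defaultdict(float)
    for d, fam in nearest:
        votes[fam] += 1.0 / (d + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get), nearest[0][0]


class OnlineKMeans:
    """Sequential k-means: each sample updates one centroid, then is discarded."""

    def __init__(self, k):
        self.k = k
        self.centroids = []
        self.counts = []

    def partial_fit(self, x):
        # Assumption: the first k samples seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = min(range(self.k), key=lambda i: math.dist(x, self.centroids[i]))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # running-mean update of the centroid
        self.centroids[j] = [c + step * (xi - c)
                             for c, xi in zip(self.centroids[j], x)]
        return j


def purity(cluster_ids, true_labels):
    """Fraction of samples that match the majority label of their cluster."""
    by_cluster = defaultdict(Counter)
    for c, y in zip(cluster_ids, true_labels):
        by_cluster[c][y] += 1
    majority = sum(max(cnt.values()) for cnt in by_cluster.values())
    return majority / len(true_labels)


def process_stream(stream, known, k_clusters, threshold):
    """Route each sample to a known family or to an online cluster.

    Hypothetical decision rule: a sample closer than `threshold` to some
    known sample is treated as belonging to a known family.
    """
    km = OnlineKMeans(k_clusters)
    results = []
    for x in stream:
        family, d_min = weighted_knn(x, known)
        if d_min <= threshold:
            results.append(("known", family))
        else:
            results.append(("new", km.partial_fit(x)))
    return results
```

In this sketch each streamed sample is handled once and never stored, which is the property that makes the clustering step online. The threshold and the number of clusters would have to be tuned on real feature vectors extracted from portable executable files.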

Date: 2025

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-031-83157-7_2

Ordering information: This item can be ordered from
http://www.springer.com/9783031831577

DOI: 10.1007/978-3-031-83157-7_2


Handle: RePEc:spr:sprchp:978-3-031-83157-7_2