
KDTM: Multi-Stage Knowledge Distillation Transfer Model for Long-Tailed DGA Detection

Baoyu Fan, Han Ma, Yue Liu, Xiaochen Yuan and Wei Ke
Additional contact information
Baoyu Fan: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
Han Ma: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
Yue Liu: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
Xiaochen Yuan: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
Wei Ke: Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

Mathematics, 2024, vol. 12, issue 5, 1-19

Abstract: As the attack strategy most commonly used by botnets, the Domain Generation Algorithm (DGA) is highly stealthy and variable. Using deep learning models to detect different families of DGA domain names can improve network defenses against attackers. However, this task faces an extremely imbalanced sample size across DGA categories, which leads to low classification accuracy for small-sample categories and even complete classification failure for some of them. To address this issue, we introduce the long-tailed concept and augment the data of small-sample categories by transferring pre-trained knowledge. First, we propose the Data Balanced Review Method (DBRM) to reduce the sample-size difference between categories, generating a relatively balanced dataset for transfer learning. Second, we propose the Knowledge Transfer Model (KTM) to enhance knowledge of the small-sample categories; KTM uses a multi-stage transfer to move weights from the large-sample categories to the small-sample categories. Furthermore, we propose the Knowledge Distillation Transfer Model (KDTM), which adds a knowledge distillation loss on top of the KTM to relieve the catastrophic forgetting caused by transfer learning. The experimental results show that KDTM significantly improves the classification performance of all categories, especially the small-sample ones, achieving a state-of-the-art macro average F1 score of 84.5%. The robustness of the KDTM model is verified on three DGA datasets that follow the Pareto distribution.
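
The knowledge distillation loss mentioned in the abstract is a standard technique for retaining a teacher model's behaviour while a student is re-trained. As a rough, generic sketch (not the paper's exact KDTM formulation), such a loss typically combines hard-label cross-entropy with a temperature-softened KL term between teacher and student outputs; the function name distillation_loss and the hyperparameters T and alpha below are illustrative assumptions, written in PyTorch:

    # Generic knowledge-distillation loss: hard-label cross-entropy plus a
    # temperature-softened KL term between teacher and student predictions.
    # Illustrative sketch only; not the authors' KDTM implementation.
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Hard-label term: ordinary classification loss on the ground truth.
        ce = F.cross_entropy(student_logits, labels)
        # Soft-label term: match the teacher's temperature-softened distribution.
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Weighted sum; alpha balances new-task accuracy against retained knowledge.
        return alpha * ce + (1.0 - alpha) * kd

In a multi-stage setup like the one the abstract describes, the teacher would presumably be the model from an earlier transfer stage (trained on the large-sample categories) and the student the model being adapted to the smaller ones.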

Keywords: domain generation algorithm; long-tailed problem; transfer learning; knowledge distillation; data balanced review method
JEL-codes: C
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/5/626/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/5/626/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:5:p:626-:d:1342396

Handle: RePEc:gam:jmathe:v:12:y:2024:i:5:p:626-:d:1342396