An end-to-end framework for private DGA detection as a service
Ricardo J M Maia,
Dustin Ray,
Sikha Pentyala,
Rafael Dowsley,
Martine De Cock,
Anderson C A Nascimento and
Ricardo Jacobi
PLOS ONE, 2024, vol. 19, issue 8, 1-24
Abstract:
Domain Generation Algorithms (DGAs) are used by malware to generate pseudorandom domain names to establish communication between infected bots and command and control servers. While DGAs can be detected by machine learning (ML) models with great accuracy, offering DGA detection as a service raises privacy concerns when requiring network administrators to disclose their DNS traffic to the service provider. The main scientific contribution of this paper is to propose the first end-to-end framework for privacy-preserving classification as a service of domain names into DGA (malicious) or non-DGA (benign) domains. Our framework achieves these goals by carefully designed protocols that combine two privacy-enhancing technologies (PETs), namely secure multi-party computation (MPC) and differential privacy (DP). Through MPC, our framework enables an enterprise network administrator to outsource the problem of classifying a DNS (Domain Name System) domain as DGA or non-DGA to an external organization without revealing any information about the domain name. Moreover, the service provider’s ML model used for DGA detection is never revealed to the network administrator. Furthermore, by using DP, we also ensure that the classification result cannot be used to learn information about individual entries of the training data. Finally, we leverage post-training float16 quantization of deep learning models in MPC to achieve efficient, secure DGA detection. We demonstrate that by using quantization achieves a significant speed-up, resulting in a 23% to 42% reduction in inference runtime without reducing accuracy using a three party secure computation protocol tolerating one corruption. Previous solutions are not end-to-end private, do not provide differential privacy guarantees for the model’s outputs, and assume that model embeddings are publicly known. Our best protocol in terms of accuracy runs in about 0.22s.
Date: 2024
References: Add references at CitEc
Citations:
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0304476 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 04476&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0304476
DOI: 10.1371/journal.pone.0304476
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().