A Sampling-Based Method for Detecting Data Poisoning Attacks in Recommendation Systems

Li, Mohan; Lian, Yuxin; Zhu, Jinpeng; Lin, Jingyi; Wan, Jiawen; Sun, Yanbin

Mohan Li: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Yuxin Lian: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Jinpeng Zhu: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Jingyi Lin: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Jiawen Wan: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Yanbin Sun: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China

Mathematics, 2024, vol. 12, issue 2, 1-13

Abstract: Recommendation algorithms based on collaborative filtering are vulnerable to data poisoning attacks, in which attackers manipulate the system's output by injecting large volumes of fake rating data. Addressing this threat requires methods for detecting poisoning data that has been systematically injected into the rating matrix. Because attackers often inject a large quantity of poisoning data within a short period to achieve the desired effect, these data may exhibit spatial proximity; that is, poisoning data may be concentrated in adjacent rows of the rating matrix. This paper exploits this proximity and introduces a sampling-based method for detecting data poisoning attacks. First, we designed a rating matrix sampling method specifically for detecting poisoning data: by analyzing the differences between samples drawn from the original rating matrix, the presence of a poisoning attack can be inferred and the poisoning data can be discarded. Second, we developed a method for locating malicious data based on the distances between rating vectors; these distance calculations accurately identify the positions of malicious data. Finally, we validated the method on three real-world datasets. The results demonstrate the effectiveness of our method in identifying malicious data within the rating matrix.
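The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch under our own assumptions, not the authors' method. It assumes rows are user rating vectors stored in arrival order, with 0.0 marking an unrated item. As a stand-in for the sampling step, `suspicious_blocks` compares the mean vector of each block of adjacent rows against the mean vectors of randomly sampled rows and flags blocks that lie unusually far from the global mean. As a stand-in for the distance-based locating step, `pinpoint_rows` reports rows whose nearest neighbour in the same range is within a distance `radius`, on the assumption that injected profiles are near-duplicates. All function names and the parameters `window`, `n_samples`, `factor` and `radius` are hypothetical.

```python
import math
import random
import statistics

# Assumed layout (hypothetical): rows are users in arrival order, columns are items,
# and 0.0 marks an unrated item.
Matrix = list[list[float]]


def mean_vector(rows: Matrix) -> list[float]:
    """Column-wise mean of a set of rating vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def suspicious_blocks(matrix: Matrix, window: int, n_samples: int = 50,
                      factor: float = 3.0, seed: int = 0) -> list[int]:
    """Return start indices of non-overlapping blocks of `window` adjacent rows whose
    mean vector lies much farther from the global mean than random row samples do."""
    rng = random.Random(seed)
    global_mean = mean_vector(matrix)
    # Baseline: distance of the mean of `window` randomly chosen rows from the global mean.
    baseline = [math.dist(mean_vector(rng.sample(matrix, window)), global_mean)
                for _ in range(n_samples)]
    # Hypothetical heuristic threshold: a multiple of the median baseline distance.
    threshold = factor * statistics.median(baseline)
    flagged = []
    for start in range(0, len(matrix) - window + 1, window):
        block = matrix[start:start + window]
        if math.dist(mean_vector(block), global_mean) > threshold:
            flagged.append(start)
    return flagged


def pinpoint_rows(matrix: Matrix, start: int, window: int, radius: float) -> list[int]:
    """Within rows start .. start+window-1, report rows whose nearest neighbour in the
    same range lies within `radius`; injected profiles tend to be near-duplicates."""
    idx = range(start, min(start + window, len(matrix)))
    return [i for i in idx
            if min((math.dist(matrix[i], matrix[j]) for j in idx if j != i),
                   default=math.inf) <= radius]


if __name__ == "__main__":
    # Synthetic example: 90 genuine users plus 10 fake profiles pushing items 0-2.
    rng = random.Random(1)
    genuine = [[float(rng.randint(1, 5)) for _ in range(20)] for _ in range(90)]
    fake = [[5.0, 5.0, 5.0] + [0.0] * 17 for _ in range(10)]
    matrix = genuine[:50] + fake + genuine[50:]  # fake rows sit at indices 50..59
    print(suspicious_blocks(matrix, window=10))
    print(pinpoint_rows(matrix, start=45, window=10, radius=1.0))  # → [50, 51, 52, 53, 54]
```

The sketch uses non-overlapping, grid-aligned blocks for simplicity, so an injected run that straddles a block boundary would be split across two blocks; sliding the window with a step smaller than `window` would catch such runs at extra cost. The median-ratio threshold is likewise an illustrative choice rather than a calibrated test.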

Keywords: data poisoning; recommendation systems; ensemble learning; data poisoning detection
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/2/247/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/2/247/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:2:p:247-:d:1317707

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.