Finding Patient Zero and Tracking Narrative Changes in the Context of Online Disinformation Using Semantic Similarity Analysis
Codruț-Georgian Artene,
Ciprian Oprișa,
Cristian Nicolae Buțincu and
Florin Leon
Additional contact information
Codruț-Georgian Artene: Department of Computer Science and Engineering, “Gheorghe Asachi” Technical University of Iasi, 700050 Iasi, Romania
Ciprian Oprișa: Computer Science Department, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
Cristian Nicolae Buțincu: Department of Computer Science and Engineering, “Gheorghe Asachi” Technical University of Iasi, 700050 Iasi, Romania
Florin Leon: Department of Computer Science and Engineering, “Gheorghe Asachi” Technical University of Iasi, 700050 Iasi, Romania
Mathematics, 2023, vol. 11, issue 9, 1-26
Abstract:
Disinformation in the form of news articles, also called fake news, is used by multiple actors for nefarious purposes, such as gaining political advantage. A key component of fake news detection is the ability to find similar articles in a large document corpus, both to track narrative changes and to identify the root source (patient zero) of a particular piece of information. This paper presents new techniques based on textual and semantic similarity that were adapted to achieve this goal on large datasets of news articles. The aim is to determine which of the implemented text similarity techniques is most suitable for this task. For text similarity, Locality-Sensitive Hashing (LSH) is applied to n-grams extracted from the text to produce representations that are then indexed to enable the quick discovery of similar articles. The semantic textual similarity technique is based on sentence embeddings from pre-trained language models, such as BERT, and on Named Entity Recognition. The proposed techniques are evaluated on a collection of Romanian articles to determine their performance in terms of result quality and scalability. The presented techniques produce competitive results. The experimental results show that the proposed semantic textual similarity technique is better at identifying similar text documents, while the LSH text similarity technique outperforms it in terms of execution time and scalability. Although the techniques were evaluated only on Romanian texts, and some of them rely on pre-trained models for Romanian, the underlying methods allow them to be extended to other languages with few or no changes, provided that pre-trained models exist for those languages as well. A cross-lingual setup would require further changes, along with tests to demonstrate this capability.
Based on the obtained results, one may conclude that the presented techniques are suitable to be integrated into a decentralized anti-disinformation platform for fact-checking and trust assessment.
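The abstract names the text similarity technique (LSH over n-grams, indexed for fast lookup) but not the LSH family, the n-gram type, or any parameters. The following is a minimal sketch under those gaps, assuming MinHash signatures with banded LSH over word 3-grams, using only the Python standard library. All names and parameters here (`word_ngrams`, `minhash_signature`, `LSHIndex`, `num_perm=64`, `bands=16`) are hypothetical illustrations, not the authors' implementation.

```python
import hashlib
from collections import defaultdict


def word_ngrams(text, n=3):
    """Return the set of word n-grams (shingles) of a lowercased, whitespace-tokenized text.

    Assumption: texts with fewer than n tokens are kept as a single shingle.
    """
    tokens = text.lower().split()
    if len(tokens) < n:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(shingles, num_perm=64):
    """MinHash signature: for each of num_perm salted hash functions,
    keep the minimum 64-bit hash value over all shingles."""
    if not shingles:
        raise ValueError("cannot sign an empty shingle set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")  # a different salt simulates a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(), "big")
            for s in shingles))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions, an estimate of the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class LSHIndex:
    """Hypothetical banded LSH index: documents whose signatures agree on every
    row of at least one band share a bucket and become candidates."""

    def __init__(self, num_perm=64, bands=16):
        if num_perm % bands != 0:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm = num_perm
        self.bands = bands
        self.rows = num_perm // bands
        self.buckets = defaultdict(set)  # (band index, band values) -> doc ids
        self.signatures = {}             # doc id -> full signature

    def _band_keys(self, sig):
        for b in range(self.bands):
            yield (b, tuple(sig[b * self.rows:(b + 1) * self.rows]))

    def _signature(self, text):
        return minhash_signature(word_ngrams(text), self.num_perm)

    def add(self, doc_id, text):
        sig = self._signature(text)
        self.signatures[doc_id] = sig
        for key in self._band_keys(sig):
            self.buckets[key].add(doc_id)

    def query(self, text):
        """Return (doc_id, estimated Jaccard) pairs for candidates, most similar first."""
        sig = self._signature(text)
        candidates = set()
        for key in self._band_keys(sig):
            candidates |= self.buckets.get(key, set())
        scored = [(d, estimated_jaccard(sig, self.signatures[d])) for d in candidates]
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    index = LSHIndex()
    index.add("a1", "the minister announced new measures against disinformation on monday")
    index.add("a2", "heavy rain is expected across the northern regions this weekend")
    print(index.query("the minister announced new measures against online disinformation on monday"))
```

With b bands of r rows each, two documents with Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so the bands/rows split trades recall against the number of candidates that must be scored. Only candidates are compared, which is what keeps lookups fast on a large corpus.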
Keywords: semantic similarity; patient zero; narrative changes; fighting disinformation
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/9/2053/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/9/2053/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:9:p:2053-:d:1133795
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.