A Hybrid Model for the Measurement of the Similarity between Twitter Profiles

Shoeibi, Niloufar; Shoeibi, Nastaran; Chamoso, Pablo; Alizadehsani, Zakieh; Corchado, Juan Manuel

Niloufar Shoeibi, Nastaran Shoeibi, Pablo Chamoso, Zakieh Alizadehsani and Juan Manuel Corchado
Additional contact information
Niloufar Shoeibi: BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain
Nastaran Shoeibi: Faculty of Science, University of Salamanca, 37008 Salamanca, Spain
Pablo Chamoso: BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain
Zakieh Alizadehsani: BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain
Juan Manuel Corchado: BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain

Sustainability, 2022, vol. 14, issue 9, 1-19

Abstract: Social media platforms have been an undeniable part of our lifestyle for the past decade. Analyzing the information that is being shared is a crucial step to understanding human behavior. Social media analysis aims to guarantee a better experience for the user and to increase user satisfaction. To draw any further conclusions, it is first necessary to know how to compare users. In this paper, a hybrid model is proposed to measure the degree of similarity between Twitter profiles by calculating features related to the users' behavioral habits. First, the timeline of each profile was extracted using the official Twitter API. Then, three aspects of a profile were analyzed in parallel. Behavioral ratios are time-series-related information showing the consistency and habits of the user; dynamic time warping was used to compare the behavioral ratios of two profiles. Next, the audience network was extracted for each user, and the Jaccard similarity was used to estimate the similarity of the two audience sets. Finally, for the content similarity measurement, the tweets were preprocessed, features were extracted using TF-IDF and DistilBERT, and the resulting representations were compared using cosine similarity. The results showed that TF-IDF had slightly better performance; it was therefore selected for use in the model. To measure the similarity level of different profiles, a Random Forest classification model was trained on 19,900 users, achieving an accuracy of 0.97 in distinguishing similar profiles from different ones. As a further step, this combined similarity measurement can find users separated by very short distances, which is indicative of duplicate users.
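
The three comparisons named in the abstract (dynamic time warping for behavioral ratios, Jaccard similarity for audience sets, and TF-IDF with cosine similarity for content) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, and the smoothed IDF formula below are assumptions made only for illustration, and the code uses only the Python standard library.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def jaccard_similarity(s, t):
    """Size of the intersection divided by the size of the union."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0  # convention assumed here for two empty sets
    return len(s & t) / len(union)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts), one per document.

    Assumptions: lowercase whitespace tokens and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. The paper's preprocessing may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data for two profiles.
behavior = dtw_distance([0.1, 0.4, 0.4, 0.9], [0.1, 0.4, 0.9])
audience = jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})
vecs = tfidf_vectors(["good morning twitter", "stock market news"])
content = cosine_similarity(vecs[0], vecs[1])
features = [behavior, audience, content]
```

In a pipeline like the one the abstract describes, a feature list of this kind for each pair of profiles would be the input to a classifier such as a Random Forest; the classifier itself is omitted here.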

Keywords: Twitter; social media; social networking; social network analytics; DistilBERT; text similarity; natural language processing; character computing
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2071-1050/14/9/4909/pdf (application/pdf)
https://www.mdpi.com/2071-1050/14/9/4909/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:14:y:2022:i:9:p:4909-:d:797323

Sustainability is currently edited by Ms. Alexandra Wu

Bibliographic data for series maintained by MDPI Indexing Manager.