EconPapers    
Economics at your fingertips  
 

Semi-supervised sentiment clustering on natural language texts

Luca Frigau (), Maurizio Romano (), Marco Ortu () and Giulia Contu ()
Additional contact information
Luca Frigau: University of Cagliari
Maurizio Romano: University of Cagliari
Marco Ortu: University of Cagliari
Giulia Contu: University of Cagliari

Statistical Methods & Applications, 2023, vol. 32, issue 4, No 9, 1239-1257

Abstract: Abstract In this paper, we propose a semi-supervised method to cluster unstructured textual data called semi-supervised sentiment clustering on natural language texts. The aim is to identify clusters homogeneous with respect to the overall sentiment of the texts analyzed. The method combines different techniques and methodologies: Sentiment Analysis, Threshold-based Naïve Bayes classifier, and Network-based Semi-supervised Clustering. It involves different steps. In the first step, the unstructured text is transformed into structured text, and it is categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naïve Bayes classifier is applied to identify the overall sentiment of the texts and to define a specific sentiment value for the topics. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results have highlighted the capacity of the proposed algorithm to identify clusters that are distinct, non-overlapped, and homogeneous with respect to the overall sentiment. Results are also easily interpretable thanks to the network representation of the instances that helps to understand the relationship between them.

Keywords: Tb-NB; NeSSC; Reviews; Tourism data; Booking.com (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s10260-023-00691-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:stmapp:v:32:y:2023:i:4:d:10.1007_s10260-023-00691-4

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10260/PS2

DOI: 10.1007/s10260-023-00691-4

Access Statistics for this article

Statistical Methods & Applications is currently edited by Tommaso Proietti

More articles in Statistical Methods & Applications from Springer, Società Italiana di Statistica
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:stmapp:v:32:y:2023:i:4:d:10.1007_s10260-023-00691-4