Topic detection with recursive consensus clustering and semantic enrichment

Vincenzo De Leo, Michelangelo Puliga, Marco Bardazzi, Filippo Capriotti, Andrea Filetti and Alessandro Chessa
Additional contact information
Vincenzo De Leo: Linkalab, CoMPLeX SySTeMS CoMPuTaTioNaL LaBoRaToRy
Michelangelo Puliga: Linkalab, CoMPLeX SySTeMS CoMPuTaTioNaL LaBoRaToRy
Marco Bardazzi: Eni S.p.A
Filippo Capriotti: Eni S.p.A
Andrea Filetti: Eni S.p.A
Alessandro Chessa: Linkalab, CoMPLeX SySTeMS CoMPuTaTioNaL LaBoRaToRy

Palgrave Communications, 2023, vol. 10, issue 1, 1-10

Abstract: Extracting meaningful information from short texts like tweets has proved to be a challenging task. Literature on topic detection focuses mostly on methods that try to guess the plausible words that describe topics whose number has been decided in advance. Topics change according to the initial setup of the algorithms and show a consistent instability, with words moving from one topic to another. In this paper we propose an iterative procedure for topic detection that searches for the most stable solutions in terms of words describing a topic. We use an iterative procedure based on clustering on the consensus matrix, and traditional topic detection, to find both a stable set of words and an optimal number of topics. We observe, however, that in several cases the procedure does not converge to a unique value but oscillates. We further enhance the methodology using semantic enrichment via Word Embedding with the aim of reducing noise and improving topic separation. We foresee the application of this set of techniques to automatic topic discovery in noisy channels such as Twitter or social media.
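
The idea of clustering on a consensus matrix can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the clustering step, so the sketch swaps in a simple stand-in (thresholding the matrix and taking connected components), it performs a single pass rather than a recursion, and the words, run labels, function names and the 0.5 threshold are all hypothetical. It only shows how repeated topic-detection runs can be combined into word pairs that stay together.

```python
from itertools import combinations


def consensus_matrix(runs, words):
    """Fraction of runs in which each pair of words gets the same topic label."""
    counts = {pair: 0 for pair in combinations(words, 2)}
    for labels in runs:  # labels: dict mapping word -> topic id for one run
        for a, b in counts:
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: c / len(runs) for pair, c in counts.items()}


def stable_groups(consensus, words, threshold=0.5):
    """Group words linked by consensus above the threshold (connected components).

    Assumption: this thresholding step is a stand-in for the clustering
    used in the paper, which the abstract does not describe.
    """
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), value in consensus.items():
        if value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical output of three topic-detection runs over four words.
# Topic ids are arbitrary per run; only co-assignment matters.
words = ["oil", "gas", "wind", "solar"]
runs = [
    {"oil": 0, "gas": 0, "wind": 1, "solar": 1},
    {"oil": 1, "gas": 1, "wind": 0, "solar": 0},
    {"oil": 0, "gas": 0, "wind": 0, "solar": 1},
]

consensus = consensus_matrix(runs, words)
groups = stable_groups(consensus, words)
print(groups)  # [['gas', 'oil'], ['solar', 'wind']]
```

In this toy input "oil" and "gas" share a topic in every run, while "wind" drifts into their topic once; the consensus values record that instability, and the number of groups that survive the threshold gives a candidate number of topics.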

Date: 2023
Downloads: (external link)
http://link.springer.com/10.1057/s41599-023-01711-0 Abstract (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:10:y:2023:i:1:d:10.1057_s41599-023-01711-0

Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about

DOI: 10.1057/s41599-023-01711-0

More articles in Palgrave Communications from Palgrave Macmillan
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-19
Handle: RePEc:pal:palcom:v:10:y:2023:i:1:d:10.1057_s41599-023-01711-0