Parsimonious Wasserstein Text-mining

Sébastien Gadat and Stéphane Villeneuve

No 23-1471, TSE Working Papers from Toulouse School of Economics (TSE)

Abstract: This document introduces a novel, parsimonious method for processing textual data, based on non-negative matrix factorization (NMF) and on supervised clustering with Wasserstein barycenters to reduce the dimension of the model. This dual treatment of textual data allows a text to be represented as a probability distribution on the space of profiles, which accounts for both uncertainty and semantic interpretability through the Wasserstein distance. The full textual information of a given period is represented as a random probability measure. This opens the door to a statistical inference method that seeks to predict financial data using the information generated by the texts of a given period.

Keywords: Natural Language Processing; Textual Analysis; Wasserstein distance; clustering
Date: 2023-09-20
New Economics Papers: this item is included in nep-big, nep-cmp and nep-ger

Downloads: (external link)
https://www.tse-fr.eu/sites/default/files/TSE/docu ... 2023/wp_tse_1471.pdf Full Text (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:tse:wpaper:128497

Page updated 2025-04-01
Handle: RePEc:tse:wpaper:128497