Parsimonious Wasserstein Text-mining
Sébastien Gadat and Stéphane Villeneuve
No 23-1471, TSE Working Papers from Toulouse School of Economics (TSE)
Abstract:
This document introduces a novel, parsimonious method of processing textual data based on non-negative matrix factorization (NMF) and on supervised clustering with Wasserstein barycenters to reduce the dimension of the model. This dual treatment of textual data allows a text to be represented as a probability distribution on the space of profiles, which accounts for both uncertainty and semantic interpretability through the Wasserstein distance. The full textual information of a given period is represented as a random probability measure. This opens the door to a statistical inference method that seeks to predict financial data using the information generated by the texts of a given period.
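As a rough illustration of the pipeline the abstract outlines, the sketch below factorizes a toy document-term matrix with NMF, turns each document into a probability distribution over profiles, and compares two documents with an optimal-transport cost. It is not the authors' implementation. The toy matrix `V`, the rank of 2, the Euclidean ground cost between normalized profiles, and the use of an entropy-regularized Sinkhorn approximation in place of the exact Wasserstein distance are all assumptions made for illustration. The barycenter-based clustering step is not covered.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for V ~ W @ H (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def row_normalize(M, eps=1e-12):
    """Scale each row to sum to 1 so it can be read as a probability vector."""
    return M / np.maximum(M.sum(axis=1, keepdims=True), eps)

def sinkhorn_cost(a, b, C, reg=0.1, n_iter=1000):
    """Entropy-regularized approximation of the transport cost between a and b."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum()), P

# Hypothetical toy corpus: 6 documents x 6 terms (raw counts).
V = np.array([
    [5, 4, 3, 0, 1, 0],
    [4, 5, 2, 1, 0, 0],
    [6, 3, 4, 0, 0, 1],
    [0, 1, 0, 5, 4, 3],
    [1, 0, 0, 4, 6, 2],
    [0, 0, 1, 3, 5, 4],
], dtype=float)

W, H = nmf(V, rank=2)
D = row_normalize(W)         # each document as a distribution over profiles
profiles = row_normalize(H)  # each profile as a distribution over terms

# Assumed ground cost: Euclidean distance between normalized profiles.
C = np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=2)

cost, P = sinkhorn_cost(D[0], D[3], C)
print(f"approximate transport cost between documents 0 and 3: {cost:.4f}")
```

The regularization strength `reg` trades accuracy for numerical stability: smaller values bring the result closer to the exact Wasserstein cost but may need more iterations.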
Keywords: Natural Language Processing; Textual Analysis; Wasserstein distance; clustering
Date: 2023-09-20
New Economics Papers: this item is included in nep-big, nep-cmp and nep-ger
Downloads:
https://www.tse-fr.eu/sites/default/files/TSE/docu ... 2023/wp_tse_1471.pdf Full Text (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:tse:wpaper:128497