EconPapers    

Text mining and hierarchical clustering in Stata: An applied approach for real-time policy monitoring, forecasting, and literature mapping

Carlo Drago
Carlo Drago: Università degli Studi Niccolò Cusano

Italian Stata Users' Group Meetings 2025 from Stata Users Group

Abstract: This presentation shows an applied framework for text mining and clustering in the Stata environment and provides practical tools for policy-relevant research in economics and health economics. With the growing amount of unstructured textual data, from financial news and analyst reports to scientific publications, there is an increasing demand for scalable methods to classify and interpret such information for evidence-based policy and forecasting. The first part uses Stata's integration with Python to implement hierarchical clustering from scratch, based on TF-IDF vectorization and cosine distance. The technique is applied to economic text sources, such as headlines or institutional communications, to segment documents into either a fixed number of clusters or a number chosen by optimizing the silhouette score. This approach allows researchers to identify patterns in the data, uncover latent themes, and organize information for macroeconomic forecasting, sentiment analysis, or real-time policy monitoring. In the second part, I focus on literature mapping in health economics. Using a curated corpus of article titles on telemedicine and diabetes, I apply a native Stata pipeline based on text normalization and clustering to identify thematic areas within the literature. The approach supports more organized reviews in health technology assessment and policy evaluation and makes evidence synthesis more accessible. By combining native Stata capabilities with Python-enhanced workflows, I provide applied researchers with an accessible, policy-relevant toolkit for unsupervised text classification across multiple domains.
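The Python-side pipeline described above (TF-IDF vectors, cosine distance, hierarchical clustering, and a choice between a fixed number of clusters and a silhouette-optimized one) can be sketched as follows. This is a minimal illustration written for this listing under stated assumptions, not the author's code: it uses only the Python standard library, and the stopword list, the smoothed IDF formula, the use of average linkage, and all function names (`normalize`, `tfidf`, `agglomerative`, `silhouette`, `cluster_texts`) are hypothetical choices.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real pipelines use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "with"}


def normalize(text):
    """Lowercase, keep runs of ASCII letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one TF-IDF vector per document over a shared, sorted vocabulary."""
    tokenized = [normalize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    vocab = sorted(df)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1 (an assumed variant).
        vectors.append([tf[t] / total * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t in vocab])
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; empty vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def agglomerative(dist, k):
    """Average-linkage agglomerative clustering on a distance matrix, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)  # average linkage
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (a < b)
        del clusters[b]
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(dist, labels):
    """Mean silhouette score; points in singleton clusters contribute 0."""
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def cluster_texts(docs, k=None, k_max=6):
    """Cluster documents into k groups, or pick k in [2, k_max] by silhouette."""
    vecs = tfidf(docs)
    dist = [[cosine_distance(u, v) for v in vecs] for u in vecs]
    if k is not None:
        return agglomerative(dist, k)
    if len(docs) < 3:
        raise ValueError("silhouette selection needs at least 3 documents")
    candidates = range(2, min(k_max, len(docs) - 1) + 1)
    best_k = max(candidates, key=lambda kk: silhouette(dist, agglomerative(dist, kk)))
    return agglomerative(dist, best_k)
```

For example, `cluster_texts(headlines)` would choose the number of clusters by the silhouette score, while `cluster_texts(headlines, k=4)` would fix it, mirroring the two options in the abstract (`headlines` is a hypothetical list of strings). In Stata 16 or later, code like this can run inside a `python:` ... `end` block, with data exchanged through Stata's `sfi` module; that transfer step is omitted here. The naive merge loop is roughly cubic in the number of documents, so it suits small corpora such as headline samples or article titles.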

Date: 2025-10-01

Downloads: http://repec.org/isug2025/ presentation materials (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:boc:isug25:14

Bibliographic data for series maintained by Christopher F Baum.

 
Page updated 2025-09-26
Handle: RePEc:boc:isug25:14