Estimating text regressions using txtreg_train
Carlo Schwarz
Stata Journal, 2023, vol. 23, issue 3, 799-812
Abstract:
In this article, I introduce new commands to estimate text regressions for continuous, binary, and categorical variables based on text strings. The command txtreg_train automatically handles text cleaning, tokenization, model training, and cross-validation for lasso, ridge, elastic-net, and regularized logistic regressions. The txtreg_predict command obtains the predictions from the trained text regression model. Furthermore, the txtreg_analyze command facilitates the analysis of the coefficients of the text regression model. Together, these commands provide a convenient toolbox for researchers to train text regressions. They also allow sharing of pretrained text regression models with other researchers.
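The pipeline named in the abstract (text cleaning, tokenization, model training, prediction, and coefficient inspection) can be illustrated with a minimal sketch. This is not the txtreg_train implementation and does not use its syntax: it is plain Python rather than Stata, it fits only a ridge regression by gradient descent, and it omits cross-validation. All function names, the toy documents, and the tuning values (alpha, lr, epochs) are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Clean and tokenize: lowercase, then keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs):
    """Map each distinct token to a column index (sorted for reproducibility)."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: j for j, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    return [float(counts[tok]) for tok in vocab]


def fit_ridge(X, y, alpha=0.1, lr=0.05, epochs=2000):
    """Ridge regression via batch gradient descent (intercept not penalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        resid = [b + sum(wj * xj for wj, xj in zip(w, row)) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n + alpha * w[j]
                  for j in range(p)]
        grad_b = sum(resid) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(text, vocab, w, b):
    """Predicted outcome for a new text string."""
    return b + sum(wj * xj for wj, xj in zip(w, vectorize(text, vocab)))


# Hypothetical toy data: four short texts with a continuous outcome.
docs = [
    "great product, works well",
    "terrible product, broke fast",
    "great value and great quality",
    "terrible quality, very bad",
]
y = [1.0, 0.0, 1.0, 0.0]

vocab = build_vocab(docs)
X = [vectorize(doc, vocab) for doc in docs]
w, b = fit_ridge(X, y)

# Coefficient per token, analogous in spirit to inspecting a fitted model.
coef = {tok: w[j] for tok, j in vocab.items()}
pred_pos = predict("great quality", vocab, w, b)
pred_neg = predict("terrible product", vocab, w, b)
```

In this toy setting, tokens that appear only in high-outcome texts receive positive coefficients and tokens that appear only in low-outcome texts receive negative ones, which is the kind of pattern a coefficient analysis step would surface.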
Keywords: txtreg_train; txtreg_predict; txtreg_analyze; text regressions; machine learning; text analysis
Date: 2023
Note: to access software from within Stata, net describe http://www.stata-journal.com/software/sj23-3/dm0112/
Downloads: http://www.stata-journal.com/article.html?article=dm0112 (link to article purchase)
Persistent link: https://EconPapers.repec.org/RePEc:tsj:stataj:v:23:y:2023:i:3:p:799-812
Ordering information: This journal article can be ordered from
http://www.stata-journal.com/subscription.html
Stata Journal is currently edited by Nicholas J. Cox and Stephen P. Jenkins
More articles in Stata Journal from StataCorp LLC
Bibliographic data for series maintained by Christopher F. Baum and Lisa Gilmore.