Text mining with n-gram variables
Matthias Schonlau
Matthias Schonlau: University of Waterloo
2020 Stata Conference from Stata Users Group
Abstract:
Text data, such as answers to open-ended questions, are sometimes ignored because they are hard to analyze. Our Stata command ngram turns text into hundreds of variables using the "bag of words" approach. Broadly speaking, each variable records how often the corresponding word or word sequence occurs in a given text. This is more useful than it sounds. The program supports text in 12 European languages (Schonlau, Guenther, and Sucholutsky 2017).
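To make the "bag of words" idea concrete, here is a minimal sketch in Python rather than Stata. It is not the ngram command's implementation: the function names, the whitespace tokenizer, and the `degree` and `threshold` parameters are assumptions made for illustration, and the sketch does no stemming or stopword removal.

```python
from collections import Counter


def ngram_counts(text, degree=2):
    """Count every word sequence of length 1..degree in one text."""
    # Assumption: lowercase and split on whitespace; a real tokenizer does more.
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, degree + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def ngram_variables(texts, degree=2, threshold=1):
    """Return (columns, rows): one column per n-gram, one row per text.

    An n-gram is kept only if it occurs in at least `threshold` texts
    (a hypothetical parameter used here to limit the number of variables).
    """
    per_text = [ngram_counts(t, degree) for t in texts]
    doc_freq = Counter()
    for counts in per_text:
        doc_freq.update(counts.keys())  # each text contributes 1 per n-gram
    columns = sorted(g for g, k in doc_freq.items() if k >= threshold)
    # A Counter returns 0 for n-grams that a text does not contain.
    rows = [[counts[g] for g in columns] for counts in per_text]
    return columns, rows


texts = ["the service was good", "good service and good food"]
columns, rows = ngram_variables(texts, degree=2)
for text, row in zip(texts, rows):
    print(text, dict(zip(columns, row)))
```

Each column plays the role of one generated variable: the entry for "good" is 1 in the first text and 2 in the second, while "good service" appears only in the second. Raising `threshold` drops n-grams that occur in too few texts, which keeps the number of variables manageable.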
Date: 2020-08-20
New Economics Papers: this item is included in nep-cmp
Downloads:
http://fmwww.bc.edu/repec/scon2020/us20_Schonlau.pdf
Persistent link: https://EconPapers.repec.org/RePEc:boc:scon20:10