Text mining with n-gram variables

Schonlau, Matthias

Matthias Schonlau: University of Waterloo

2020 Stata Conference from Stata Users Group

Abstract: Text data, such as answers to open-ended questions, are sometimes ignored because they are hard to analyze. Our Stata command ngram turns text into hundreds of variables using the "bag of words" approach. Broadly speaking, each variable records how often the corresponding word or word sequence occurs in a given text. This is more useful than it sounds. The program supports text in 12 European languages (Schonlau, Guenther, and Sucholutsky 2017).
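
To illustrate the "bag of words" idea described in the abstract, here is a minimal Python sketch, not the ngram command's own implementation. It builds one count variable per word and per two-word sequence. The function names `ngram_counts` and `bag_of_words`, the `degree` parameter, and the two sample answers are hypothetical and chosen only for illustration. The sketch splits on whitespace and lowercases the text; it omits any further preprocessing a real tool may apply.

```python
from collections import Counter


def ngram_counts(text, degree=2):
    """Count every word sequence of length 1..degree in one text."""
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter()
    for n in range(1, degree + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def bag_of_words(texts, degree=2):
    """Return (column names, rows): one row of counts per text."""
    per_text = [ngram_counts(t, degree) for t in texts]
    # The columns are all n-grams seen in any text, in sorted order.
    vocabulary = sorted(set().union(*per_text))
    rows = [[c[term] for term in vocabulary] for c in per_text]
    return vocabulary, rows


# Hypothetical open-ended answers, one per respondent.
answers = ["the cat sat", "the cat saw the dog"]
columns, matrix = bag_of_words(answers, degree=2)
for col, values in zip(columns, zip(*matrix)):
    print(f"{col:10s} {values}")
```

Each printed line shows one variable together with its count in each text. Even two short answers yield ten variables here, which suggests how realistic text can produce hundreds.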

Date: 2020-08-20
New Economics Papers: this item is included in nep-cmp

Downloads: http://fmwww.bc.edu/repec/scon2020/us20_Schonlau.pdf

Persistent link: https://EconPapers.repec.org/RePEc:boc:scon20:10
