TEXTFIND: Stata module to identify, analyze, and convert text entries into categorical data
Andre Assumpcao
Statistical Software Components from Boston College Department of Economics
textfind is a data-driven program that identifies, analyzes, and converts textual data into categorical variables for further use in quantitative analysis. It uses regular expressions to search for one or more keywords and exclusions (i.e. n-grams) and reports six statistics summarizing search quality: the number of observations in the dataset that were matched; the number of word occurrences per observation; the length of the text in which the word is found; the position at which the word was first found; the term frequency-inverse document frequency (tf-idf) of the word used in the search; and the p-value of a means comparison test between samples identified by different search criteria.
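To make the reported statistics concrete, the following Python sketch computes rough analogues of five of them on a toy set of texts. It is an illustration only, not the module's Stata implementation: the function name `text_search`, its `keyword` and `but` parameters, the sample texts, and the tf-idf variant (term count divided by word count, times the log of the number of texts over the number of texts containing the keyword) are all assumptions made for this example. Positions are 0-based character offsets, and the means comparison test is omitted.

```python
import math
import re


def text_search(texts, keyword, but=None):
    """Flag texts that match `keyword` but not `but`, with per-text statistics.

    Hypothetical helper for illustration; not part of textfind.
    """
    kw = re.compile(keyword, re.IGNORECASE)
    ex = re.compile(but, re.IGNORECASE) if but else None

    # Document frequency: number of texts containing the keyword at least once.
    df = sum(1 for t in texts if kw.search(t))
    # Assumed idf variant: log(N / df); zero when the keyword never appears.
    idf = math.log(len(texts) / df) if df else 0.0

    rows = []
    for t in texts:
        hits = list(kw.finditer(t))
        # A text is matched if the keyword is found and the exclusion is not.
        matched = bool(hits) and not (ex and ex.search(t))
        n_words = len(t.split())
        rows.append({
            "matched": matched,
            "occurrences": len(hits),
            "length": len(t),
            "position": hits[0].start() if hits else None,
            "tfidf": (len(hits) / n_words) * idf if n_words else 0.0,
        })
    return rows


# Made-up sample texts, one per observation.
texts = [
    "minimum wage and wage growth",
    "wage subsidies for firms",
    "trade policy and tariffs",
    "climate policy",
]
rows = text_search(texts, r"wage", but=r"subsid")
n_matched = sum(r["matched"] for r in rows)
```

Here only the first text is flagged: the second contains the keyword but is dropped by the exclusion pattern, and the last two contain no match at all.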
Requires: Stata version 15
Keywords: text; regexp; ngrams; textual analysis
Date: 2019-04-13, Revised 2019-10-02
Note: This module should be installed from within Stata by typing "ssc install textfind". The module is made available under terms of the GPL v3 (https://www.gnu.org/licenses/gpl-3.0.txt). Windows users should not attempt to download these files with a web browser.
Downloads: (external link)
http://fmwww.bc.edu/repec/bocode/t/textfind.ado program code (text/plain)
http://fmwww.bc.edu/repec/bocode/t/textfind.sthlp help file (text/plain)
Persistent link: https://EconPapers.repec.org/RePEc:boc:bocode:s458633
More software in Statistical Software Components from Boston College Department of Economics Boston College, 140 Commonwealth Avenue, Chestnut Hill MA 02467 USA. Contact information at EDIRC.
Bibliographic data for series maintained by Christopher F Baum.