TEXTFIND: Stata module to identify, analyze, and convert text entries into categorical data

Andre Assumpcao

Statistical Software Components from Boston College Department of Economics

Abstract: textfind is a data-driven program that identifies, analyzes, and converts textual data into categorical variables for use in quantitative analysis. It uses regular expressions to search for one or more keywords and exclusions (i.e., n-grams) and reports six statistics summarizing search quality: the number of observations in the dataset that were matched; the number of word occurrences per observation; the length of the text in which the word is found; the position at which the word was first found; the term frequency-inverse document frequency (tf-idf) of the word used in the search; and the p-value of a means comparison test between samples identified by different search criteria.
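
The statistics listed in the abstract can be approximated with Stata's built-in functions, which may help in interpreting the module's output. The sketch below is illustrative only and is not textfind's own code or syntax (see the module's help file for that). The toy dataset, the keyword "company", the variable names, the tf-idf formula (one common variant), and the matched-versus-unmatched t-test (a simplified stand-in for comparing samples from different search criteria) are all assumptions made for this example.

```stata
* Illustrative sketch using built-in Stata functions; NOT textfind's code.
clear
input str60 text
"the court fined the company"
"company profits rose sharply this year"
"no relevant words here"
"the company sued the other company"
"weather was mild"
"prices fell across most regional markets"
end

* Hypothetical keyword chosen for this toy data
local kw "company"

* 1. Observations matched by a regular expression
generate byte found = ustrregexm(text, "`kw'")
quietly count if found
local df = r(N)
display "observations matched: `df'"

* 2. Occurrences per observation (literal substring count)
generate occurrences = (strlen(text) - strlen(subinstr(text, "`kw'", "", .))) / strlen("`kw'")

* 3. Length of each text, in words
generate words = wordcount(text)

* 4. Character position of the first match (missing if not found)
generate position = strpos(text, "`kw'") if found

* 5. tf-idf, assumed here as (term share of words) * ln(N / matched observations)
generate tfidf = (occurrences / words) * ln(_N / `df')

list text found occurrences words position tfidf

* 6. Means comparison: text length in matched vs. unmatched observations
*    (assumed stand-in for comparing two different search criteria)
ttest words, by(found)
display "two-sided p-value: " r(p)
```

A low p-value in the last step would suggest that the matched and unmatched texts differ systematically in length, which is one way to judge whether a search criterion selects a distinct subsample.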

Language: Stata
Requires: Stata version 15
Keywords: text; regexp; ngrams; textual analysis
Date: 2019-04-13, Revised 2019-10-02
Note: This module should be installed from within Stata by typing "ssc install textfind". The module is made available under the terms of the GPL v3. Windows users should not attempt to download these files with a web browser.

Bibliographic data for series maintained by Christopher F Baum.

Page updated 2023-02-01
Handle: RePEc:boc:bocode:s458633