EconPapers    
Economics at your fingertips  
 

Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured Information

Lin Cong, Tengyuan Liang, Xiao Zhang and Wu Zhu

No 33168, NBER Working Papers from National Bureau of Economic Research, Inc

Abstract: We introduce a general approach for analyzing large-scale text-based data, combining the strengths of neural network language processing and generative statistical modeling to create a factor structure of unstructured data for downstream regressions typically used in social sciences. We generate textual factors by (i) representing texts using vector word embedding, (ii) clustering the vectors using Locality-Sensitive Hashing to generate supports of topics, and (iii) identifying relatively interpretable spanning clusters (i.e., textual factors) through topic modeling. Our data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability, plausibly attaining certain advantages over and complementing other unstructured data analytics used by researchers, including emergent large language models. We conduct initial validation tests of the framework and discuss three types of its applications: (i) enhancing prediction and inference with texts, (ii) interpreting (non-text-based) models, and (iii) constructing new text-based metrics and explanatory variables. We illustrate each of these applications using examples in finance and economics such as macroeconomic forecasting from news articles, interpreting multi-factor asset pricing models from corporate filings, and measuring theme-based technology breakthroughs from patents. Finally, we provide a flexible statistical package of textual factors for online distribution to facilitate future research and applications.

JEL-codes: C13 (search for similar items in EconPapers)
Date: 2024-11
New Economics Papers: this item is included in nep-big and nep-ecm
Note: AP CF TWP
References: Add references at CitEc
Citations:

Downloads: (external link)
http://www.nber.org/papers/w33168.pdf (application/pdf)
Access to the full text is generally limited to series subscribers, however if the top level domain of the client browser is in a developing country or transition economy free access is provided. More information about subscriptions and free access is available at http://www.nber.org/wwphelp.html. Free access is also available to older working papers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:nbr:nberwo:33168

Ordering information: This working paper can be ordered from
http://www.nber.org/papers/w33168
The price is Paper copy available by mail.

Access Statistics for this paper

More papers in NBER Working Papers from National Bureau of Economic Research, Inc National Bureau of Economic Research, 1050 Massachusetts Avenue Cambridge, MA 02138, U.S.A.. Contact information at EDIRC.
Bibliographic data for series maintained by ().

 
Page updated 2025-03-19
Handle: RePEc:nbr:nberwo:33168