Sequential Text-Term Selection in Vector Space Models

Feifei Wang, Jingyuan Liu and Hansheng Wang

Journal of Business & Economic Statistics, 2021, vol. 39, issue 1, 82-97

Abstract: Text mining has recently attracted a great deal of attention with the accumulation of text documents in all fields. In this article, we focus on the use of textual information to explain continuous variables in the framework of linear regression. To handle unstructured texts, one common practice is to structure the text documents via vector space models. However, whether words or phrases should serve as the basic analysis terms in vector space models is much debated. In addition, vector space models often lead to an extremely large term set and suffer from the curse of dimensionality, which makes term selection both important and necessary. Toward this end, we propose a novel term screening method for vector space models under a linear regression setup. We first split the entire term space into different subspaces according to the length of terms and then conduct term screening in a sequential manner. We prove the screening consistency of the method and assess its empirical performance with simulations based on a dataset of online consumer reviews for cellphones. Then, we analyze the associated real data. The results show that the sequential term selection technique can effectively detect the relevant terms in a few steps.
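
The abstract describes splitting the term space by term length and screening each subspace in turn under a linear regression. The following is only a rough illustrative sketch of that general idea, not the authors' procedure: the scoring rule (absolute correlation-style score against the current residual), the fixed number of terms kept per length (`keep`), the least-squares refit, and all function names are assumptions made for this example.

```python
import numpy as np
from collections import Counter


def ngrams(tokens, n):
    """Return the n-word terms (space-joined strings) found in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_matrix(docs, n):
    """Build a document-by-term count matrix for terms of length n."""
    counts = [Counter(ngrams(d.split(), n)) for d in docs]
    vocab = sorted(set().union(*counts))
    X = np.array([[c[t] for t in vocab] for c in counts], dtype=float)
    return X.reshape(len(docs), len(vocab)), vocab


def sequential_screen(docs, y, max_len=2, keep=1):
    """Screen terms one length at a time (hypothetical simplification).

    For each term length, score every term against the residual left by the
    terms selected so far, keep the `keep` best, then refit and update the
    residual before moving on to longer terms.
    """
    y = np.asarray(y, dtype=float)
    residual = y - y.mean()
    selected, columns = [], []
    for n in range(1, max_len + 1):
        X, vocab = term_matrix(docs, n)
        if not vocab:
            continue
        Xc = X - X.mean(axis=0)
        norms = np.linalg.norm(Xc, axis=0)
        norms[norms == 0] = np.inf  # constant columns receive a score of 0
        scores = np.abs(Xc.T @ residual) / norms
        top = np.argsort(-scores, kind="stable")[:keep]
        for j in top:
            selected.append(vocab[j])
            columns.append(X[:, j])
        # Refit by ordinary least squares on an intercept plus selected terms.
        Z = np.column_stack([np.ones(len(y))] + columns)
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        residual = y - Z @ beta
    return selected
```

With four toy reviews such as "good phone", "good battery", "bad phone", "slow battery" and responses 3, 3, 0, 0, calling `sequential_screen(docs, y, max_len=1, keep=1)` picks the single word "good". A real analysis would need a principled stopping rule and screening threshold, which this sketch does not attempt.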

Date: 2021
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://hdl.handle.net/10.1080/07350015.2019.1634079 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlbes:v:39:y:2021:i:1:p:82-97

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UBES20

DOI: 10.1080/07350015.2019.1634079

Journal of Business & Economic Statistics is currently edited by Eric Sampson, Rong Chen and Shakeeb Khan

More articles in Journal of Business & Economic Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:jnlbes:v:39:y:2021:i:1:p:82-97