Topic Modeling for Analyzing Open-Ended Survey Responses
Andra-Selina Pietsch and Stefan Lessmann
No 2018-054, IRTG 1792 Discussion Papers from Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series"
Abstract:
Open-ended responses are widely used in market research studies. Processing of such responses requires labor-intensive human coding. This paper focuses on unsupervised topic models and tests their ability to automate the analysis of open-ended responses. Since state-of-the-art topic models struggle with the shortness of open-ended responses, the paper considers three novel short text topic models: Latent Feature Latent Dirichlet Allocation, Biterm Topic Model and Word Network Topic Model. The models are fitted and evaluated on a set of real-world open-ended responses provided by a market research company. Multiple components such as topic coherence and document classification are quantitatively and qualitatively evaluated to appraise whether topic models can replace human coding. The results suggest that topic models are a viable alternative for open-ended response coding. However, their usefulness is limited when a correct one-to-one mapping of responses and topics or the exact topic distribution is needed.
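Illustration (not taken from the paper): the Biterm Topic Model named in the abstract works on "biterms", unordered pairs of words that co-occur in the same short text, pooled over the whole corpus, which is one way of coping with very short documents. The sketch below only shows that extraction step, not model fitting. The function names and the two sample responses are hypothetical and chosen for illustration; whitespace tokenization is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short response."""
    # Sorting each pair makes ("fast", "delivery") and ("delivery", "fast") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(responses):
    """Pool biterm counts over all responses (assumed: lowercase + whitespace split)."""
    counts = Counter()
    for text in responses:
        tokens = text.lower().split()
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical survey answers, invented for this example.
responses = [
    "good price fast delivery",
    "fast delivery friendly staff",
]
counts = corpus_biterm_counts(responses)
```

Each four-word response yields six biterms, and pairs shared across responses accumulate in the pooled counts. A full model would then infer topics from these corpus-level pairs rather than from the sparse per-response word counts.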
Keywords: Market research; open-ended responses; text analytics; short text topic models
JEL-codes: C00
Date: 2018
Citations: 8 (EconPapers)
Downloads: (external link)
https://www.econstor.eu/bitstream/10419/230765/1/irtg1792dp2018-054.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:irtgdp:2018054