Identifying Consumer Preferences from User- and Crowd-Generated Digital Footprints on Amazon.com by Leveraging Machine Learning and Natural Language Processing

Jeong, Jikhan

2020 Papers from Job Market Papers

Abstract: Inexperienced consumers may face high uncertainty about experience goods that require technical knowledge and skills to operate effectively; prior reviews by experienced consumers can therefore be useful to them. However, a one-sided review system (e.g., Amazon.com) only lets consumers write reviews as buyers and contains no feedback from the seller's side, so the information displayed about individual buyers is limited. This study analyzes consumers' digital footprints (DFs) to identify and predict unobserved consumer preferences from online product reviews. It uses Python and high-performance computing to extract reviewers' DFs for a specific product group (programmable thermostats) from a dataset of 141 million Amazon reviews. It identifies consumers' sentiment toward product content dimensions (PCDs) extracted from review text by applying topic modeling and domain expert annotations. Questionable reviews (those posted by 'suspicious one-time reviewers' and 'always-the-same rating reviewers') are excluded. This paper obtains three main results. First, I find that the factors that affect consumer ratings are (a) users' DFs (e.g., length of the product review, average rating across all categories, volume of prior reviews overall and in sub-categories), (b) reviewers' attitudes toward eight product content dimensions (smart connectivity, easiness, energy saving, functionality, support, price value, privacy, and the Amazon effect), and (c) other prior reviewers' DFs (e.g., length of the review summary). All the heteroskedastic ordered probit models with DF and sentiment variables show a better model fit than the base model. This paper is the first to identify the effect of service quality of the online platform (Amazon.com) on ratings.
Second, extreme gradient boosting (XGBoost) obtains the highest F1 score for predicting the ratings of potential consumers before they make a purchase or write a review. All the models containing DF and sentiment variables show higher prediction performance than the base model. Classifications with fewer labels (three-class or binary) show better prediction performance than the five-star rating classification. However, performance for the minority class is low. Third, a convolutional neural network (CNN) on top of Bidirectional Encoder Representations from Transformers (BERT) embeddings shows the highest F1 score for classifying consumers' sentiment toward a specific PCD. Overall, the approach developed in this paper is applicable, scalable, and interpretable for distinguishing important drivers of consumer reviews for different goods in a specific industry, and industry can use it to identify and predict unobserved consumer preferences and sentiment associated with product content dimensions.
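
As a rough illustration of the evaluation idea in the abstract (F1 scores compared across five-star and coarser label sets, with weak minority-class performance), the following Python sketch computes per-class and macro-averaged F1 on a toy example. The star-to-class mapping in `collapse_to_three`, the choice of macro averaging, and the toy ratings are all assumptions made here for illustration; none of them is taken from the paper, and the numbers say nothing about its results.

```python
from collections import Counter


def collapse_to_three(stars):
    """Hypothetical mapping (an assumption, not the paper's):
    1-2 stars -> 'neg', 3 stars -> 'neu', 4-5 stars -> 'pos'."""
    if stars <= 2:
        return "neg"
    if stars == 3:
        return "neu"
    return "pos"


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy ratings invented for this sketch (not data from the paper).
y_true = [5, 5, 5, 4, 4, 3, 2, 1, 5, 4]
y_pred = [5, 4, 5, 4, 5, 4, 1, 1, 5, 4]

three_true = [collapse_to_three(s) for s in y_true]
three_pred = [collapse_to_three(s) for s in y_pred]

five_class_macro = macro_f1(y_true, y_pred)
three_class_macro = macro_f1(three_true, three_pred)
three_class_scores = per_class_f1(three_true, three_pred)

print("five-class macro F1:", round(five_class_macro, 3))
print("three-class macro F1:", round(three_class_macro, 3))
print("three-class per-class F1:", three_class_scores)
```

In this toy case, merging adjacent star levels removes the errors between neighbouring ratings, so the macro F1 rises, while the rare 'neu' class is still never predicted correctly. Macro averaging is used here because it makes such a minority-class weakness visible; a weighted average would mask it.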

JEL-codes: C45 D80 M21 M31
Date: 2020-11-10
New Economics Papers: this item is included in nep-big, nep-cmp, nep-ict, nep-mkt and nep-pay

Downloads: (external link)
https://ideas.repec.org/jmp/2020/pje208.pdf

Persistent link: https://EconPapers.repec.org/RePEc:jmp:jm2020:pje208