A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing
Xiao Liu,
Param Vir Singh and
Kannan Srinivasan
Additional contact information
Xiao Liu: Stern School of Business, New York University, New York, New York 10012
Param Vir Singh: Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Kannan Srinivasan: Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Marketing Science, 2016, vol. 35, issue 3, 363-388
Abstract:
Accurate forecasting of sales/consumption is particularly important for marketing because this information can be used to adjust marketing budget allocations and overall marketing strategies. Recently, online social platforms have produced an unparalleled amount of data on consumer behavior. However, two challenges have limited the use of these data in obtaining meaningful business marketing insights. First, the data are typically in an unstructured format, such as texts, images, audio, and video. Second, the sheer volume of the data makes standard analysis procedures computationally unworkable. In this study, we combine methods from cloud computing, machine learning, and text mining to illustrate how online platform content, such as Twitter, can be effectively used for forecasting. We conduct our analysis on a significant volume of nearly two billion Tweets and 400 billion Wikipedia pages. Our main findings emphasize that, by contrast to basic surface-level measures such as the volume of or sentiments in Tweets, the information content of Tweets and their timeliness significantly improve forecasting accuracy. Our method endogenously summarizes the information in Tweets. The advantage of our method is that the classification of the Tweets is based on what is in the Tweets rather than preconceived topics that may not be relevant. We also find that, by contrast to Twitter, other online data (e.g., Google Trends, Wikipedia views, IMDB reviews, and Huffington Post news) are very weak predictors of TV show demand because users tweet about TV shows before, during, and after a TV show, whereas Google searches, Wikipedia views, IMDB reviews, and news posts typically lag behind the show. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mksc.2015.0972.
Keywords: big data; cloud computing; text mining; user generated content; Twitter; Google Trends
Date: 2016
Citations: 43 (in EconPapers)
Downloads: http://dx.doi.org/10.1287/mksc.2015.0972 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormksc:v:35:y:2016:i:3:p:363-388