Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data

Schwartz, Eric M.; Bradlow, Eric T.; Fader, Peter S.

Eric M. Schwartz: Stephen M. Ross School of Business, University of Michigan, Ann Arbor, Michigan 48109
Eric T. Bradlow: The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Peter S. Fader: The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Marketing Science, 2014, vol. 33, issue 2, 188-205

Abstract: When managers and researchers encounter a data set, they typically ask two key questions: (1) Which model (from a candidate set) should I use? (2) If I use a particular model, when is it likely to work well for my business goal? This research addresses both questions and provides a rule, i.e., a decision tree, that lets data analysts predict the “winning model” for longitudinal incidence data before fitting any of the candidate models. We characterize data sets based on managerially relevant (and easy-to-compute) summary statistics, and we use classification techniques from machine learning to provide a decision tree that recommends when to use which model. By doing the “legwork” of obtaining this decision tree for model selection, we provide a time-saving tool for analysts. We illustrate this method for a common marketing problem (i.e., forecasting repeat purchasing incidence for a cohort of new customers) and demonstrate the method's ability to discriminate among an integrated family of a hidden Markov model (HMM) and its constrained variants. We observe that data set characteristics have a strong ability to guide the choice of the most appropriate model, and that some model features (e.g., the “back-and-forth” migration between latent states) are more important to accommodate than others (e.g., the inclusion of an “off” state with no activity). We also demonstrate the method's broad potential by providing a general “recipe” for researchers to replicate this kind of model classification task in other managerial contexts (outside of repeat purchasing incidence data and the HMM framework).
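The general idea (summarize each data set with a few cheap statistics, then let a classification tree map those statistics to the model expected to win) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three summary statistics in `summary_stats`, the Gini-impurity CART-style tree (a common choice, not necessarily the one used in the paper), the labels `model_A`/`model_B`, and the toy training rows are all hypothetical and made up for illustration. In the paper's setting, each training row would summarize a data set and each label would name the model that actually performed best on it.

```python
from collections import Counter
from typing import List, Optional, Tuple, Union

# Incidence matrix: rows = customers, columns = time periods, entries 0/1.
Matrix = List[List[int]]
# A tree is either a leaf (model label) or (feature, threshold, left, right).
Tree = Union[str, Tuple[int, float, "Tree", "Tree"]]


def summary_stats(incidence: Matrix) -> List[float]:
    """Hypothetical summary statistics for one non-empty data set."""
    n_customers = len(incidence)
    n_periods = len(incidence[0])
    total = sum(sum(row) for row in incidence)
    avg_incidence = total / (n_customers * n_periods)  # share of customer-periods with a purchase
    never = sum(1 for row in incidence if sum(row) == 0) / n_customers  # never-buyers
    last_active = sum(row[-1] for row in incidence) / n_customers  # active in final period
    return [avg_incidence, never, last_active]


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[str]) -> Optional[Tuple[float, int, float]]:
    """Return (weighted impurity, feature index, threshold) of the best binary split."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, j, t)
    return best


def build_tree(X: List[List[float]], y: List[str], depth: int = 0, max_depth: int = 3) -> Tree:
    """Grow a small classification tree recursively; leaves hold the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):  # stop if no split reduces impurity
        return majority
    _, j, t = split
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth)
    return (j, t, left, right)


def predict(tree: Tree, x: List[float]) -> str:
    """Walk the tree from the root to a leaf and return the recommended model."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


if __name__ == "__main__":
    # Toy, made-up training data (not from the paper): one row per data set.
    X_train = [
        [0.10, 0.60, 0.05],
        [0.12, 0.55, 0.08],
        [0.40, 0.10, 0.45],
        [0.35, 0.15, 0.40],
    ]
    y_train = ["model_A", "model_A", "model_B", "model_B"]
    tree = build_tree(X_train, y_train)

    new_data = [[1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
    print(predict(tree, summary_stats(new_data)))  # prints model_B
```

The point of the design is that the expensive step (fitting every candidate model to many data sets to learn which one wins) happens once, when the tree is built; afterwards, recommending a model for a new data set only requires computing its summary statistics and walking the tree.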

Keywords: model selection; machine learning; data science; business intelligence; hidden Markov models; classification tree; random forest; posterior predictive model checking; hierarchical Bayesian methods; forecasting
Date: 2014
Download: http://dx.doi.org/10.1287/mksc.2013.0825

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormksc:v:33:y:2014:i:2:p:188-205

More articles in Marketing Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.