The Composite Overfit Analysis Framework: Assessing the Out-of-Sample Generalizability of Construct-Based Models Using Predictive Deviance, Deviance Trees, and Unstable Paths
Nicholas P. Danks,
Soumya Ray and
Galit Shmueli
Additional contact information
Nicholas P. Danks: Trinity Business School, Trinity College, D02PN40 Dublin, Ireland
Soumya Ray: Institute of Service Science, College of Technology Management, National Tsing Hua University, Taiwan 30013, Republic of China
Galit Shmueli: Institute of Service Science, College of Technology Management, National Tsing Hua University, Taiwan 30013, Republic of China
Management Science, 2024, vol. 70, issue 1, 647-669
Abstract:
Construct-based models have become a mainstay of management and information systems research. However, these models are likely overfit to the data samples upon which they are estimated, making them risky to use in explanatory, prescriptive, or predictive ways outside a given sample. Empirical researchers currently lack tools to analyze why and how their models may not generalize out of sample. We propose a composite overfit analysis (COA) framework that applies predictive tools to describe the sources and ramifications of overfit in terms of the focal concepts important to empirical researchers: cases, constructs, and causal paths. The COA framework begins by using a leave-one-out cross-validation procedure to identify cases with unusually high predictive error given their in-sample fit—a difference we describe as predictive deviance. The framework then employs a novel deviance tree method to group deviant cases that have similar predictive deviance for similar theoretical reasons. We then employ a leave-deviant-group-out method, which sequentially analyzes how each deviant group affects model parameters, thereby identifying potentially unstable paths in the model. We can then infer descriptive reasons for why and how overfit affects a given model and data sample using the grouping criteria of the deviance tree, the construct scores of deviant groups, and the resulting unstable paths. These insights allow researchers to identify unexpected behavior that could define boundary conditions of their theory or point to new theoretical phenomena. We demonstrate the practical utility of our analytical framework on a technology adoption model in a new context.
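As a rough illustration of the first two steps described in the abstract, the Python sketch below computes per-case predictive deviance via leave-one-out cross-validation and then groups cases with a regression tree. It is a minimal sketch under simplifying assumptions: an ordinary linear regression stands in for the paper's construct-based (e.g., PLS) model, a CART regression tree stands in for the deviance tree method, and the function names predictive_deviance and deviance_groups are invented here, not taken from the article.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeRegressor

def predictive_deviance(X, y):
    """Per-case gap between out-of-sample predictive error and in-sample fit."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)

    # In-sample absolute residuals from a model fit to the full sample.
    full_model = LinearRegression().fit(X, y)
    in_sample_error = np.abs(y - full_model.predict(X))

    # Leave-one-out error: each case is predicted by a model estimated without it.
    loo_error = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
        loo_error[test_idx] = np.abs(y[test_idx] - fold_model.predict(X[test_idx]))

    # Large positive values flag cases that predict much worse out of sample
    # than their in-sample fit suggests.
    return loo_error - in_sample_error

def deviance_groups(X, deviance, max_depth=3):
    """Approximate the deviance-tree step: partition cases by their predictor
    (construct-score) profiles according to their predictive deviance."""
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, deviance)
    return tree.apply(X)  # leaf indices serve as deviant-group labels

From there, the framework's leave-deviant-group-out step would re-estimate the model with each high-deviance group held out in turn and compare the resulting path coefficients against the full-sample estimates to flag potentially unstable paths.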
Keywords: prediction; model fit; overfit; constructs; out-of-sample generalizability
Date: 2024
Downloads: http://dx.doi.org/10.1287/mnsc.2023.4705 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:70:y:2024:i:1:p:647-669
More articles in Management Science from INFORMS. Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher.