Some models are useful, but for how long?: A decision theoretic approach to choosing when to refit large-scale prediction models
Kentaro Hoffman,
Stephen Salerno,
Jeff Leek and
Tyler McCormick
Papers from arXiv.org
Abstract:
Large-scale prediction models using tools from artificial intelligence (AI) or machine learning (ML) are increasingly common across a variety of industries and scientific domains. Despite their effectiveness, training AI and ML tools at scale can cost tens or hundreds of thousands of dollars (or more), and even after a model is trained, substantial resources must be invested to keep it up to date. This paper presents a decision-theoretic framework for deciding when to refit an AI/ML model when the goal is to perform unbiased statistical inference using partially AI/ML-generated data. Drawing on portfolio optimization theory, we treat the decision of recalibrating a model or statistical inference versus refitting the model as a choice between "investing" in one of two "assets." One asset, recalibrating the model based on another model, is quick and relatively inexpensive but bears uncertainty from sampling and may not be robust to model drift. The other asset, refitting the model, is costly but removes the drift concern (though not statistical uncertainty from sampling). We present a framework for balancing these two potential investments while preserving statistical validity. We evaluate the framework using simulation and data on electricity usage and predicting flu trends.
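To make the two-"asset" framing concrete, the following is a minimal toy sketch, not the paper's method. It assumes a simple mean-variance style rule in which each action is scored by its monetary cost plus a risk-aversion weight times its total variance. The names `Option`, `risk_adjusted_cost`, `choose`, and `decide`, along with every numeric value, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One of the two 'assets'. All fields are hypothetical illustration values."""
    name: str
    cost: float          # monetary cost of taking this action (arbitrary units)
    sampling_var: float  # variance from finite-sample uncertainty
    drift_var: float     # extra variance attributed to model drift


def risk_adjusted_cost(opt: Option, risk_aversion: float) -> float:
    # Assumed scoring rule: cost plus a penalty proportional to total variance.
    return opt.cost + risk_aversion * (opt.sampling_var + opt.drift_var)


def choose(options: list[Option], risk_aversion: float) -> Option:
    # Pick the action with the lowest risk-adjusted cost.
    return min(options, key=lambda o: risk_adjusted_cost(o, risk_aversion))


def decide(drift_var: float, risk_aversion: float = 100.0) -> str:
    # Recalibrating is cheap but exposed to drift; refitting is costly
    # but, in this toy setup, carries only sampling variance.
    recalibrate = Option("recalibrate", cost=1.0, sampling_var=0.02, drift_var=drift_var)
    refit = Option("refit", cost=10.0, sampling_var=0.01, drift_var=0.0)
    return choose([recalibrate, refit], risk_aversion).name


if __name__ == "__main__":
    for d in (0.0, 0.05, 0.2):
        print(f"drift variance {d}: {decide(d)}")
```

Under these assumed numbers, recalibration is preferred while the drift variance stays small, and refitting takes over once drift makes the cheap option too uncertain. The actual framework in the paper additionally addresses statistical validity, which this sketch does not attempt to model.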
Date: 2024-05, Revised 2025-01
New Economics Papers: this item is included in nep-ain, nep-big and nep-cmp
Download: http://arxiv.org/pdf/2405.13926 (latest version, PDF)
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2405.13926