
On the (Mis)Use of Machine Learning with Panel Data

Augusto Cerqua, Marco Letta and Gabriele Pinto

Papers from arXiv.org

Abstract: Machine Learning (ML) is increasingly employed to inform and support policymaking interventions. This methodological article cautions practitioners about common but often overlooked pitfalls associated with the uncritical application of supervised ML algorithms to panel data. Ignoring the cross-sectional and longitudinal structure of this data can lead to hard-to-detect data leakage, inflated out-of-sample performance, and an inadvertent overestimation of the real-world usefulness and applicability of ML models. After clarifying these issues, we provide practical guidelines and best practices for applied researchers to ensure the correct implementation of supervised ML in panel data environments, emphasizing the need to define ex ante the primary goal of the analysis and align the ML pipeline accordingly. An empirical application based on over 3,000 US counties from 2000 to 2019 illustrates the practical relevance of these points across nearly 500 models for both classification and regression tasks.
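To make the leakage concern concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy panel of three hypothetical units ("A", "B", "C") observed yearly from 2000 to 2005, and compares a naive interleaved split with two structure-aware alternatives: a temporal split (for a forecasting goal) and a unit-level hold-out (for generalizing to unseen units). All function names and the cutoff year are illustrative assumptions.

```python
from typing import List, Sequence, Set, Tuple

Row = Tuple[str, int]  # (unit identifier, year); labels are hypothetical


def temporal_split(rows: Sequence[Row], cutoff_year: int) -> Tuple[List[int], List[int]]:
    """Train on years up to and including cutoff_year, test on later years."""
    train = [i for i, (_, year) in enumerate(rows) if year <= cutoff_year]
    test = [i for i, (_, year) in enumerate(rows) if year > cutoff_year]
    return train, test


def unit_split(rows: Sequence[Row], held_out_units: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Hold out every observation of the listed units for testing."""
    held = set(held_out_units)
    train = [i for i, (unit, _) in enumerate(rows) if unit not in held]
    test = [i for i, (unit, _) in enumerate(rows) if unit in held]
    return train, test


def trains_on_future(rows: Sequence[Row], train: Sequence[int], test: Sequence[int]) -> bool:
    """True if any training year is later than the earliest test year."""
    return max(rows[i][1] for i in train) > min(rows[i][1] for i in test)


def shared_units(rows: Sequence[Row], train: Sequence[int], test: Sequence[int]) -> Set[str]:
    """Units that appear in both the training and the test set."""
    return {rows[i][0] for i in train} & {rows[i][0] for i in test}


# Toy panel: 3 units x 6 years = 18 unit-year observations.
panel = [(unit, year) for unit in ("A", "B", "C") for year in range(2000, 2006)]

# Naive split that ignores the panel structure: every third row is a test row.
naive_test = [i for i in range(len(panel)) if i % 3 == 0]
naive_train = [i for i in range(len(panel)) if i % 3 != 0]

time_train, time_test = temporal_split(panel, cutoff_year=2003)
unit_train, unit_test = unit_split(panel, held_out_units=["C"])

print(trains_on_future(panel, naive_train, naive_test))  # True: future years leak into training
print(trains_on_future(panel, time_train, time_test))    # False: training precedes testing
print(shared_units(panel, unit_train, unit_test))        # set(): no unit is in both sets
```

Which split is appropriate depends on the goal stated before the analysis: the temporal split mimics predicting future periods for known units, while the unit-level hold-out mimics predicting for units never seen in training.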

Date: 2024-11
New Economics Papers: this item is included in nep-big, nep-cmp, nep-ecm and nep-pke

Downloads: (external link)
http://arxiv.org/pdf/2411.09218 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2411.09218

Handle: RePEc:arx:papers:2411.09218