EconPapers    
Economics at your fingertips  
 

Data leakage inflates prediction performance in connectome-based machine learning models

Matthew Rosenblatt (), Link Tejavibulya, Rongtao Jiang, Stephanie Noble and Dustin Scheinost
Additional contact information
Matthew Rosenblatt: Yale University
Link Tejavibulya: Yale University
Rongtao Jiang: Yale School of Medicine
Stephanie Noble: Yale School of Medicine
Dustin Scheinost: Yale University

Nature Communications, 2024, vol. 15, issue 1, 1-15

Abstract: Abstract Predictive modeling is a central technique in neuroimaging to identify brain-behavior relationships and test their generalizability to unseen data. However, data leakage undermines the validity of predictive models by breaching the separation between training and test data. Leakage is always an incorrect practice but still pervasive in machine learning. Understanding its effects on neuroimaging predictive models can inform how leakage affects existing literature. Here, we investigate the effects of five forms of leakage–involving feature selection, covariate correction, and dependence between subjects–on functional and structural connectome-based machine learning models across four datasets and three phenotypes. Leakage via feature selection and repeated subjects drastically inflates prediction performance, whereas other forms of leakage have minor effects. Furthermore, small datasets exacerbate the effects of leakage. Overall, our results illustrate the variable effects of leakage and underscore the importance of avoiding data leakage to improve the validity and reproducibility of predictive modeling.

Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (4)

Downloads: (external link)
https://www.nature.com/articles/s41467-024-46150-w Abstract (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-46150-w

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-024-46150-w

Access Statistics for this article

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-46150-w