Power and reproducibility in the external validation of brain-phenotype predictions
Matthew Rosenblatt,
Link Tejavibulya,
Huili Sun,
Chris C. Camp,
Milana Khaitova,
Brendan D. Adkinson,
Rongtao Jiang,
Margaret L. Westwater,
Stephanie Noble and
Dustin Scheinost
Additional contact information
Matthew Rosenblatt: Yale University
Link Tejavibulya: Yale University
Huili Sun: Yale University
Chris C. Camp: Yale University
Milana Khaitova: Yale School of Medicine
Brendan D. Adkinson: Yale University
Rongtao Jiang: Yale School of Medicine
Margaret L. Westwater: Yale School of Medicine
Stephanie Noble: Yale School of Medicine
Dustin Scheinost: Yale University
Nature Human Behaviour, 2024, vol. 8, issue 10, 2018-2033
Abstract:
Brain-phenotype predictive models seek to identify reproducible and generalizable brain-phenotype associations. External validation, that is, the evaluation of a model in external datasets, is the gold standard for assessing the generalizability of models in neuroimaging. Unlike typical studies, external validation involves two sample sizes: the training sample size and the external sample size. Thus, traditional power calculations may not be appropriate. Here we ran over 900 million resampling-based simulations in functional and structural connectivity data to investigate the relationships between training sample size, external sample size, phenotype effect size, theoretical power and simulated power. Our analysis included a wide range of datasets: the Healthy Brain Network, the Adolescent Brain Cognitive Development Study, the Human Connectome Project (Development and Young Adult), the Philadelphia Neurodevelopmental Cohort, the Queensland Twin Adolescent Brain Project and the Chinese Human Connectome Project. Phenotypes included age, body mass index, matrix reasoning, working memory, attention problems, anxiety/depression symptoms and relational processing. High effect size predictions achieved adequate power with training and external sample sizes of a few hundred individuals, whereas low and medium effect size predictions required hundreds to thousands of training and external samples. In addition, most previous external validation studies used sample sizes prone to low power, and theoretical power curves should be adjusted for the training sample size. Furthermore, model performance in internal validation often informed subsequent external validation performance (Pearson’s r difference …).
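The abstract's notion of simulated power can be made concrete. Below is a minimal sketch (not the authors' released code) of a resampling-based power simulation, assuming hypothetical connectivity matrices X_train/X_ext and phenotype vectors y_train/y_ext: for each resample, a model is fit on n_train training subjects and evaluated on n_ext external subjects, and simulated power is the fraction of resamples with a significant positive prediction-observation correlation. The companion theoretical_power function uses the standard Fisher z approximation for detecting a correlation r at external sample size n; as the abstract notes, such curves should further be adjusted for the training sample size.

```python
# Minimal sketch of resampling-based external-validation power, assuming
# hypothetical arrays X_train, y_train (training dataset) and X_ext, y_ext
# (external dataset). Illustrative only; not the authors' released code.
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def theoretical_power(r, n, alpha=0.05):
    """Fisher-z approximation to the power of a one-sided test of
    Pearson's r > 0 at sample size n (unadjusted for training size)."""
    z_alpha = stats.norm.ppf(1 - alpha)
    return stats.norm.cdf(np.sqrt(n - 3) * np.arctanh(r) - z_alpha)

def simulated_power(X_train, y_train, X_ext, y_ext,
                    n_train, n_ext, n_iter=1000, alpha=0.05):
    """Fraction of resamples in which the correlation between predicted
    and observed phenotype in the external subsample is significant."""
    hits = 0
    for _ in range(n_iter):
        # Draw a training subsample and an external subsample.
        tr = rng.choice(len(y_train), size=n_train, replace=False)
        ex = rng.choice(len(y_ext), size=n_ext, replace=False)
        # Fit on the training subsample, predict in the external one.
        model = Ridge(alpha=1.0).fit(X_train[tr], y_train[tr])
        y_pred = model.predict(X_ext[ex])
        r, p_two_sided = stats.pearsonr(y_pred, y_ext[ex])
        hits += (r > 0) and (p_two_sided / 2 < alpha)  # one-sided test
    return hits / n_iter
```

Under this setup, sweeping n_train and n_ext over a grid traces the joint power surface that the study characterizes, with effect size entering through how predictable the phenotype actually is.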
Date: 2024
Downloads: https://www.nature.com/articles/s41562-024-01931-7 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:nat:nathum:v:8:y:2024:i:10:d:10.1038_s41562-024-01931-7
Ordering information: This journal article can be ordered from
https://www.nature.com/nathumbehav/
DOI: 10.1038/s41562-024-01931-7
Nature Human Behaviour is currently edited by Stavroula Kousta
More articles in Nature Human Behaviour from Nature
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.