A data driven approach to handling missing data in the UK Millennium Cohort Study
Martina Kaja Narayanan, Aase Villadsen, Michail Katsoulis, Brian Dodgeon, George Ploubidis, Emla Fitzsimons and Richard J. Silverwood
No agqmu_v1, SocArXiv from Center for Open Science
Abstract:
Missing data arising from sweep non-response is a major challenge in longitudinal cohort studies, threatening statistical power and the validity of inferences. In the UK Millennium Cohort Study (MCS), non-response has increased substantially from sweep 1 (9 months old) to sweep 7 (17 years old), underscoring the need for robust strategies to handle non-response. We applied a systematic, data-driven approach to identify predictors of non-response at each sweep of the MCS, drawing on all available survey data at the time of analysis. The strongest and most consistent predictor of non-response was prior sweep non-response. Additional robust predictors included lower parental occupational social class, parental non-participation in the most recent general election, a parent not being in paid work, older age of the cohort member and lower cognitive test scores. We then evaluated whether incorporating the identified predictors of non-response as auxiliary variables in multiple imputation (MI) or as covariates in inverse probability weighting (IPW) improved sample representativeness. Validation analyses, using both external benchmarks (2021 Census) and internal comparisons to known early-life distributions, showed that MI and IPW models including the identified predictors substantially reduced or eliminated bias in key variables such as housing tenure and parental social class. Our findings demonstrate that the use of systematically identified auxiliary variables can improve the validity of inferences drawn from the MCS. The resulting predictor set offers a practical resource for applied researchers using MCS data and provides a replicable framework for addressing sweep non-response in other longitudinal studies.
Date: 2026-02-14
Downloads: https://osf.io/download/698f542f3fab057afbdfc4f1/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:agqmu_v1
DOI: 10.31219/osf.io/agqmu_v1