Enhancing Propensity Score Analysis with Data Missing Not at Random: Introducing Dual-Forest Proximity Imputation
Yongseok Lee and
Walter Leite
Additional contact information
Yongseok Lee: University of Florida
No. ex8ad_v1, OSF Preprints from Center for Open Science
Abstract:
Researchers who use propensity score analysis (PSA) to estimate treatment effects from secondary data may have to handle data that is missing not at random (MNAR). Existing methods for PSA with MNAR data use logistic regression to model the missing data mechanisms; they therefore require manual specification of functional forms and are difficult to implement with a large number of covariates. To overcome these limitations, this study proposes alternatives to the existing methods that replace logistic regression with a random forest. It also introduces the Dual-Forest Proximity imputation method, which leverages two types of random forest proximity matrices and incorporates missingness pattern information into each matrix. Results from a Monte Carlo simulation show that Dual-Forest Proximity imputation reduces bias more than the existing and alternative methods across various types of MNAR mechanisms. A case study using data from the National Longitudinal Survey of Youth 1979 (NLSY79) is also provided.
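The method above builds on random forest proximity matrices. As a hedged illustration of that building block only (not the authors' Dual-Forest Proximity method, whose details are not given in this abstract), the Python sketch below computes a standard proximity matrix from per-tree leaf assignments and performs one proximity-weighted imputation step in the style of Breiman's random forest imputation. The function names `proximity_matrix` and `proximity_impute` are hypothetical. The leaf indices are assumed to come from an already fitted forest (for example, scikit-learn's `RandomForestRegressor.apply`), and the sketch uses a single forest without any missingness pattern information.

```python
from typing import List, Optional, Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Random forest proximity: the fraction of trees in which cases i and j
    fall into the same terminal node.

    leaves[i][t] is the leaf index of case i in tree t (assumed to come from
    a fitted forest, e.g. the output of an apply() method).
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            prox[i][j] = shared / n_trees
    return prox


def proximity_impute(values: Sequence[Optional[float]],
                     prox: Sequence[Sequence[float]]) -> List[float]:
    """One imputation step: replace each missing entry (None) with the
    proximity-weighted mean of the observed entries."""
    observed = [k for k, v in enumerate(values) if v is not None]
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        weights = [prox[i][k] for k in observed]
        total = sum(weights)
        if total == 0:
            # No observed case shares a leaf with case i: fall back to the plain mean.
            filled.append(sum(values[k] for k in observed) / len(observed))
        else:
            filled.append(sum(w * values[k] for w, k in zip(weights, observed)) / total)
    return filled
```

For example, with hypothetical leaf indices `[[0, 1, 2], [0, 1, 3], [4, 5, 2], [0, 1, 3]]` and values `[10.0, 20.0, 30.0, None]`, the missing fourth value is filled with approximately 16.0: the fourth case shares all three leaves with the second case (weight 1), two of three with the first (weight 2/3), and none with the third (weight 0). A full proximity imputation would alternate between refitting the forest on the updated data and repeating this step for several iterations.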
Date: 2025-07-07
New Economics Papers: this item is included in nep-ecm
Downloads: (external link)
https://osf.io/download/686c8a04c129c534e103ac37/
Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:ex8ad_v1
DOI: 10.31219/osf.io/ex8ad_v1