Enhancing Propensity Score Analysis with Data Missing Not at Random: Introducing Dual-Forest Proximity Imputation

Yongseok Lee and Walter Leite
Yongseok Lee: University of Florida

No ex8ad_v1, OSF Preprints from Center for Open Science

Abstract: Researchers who use propensity score analysis (PSA) to estimate treatment effects from secondary data may have to handle data that is missing not at random (MNAR). Existing methods for PSA with MNAR data model the missing data mechanisms with logistic regression, which requires manual specification of functional forms and is difficult to implement with a large number of covariates. To overcome these limitations, this study proposes alternatives to existing methods that replace logistic regression with a random forest. It also introduces the Dual-Forest Proximity imputation method, which leverages two types of random forest proximity matrices and incorporates missing pattern information in each matrix. Results from a Monte Carlo simulation show that Dual-Forest Proximity imputation reduces bias more than existing and alternative methods across various types of MNAR mechanisms. A case study using data from the National Longitudinal Survey of Youth 1979 (NLSY79) is also provided.
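
For readers unfamiliar with proximity-based imputation, the following is a minimal sketch of the generic random forest proximity idea that the abstract builds on. It is not the authors' Dual-Forest Proximity method: the abstract does not specify the two proximity matrix types or how missing pattern information enters them. The sketch assumes the terminal-node (leaf) assignments of each observation in each tree are already available, for instance from a fitted forest (scikit-learn's `RandomForestClassifier.apply` returns such a matrix). The function names `proximity_matrix` and `proximity_impute` and the toy data are hypothetical.

```python
from typing import List, Optional


def proximity_matrix(leaves: List[List[int]]) -> List[List[float]]:
    """Generic random forest proximity (illustrative assumption, not the paper's).

    leaves[i][t] is the terminal-node id of observation i in tree t.
    The proximity of i and j is the share of trees in which both
    observations fall into the same terminal node.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(
                1 for t in range(n_trees) if leaves[i][t] == leaves[j][t]
            )
            prox[i][j] = shared / n_trees
    return prox


def proximity_impute(
    values: List[Optional[float]], prox: List[List[float]]
) -> List[Optional[float]]:
    """Fill each missing value (None) with the proximity-weighted mean
    of the observed values; leave it missing if no donor has positive
    proximity."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        donors = [
            (prox[i][j], x)
            for j, x in enumerate(values)
            if x is not None and prox[i][j] > 0
        ]
        total = sum(w for w, _ in donors)
        if total > 0:
            filled[i] = sum(w * x for w, x in donors) / total
    return filled


# Toy data (hypothetical): 4 observations, 2 trees.
leaves = [
    [1, 4],
    [1, 4],
    [1, 5],
    [2, 5],
]
values = [10.0, None, 20.0, 40.0]  # observation 1 is missing

prox = proximity_matrix(leaves)
imputed = proximity_impute(values, prox)
```

In this toy case the missing observation shares both trees' terminal nodes with observation 0 and one with observation 2, so its imputed value is pulled toward 10.0 more strongly than toward 20.0, and observation 3 contributes nothing.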

Date: 2025-07-07
New Economics Papers: this item is included in nep-ecm

Downloads: (external link)
https://osf.io/download/686c8a04c129c534e103ac37/

Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:ex8ad_v1

DOI: 10.31219/osf.io/ex8ad_v1

Page updated 2025-08-20
Handle: RePEc:osf:osfxxx:ex8ad_v1