Cross-Fitting and Averaging for Machine Learning Estimation of Heterogeneous Treatment Effects
Daniel Jacob
No 2020-014, IRTG 1792 Discussion Papers from Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series"
Abstract:
We investigate the finite-sample performance of sample splitting, cross-fitting and averaging for the estimation of the conditional average treatment effect. Recently proposed methods, so-called meta-learners, make use of machine learning to estimate different nuisance functions and hence allow for fewer restrictions on the underlying structure of the data. To limit a potential overfitting bias that may result when using machine learning methods, cross-fitting estimators have been proposed. This includes splitting the data into different folds to reduce bias and averaging over folds to restore efficiency. To the best of our knowledge, it is not yet clear how exactly the data should be split and averaged. We employ a Monte Carlo study with different data-generating processes and consider twelve different estimators that vary in their sample-splitting, cross-fitting and averaging procedures. We investigate the performance of each estimator independently on four different meta-learners: the doubly-robust-learner, R-learner, T-learner and X-learner. We find that the performance of all meta-learners depends heavily on the procedure of splitting and averaging. The best performance in terms of mean squared error (MSE) among the sample-split estimators is achieved when applying cross-fitting and taking the median over multiple different sample-splitting iterations. Some meta-learners exhibit a high variance when the lasso is included in the ML methods. Excluding the lasso decreases the variance and leads to robust and at least competitive results.
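The procedure the abstract singles out, cross-fitting combined with the median over repeated sample splits, can be sketched as follows. This is only an illustration under stated assumptions and not the paper's implementation: it uses a T-learner with ordinary least squares as a stand-in for the machine learning methods, and the function names, the fold count and the number of repetitions are all hypothetical choices.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept (stand-in for an ML learner)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def cross_fit_t_learner(X, y, d, n_folds, rng):
    """One cross-fitting pass: each fold is predicted by models fit on the other folds."""
    n = len(y)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    tau = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # T-learner: separate outcome models for treated and control units
        beta_treated = fit_ols(X[train & (d == 1)], y[train & (d == 1)])
        beta_control = fit_ols(X[train & (d == 0)], y[train & (d == 0)])
        tau[test] = predict_ols(beta_treated, X[test]) - predict_ols(beta_control, X[test])
    return tau


def median_cross_fit(X, y, d, n_folds=5, n_splits=20, seed=0):
    """Repeat cross-fitting over several random splits and take the per-unit median."""
    rng = np.random.default_rng(seed)
    draws = np.stack(
        [cross_fit_t_learner(X, y, d, n_folds, rng) for _ in range(n_splits)]
    )
    return np.median(draws, axis=0)
```

Taking the median rather than the mean over repetitions makes the final estimate less sensitive to a single unlucky split. Any other learner with a fit and predict step could replace the least-squares helpers.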
Keywords: causal inference; sample splitting; cross-fitting; sample averaging; machine learning; simulation study
JEL-codes: C01 C14 C31 C63
Date: 2020
New Economics Papers: this item is included in nep-big, nep-cmp, nep-ecm and nep-ore
Citations in EconPapers: 6
Downloads:
https://www.econstor.eu/bitstream/10419/230820/1/irtg1792dp2020-014.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:irtgdp:2020014
Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.