Double Machine Learning and Automated Confounder Selection -- A Cautionary Tale
Paul Hünermund,
Beyers Louw and
Itamar Caspi
Additional contact information
Paul Hünermund: Copenhagen Business School
Beyers Louw: Maastricht University
Papers from arXiv.org
Abstract:
Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This paper demonstrates that DML is very sensitive to the inclusion of only a few "bad controls" in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
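The sensitivity to "bad controls" described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experimental design. It assumes one hypothetical causal structure, in which the control C is a collider caused by both the treatment D and the outcome Y, and it uses simple cross-fitted linear regressions in place of the machine-learning nuisance estimators a real DML application would use. All function names, the sample size, and the true effect `theta = 1.0` are illustrative assumptions.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of a least-squares fit of y on a single x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_fit_residuals(target, control, n_folds=2):
    """Residualize target on control; each fold is predicted by a model
    fitted on the other folds (the sample-splitting step of DML)."""
    n = len(target)
    resid = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        a, b = ols_fit([control[i] for i in train],
                       [target[i] for i in train])
        for i in range(k, n, n_folds):
            resid[i] = target[i] - (a + b * control[i])
    return resid


def simulate(n=20000, theta=1.0, seed=0):
    """Assumed design: D -> Y with effect theta, no confounding,
    and a collider C that is caused by both D and Y."""
    rng = random.Random(seed)
    d = [rng.gauss(0, 1) for _ in range(n)]
    y = [theta * di + rng.gauss(0, 1) for di in d]
    c = [di + yi + rng.gauss(0, 1) for di, yi in zip(d, y)]
    return d, y, c


def estimates(n=20000, theta=1.0, seed=0):
    d, y, c = simulate(n, theta, seed)
    # No controls: unbiased here because D is as good as randomly assigned.
    _, naive = ols_fit(d, y)
    # Partialling-out estimator with the collider in the covariate set.
    ry = cross_fit_residuals(y, c)
    rd = cross_fit_residuals(d, c)
    _, with_collider = ols_fit(rd, ry)
    return naive, with_collider


naive, with_collider = estimates()
print(f"true effect:           1.0")
print(f"estimate, no controls: {naive:.3f}")
print(f"estimate, collider in: {with_collider:.3f}")
```

In this particular design the estimate without controls should land near the true effect, while conditioning on the single collider pulls the estimate toward zero. A different causal structure (for example a mediator instead of a collider) would give a different bias, which is the abstract's point about the bias depending on the underlying causal model.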
Date: 2021-08, Revised 2023-05
New Economics Papers: this item is included in nep-big, nep-cmp, nep-ecm and nep-isf
Downloads: (external link)
http://arxiv.org/pdf/2108.11294 Latest version (application/pdf)
Related works:
Journal Article: Double machine learning and automated confounder selection: A cautionary tale (2023) 
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2108.11294