Quantile-Optimal Policy Learning under Unmeasured Confounding

Chen, Zhongren; Chen, Siyu; Qi, Zhengling; Chen, Xiaohong; Yang, Zhuoran

Author affiliations:
Zhongren Chen: Yale University
Siyu Chen: Yale University
Zhengling Qi: George Washington University
Xiaohong Chen: Yale University
Zhuoran Yang: Yale University

No 2469, Cowles Foundation Discussion Papers from Cowles Foundation for Research in Economics, Yale University

Abstract: We study quantile-optimal policy learning, where the goal is to find a policy whose reward distribution has the largest α-quantile for some α ∈ (0, 1). We focus on the offline setting, in which the data-generating process involves unobserved confounders. This problem poses three main challenges: (i) the nonlinearity of the quantile objective as a functional of the reward distribution, (ii) unobserved confounding, and (iii) insufficient coverage of the offline dataset. To address these challenges, we propose a suite of causal-assisted policy learning methods that provably enjoy strong theoretical guarantees under mild conditions. In particular, to address (i) and (ii), we use causal inference tools such as instrumental variables and negative controls to estimate the quantile objectives by solving nonlinear functional integral equations. We then adopt a minimax estimation approach with nonparametric models to solve these integral equations, and construct conservative (pessimistic) estimates of the policies' quantile objectives that address (iii). The final policy is the one that maximizes these pessimistic estimates. In addition, we propose a novel regularized policy learning method that is more amenable to computation. Finally, we prove that the policies learned by these methods are Õ(n^{-1/2}) quantile-optimal under a mild coverage assumption on the offline dataset, where Õ(·) omits poly-logarithmic factors. To the best of our knowledge, these are the first sample-efficient policy learning algorithms for estimating the quantile-optimal policy in the presence of unmeasured confounding.
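As a rough illustration of the quantile objective (i) and the pessimism principle for limited coverage (iii) only, the sketch below deliberately ignores confounding (ii). It assumes, hypothetically, that unconfounded reward samples are available for each candidate policy, scores each policy by a lower confidence bound on its α-quantile derived from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, and returns the maximizer. This is not the paper's estimator, which relies on instrumental variables or negative controls and minimax estimation of integral equations; the function names, the DKW-based bound, and the union-bound split of δ are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

# Illustrative sketch only: assumes unconfounded reward samples per policy,
# which is NOT the setting of the paper (no IV / negative-control correction here).


def empirical_quantile(samples: Sequence[float], level: float) -> float:
    """Empirical quantile inf{y : F_n(y) >= level}; -inf when level <= 0."""
    if not samples:
        raise ValueError("need at least one sample")
    if level <= 0:
        return -math.inf  # every y satisfies F_n(y) >= 0
    ordered = sorted(samples)
    n = len(ordered)
    # Smallest order statistic y_(k) with k / n >= level, i.e. k = ceil(level * n).
    k = min(n, math.ceil(level * n))
    return ordered[k - 1]


def dkw_radius(n: int, delta: float) -> float:
    """With prob. >= 1 - delta, sup_y |F_n(y) - F(y)| <= this radius (DKW)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def pessimistic_quantile(samples: Sequence[float], alpha: float, delta: float) -> float:
    """Lower confidence bound: on the DKW event, q_alpha >= Q_n(alpha - eps)."""
    eps = dkw_radius(len(samples), delta)
    return empirical_quantile(samples, alpha - eps)


def select_policy(rewards_by_policy: Dict[str, List[float]], alpha: float, delta: float) -> str:
    """Pick the policy with the largest pessimistic alpha-quantile estimate."""
    # Union bound: each policy gets delta / K so all bounds hold jointly w.p. >= 1 - delta.
    per_policy_delta = delta / len(rewards_by_policy)
    return max(
        rewards_by_policy,
        key=lambda name: pessimistic_quantile(rewards_by_policy[name], alpha, per_policy_delta),
    )


if __name__ == "__main__":
    # Hypothetical logged rewards: "A" is well covered (2000 samples);
    # "B" has a higher empirical median but only 5 samples.
    logged = {
        "A": [i / 1999 for i in range(2000)],
        "B": [0.9, 0.95, 1.0, 1.05, 1.1],
    }
    print(select_policy(logged, alpha=0.5, delta=0.05))  # prints "A"
```

In this toy example, a naive rule that maximizes the empirical median would choose "B", whereas the pessimistic rule chooses "A": with only five samples, the DKW radius exceeds α, so the lower bound for "B" is -∞. This mirrors, in a much simplified form, how conservative estimates penalize policies that the offline data cover poorly.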

Pages: 67 pages
Date: 2025-06-08

Downloads: https://cowles.yale.edu/sites/default/files/2025-10/d2469.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:cwl:cwldpp:2469

Ordering information: This working paper can be ordered from
Cowles Foundation, Yale University, Box 208281, New Haven, CT 06520-8281 USA

Bibliographic data for the series are maintained by Brittany Ladd.