
Dynamically optimal treatment allocation using Reinforcement Learning

Karun Adusumilli, Friedrich Geiecke and Claudio Schilter

Papers from arXiv.org

Abstract: Devising guidance on how to assign individuals to treatment is an important goal of empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints such as a limited budget or capacity, borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year and need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal trade-offs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes ex-ante expected welfare in this dynamic context. We allow the class of policy rules to be restricted for computational, legal, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm to solve it. The algorithm is easily implementable and computationally efficient, with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule. To do this, we show that a Partial Differential Equation (PDE) characterizes the evolution of the value function under each policy. The data enable us to obtain a sample version of the PDE that provides estimates of these value functions. The estimated policy rule is the one with the maximal estimated value function. Using the theory of viscosity solutions to PDEs, we show that the policy regret decays at an $n^{-1/2}$ rate in most examples; this is the same rate as that obtained in the static case.
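The abstract describes the approach only at a high level: learn a budget-aware sequential treatment rule from offline data by maximizing an estimated welfare criterion over a restricted policy class. The Python sketch below illustrates that general idea under simplifying assumptions (synthetic data, a hypothetical logistic policy class, and a crude random search standing in for the paper's parallel RL agents and PDE-based value estimation); it is an illustrative sketch, not the authors' algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "offline" data: a covariate x and a noisy estimate of each
# individual's treatment effect (both invented here for illustration).
n = 2000
x = rng.normal(size=n)
tau_hat = 0.5 * x + rng.normal(scale=0.5, size=n)

def policy(theta, x_i, budget_left, time_left):
    # Restricted (hypothetical) policy class: a logistic rule in the
    # covariate, the remaining budget share, and the remaining time share.
    z = theta[0] + theta[1] * x_i + theta[2] * budget_left + theta[3] * time_left
    return 1.0 / (1.0 + np.exp(-z))

def estimated_welfare(theta, horizon=200, budget=50, n_episodes=20):
    # Empirical value of a policy: replay sequential arrivals resampled from
    # the offline data and accumulate estimated effects until the budget is
    # exhausted. This simulation is a stand-in for the paper's sample PDE.
    total = 0.0
    for _ in range(n_episodes):
        b = budget
        arrivals = rng.integers(0, n, size=horizon)
        for t, i in enumerate(arrivals):
            if b <= 0:
                break
            p = policy(theta, x[i], b / budget, (horizon - t) / horizon)
            if rng.random() < p:   # treat with probability p
                total += tau_hat[i]
                b -= 1
    return total / n_episodes

# Crude random search over the policy class, standing in for the RL agents
# that the paper runs in parallel processes.
best_theta, best_val = None, -np.inf
for _ in range(100):
    theta = rng.normal(scale=2.0, size=4)
    val = estimated_welfare(theta)
    if val > best_val:
        best_theta, best_val = theta, val

print("selected policy parameters:", np.round(best_theta, 2))
print("estimated welfare:", round(best_val, 2))

In the paper, the value of each candidate rule is instead estimated through a sample version of a PDE and optimized with reinforcement-learning updates, which is what yields the $n^{-1/2}$ regret guarantee; the sketch above only conveys the budget-constrained, sequential structure of the allocation problem.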

New Economics Papers: this item is included in nep-cmp
Date: 2019-04, Revised 2019-07

Downloads: (external link)
http://arxiv.org/pdf/1904.01047 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:1904.01047

Handle: RePEc:arx:papers:1904.01047