Online Resource Allocation with Personalized Learning

Zhalechian, Mohammad; Keyvanshokooh, Esmaeil; Shi, Cong; Van Oyen, Mark P.

Mohammad Zhalechian: Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109
Esmaeil Keyvanshokooh: Department of Information and Operations Management, Mayes Business School, Texas A&M University, College Station, Texas 77845
Cong Shi: Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109
Mark P. Van Oyen: Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Operations Research, 2022, vol. 70, issue 4, 2138-2161

Abstract: Joint online learning and resource allocation is a fundamental problem inherent in many applications. In a general setting, heterogeneous customers arrive sequentially, each of whom can be allocated to a resource in an online fashion. Customers stochastically consume the resources, allocations yield stochastic rewards, and the system receives delayed feedback on outcomes. We introduce a generic framework that judiciously synergizes online learning with a broad class of online resource allocation mechanisms, where the sequence of customer contexts is adversarial, and the customer reward and the resource consumption are stochastic and unknown. First, we propose an online algorithm for a general resource allocation problem, called personalized resource allocation while learning with delay, which strikes a three-way balance between exploration, exploitation, and hedging against the adversarial arrival sequence. We provide a performance guarantee for this online algorithm in terms of Bayesian regret. Next, we develop a second online algorithm for an advance scheduling problem, called personalized advance scheduling while learning with delay (PAS-LD), and evaluate its theoretical performance. The PAS-LD algorithm has a more delicate structure and offers multiday scheduling while accounting for the no-show behavior of customers. We demonstrate the practicality and efficacy of our PAS-LD algorithm using clinical data from a partner health system. Our results show that the proposed algorithm compares favorably with several benchmark policies.
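The abstract names three ingredients: learning unknown rewards (exploration and exploitation), hedging against an adversarial arrival sequence, and feedback that arrives with delay. The Python sketch below illustrates how such ingredients can be combined; it is a generic illustration under simplifying assumptions, not the authors' algorithm. Assumptions: Bernoulli rewards with Beta(1, 1) priors sampled via Thompson sampling, a fixed feedback delay, each allocation consuming exactly one unit (the paper allows stochastic consumption), and an inventory-balancing discount `psi` borrowed from the adversarial online-allocation literature. All names (`allocate_online`, `psi`, `true_probs`) and the example data are hypothetical.

```python
import math
import random
from collections import deque


def psi(fraction_used):
    # Inventory-balancing discount (an assumption, not from the paper):
    # 1.0 for an empty resource, 0.0 for a full one.
    return (1.0 - math.exp(fraction_used - 1.0)) / (1.0 - math.exp(-1.0))


def allocate_online(arrivals, capacities, true_probs, delay, seed=0):
    # Hypothetical sketch, not the authors' algorithm:
    # arrivals   -- customer types in arrival order (may be adversarial)
    # capacities -- resource -> number of units available
    # true_probs -- (type, resource) -> success probability, used only to
    #               simulate rewards; the policy learns it from feedback
    # delay      -- periods until an allocation's reward is observed
    rng = random.Random(seed)
    # Beta(1, 1) prior on every (type, resource) success probability.
    alpha = {key: 1.0 for key in true_probs}
    beta = {key: 1.0 for key in true_probs}
    used = {r: 0 for r in capacities}
    pending = deque()  # (period feedback arrives, key, reward); FIFO since delay is fixed
    decisions, total_reward = [], 0
    for t, ctype in enumerate(arrivals):
        # Delayed feedback: update posteriors only for outcomes now revealed.
        while pending and pending[0][0] <= t:
            _, key, reward = pending.popleft()
            alpha[key] += reward
            beta[key] += 1 - reward
        best, best_score = None, 0.0
        for r, cap in capacities.items():
            if used[r] >= cap:
                continue  # resource exhausted
            # Thompson sample (exploration/exploitation) times the
            # capacity discount (hedging against future arrivals).
            sample = rng.betavariate(alpha[(ctype, r)], beta[(ctype, r)])
            score = sample * psi(used[r] / cap)
            if score > best_score:
                best, best_score = r, score
        decisions.append(best)
        if best is None:
            continue  # no resource with positive score (e.g., all full): reject
        used[best] += 1  # simplification: each allocation uses one unit
        reward = 1 if rng.random() < true_probs[(ctype, best)] else 0
        total_reward += reward
        pending.append((t + delay, (ctype, best), reward))
    return decisions, used, total_reward


# Usage with made-up data: two customer types, two resources.
arrivals = ["A", "B"] * 50
capacities = {"r1": 30, "r2": 30}
true_probs = {("A", "r1"): 0.8, ("A", "r2"): 0.3,
              ("B", "r1"): 0.4, ("B", "r2"): 0.6}
decisions, used, total = allocate_online(arrivals, capacities, true_probs, delay=5)
```

Multiplying the sampled reward by `psi` steers customers away from nearly depleted resources, giving up some immediate reward to preserve capacity for later, possibly more valuable, arrivals.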

Keywords: Operations and Supply Chains; online learning; online resource allocation; contextual bandit; regret analysis; advance scheduling; personalized healthcare services
Date: 2022

Downloads:
http://dx.doi.org/10.1287/opre.2022.2294 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:70:y:2022:i:4:p:2138-2161

More articles in Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.