Technical Note—Online Matching with Bayesian Rewards

Simchi-Levi, David; Sun, Rui; Wang, Xinshang

Technical Note—Online Matching with Bayesian Rewards

David Simchi-Levi (), Rui Sun () and Xinshang Wang ()
Additional contact information
David Simchi-Levi: Institute for Data, Systems, and Society, Department of Civil & Environmental Engineering, and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Rui Sun: Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Xinshang Wang: Alibaba Group US, San Mateo, California 94402; and Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200240, China

Operations Research, 2025, vol. 73, issue 1, 278-289

Abstract: We study in this paper an online matching problem where a central platform needs to match a number of limited resources to different groups of users that arrive sequentially over time. The reward of each matching option depends on both the type of resource and the time period the user arrives. The matching rewards are assumed to be unknown but drawn from probability distributions that are known a priori. The platform then needs to learn the true rewards online based on real-time observations of the matching results. The goal of the central platform is to maximize the total reward from all of the matchings without violating the resource capacity constraints. We formulate this matching problem with Bayesian rewards as a Markovian multiarmed bandit problem with budget constraints, where each arm corresponds to a pair of a resources and a time period. We devise our algorithm by first finding policies for each single arm separately via a relaxed linear program and then “assembling” these policies together through judicious selection criteria and well-designed pulling orders. We prove that the expected reward of our algorithm is at least 1 2 ( 2 − 1 ) of the expected reward of an optimal algorithm.

Keywords: Optimization; online matching; Bayesian learning; Markovian bandits; approximation algorithm (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/opre.2021.0499 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:73:y:2025:i:1:p:278-289

Access Statistics for this article

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().