Matching While Learning

Johari, Ramesh; Kamble, Vijay; Kanoria, Yash

Ramesh Johari, Vijay Kamble and Yash Kanoria
Additional contact information
Ramesh Johari: Department of Management Science and Engineering, Stanford University, Stanford, California 94305
Vijay Kamble: Department of Information and Decision Sciences, University of Illinois at Chicago, Chicago, Illinois 60607
Yash Kanoria: Columbia Business School, New York, New York 10027

Operations Research, 2021, vol. 69, issue 2, 655-681

Abstract: We consider the problem faced by a service platform that needs to match limited supply with demand while learning the attributes of new users to match them better in the future. We introduce a benchmark model with heterogeneous workers (demand) and a limited supply of jobs that arrive over time. Job types are known to the platform, but worker types are unknown and must be learned by observing match outcomes. Workers depart after performing a certain number of jobs. The expected payoff from a match depends on the pair of types, and the goal is to maximize the steady-state rate of accumulation of payoff. Although we use terminology inspired by labor markets, our framework applies more broadly to platforms where a limited supply of heterogeneous products is matched to users over time. Our main contribution is a complete characterization of the structure of the optimal policy in the limit that each worker performs many jobs. The platform faces a tradeoff for each worker between myopically maximizing payoffs (exploitation) and learning the type of the worker (exploration). This creates a multitude of multiarmed bandit problems, one for each worker, coupled together by the constraint on availability of jobs of different types (capacity constraints). We find that the platform should estimate a shadow price for each job type and use the payoffs adjusted by these prices first to determine its learning goals and then for each worker (i) to balance learning with payoffs during the exploration phase and (ii) to myopically match after it has achieved its learning goals during the exploitation phase.
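
The idea of price-adjusted matching described in the abstract can be illustrated with a small sketch. This is not the paper's policy: it assumes Bernoulli match outcomes, takes the shadow prices as given inputs rather than estimating them, and covers only a belief update plus a myopic price-adjusted choice. All names (posterior_update, price_adjusted_choice, the type labels "A" and "B", the job labels "easy" and "hard") and all numbers are hypothetical.

```python
from typing import Dict

Belief = Dict[str, float]                  # worker type -> probability
Payoffs = Dict[str, Dict[str, float]]      # worker type -> job type -> success probability


def posterior_update(belief: Belief, payoffs: Payoffs, job: str, success: bool) -> Belief:
    """Bayes update of the belief over worker types after one match outcome.

    Assumption: each match yields a Bernoulli outcome whose success
    probability depends on the (worker type, job type) pair.
    """
    unnormalized = {}
    for worker_type, prob in belief.items():
        q = payoffs[worker_type][job]
        unnormalized[worker_type] = prob * (q if success else 1.0 - q)
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed outcome has zero probability under the belief")
    return {t: w / total for t, w in unnormalized.items()}


def price_adjusted_choice(belief: Belief, payoffs: Payoffs, prices: Dict[str, float]) -> str:
    """Pick the job type with the largest expected payoff minus shadow price.

    Assumption: prices are supplied by the caller; estimating them is
    outside the scope of this sketch.
    """
    def adjusted(job: str) -> float:
        expected = sum(belief[t] * payoffs[t][job] for t in belief)
        return expected - prices[job]

    return max(prices, key=adjusted)


# Hypothetical data: two worker types, two job types.
payoffs = {
    "A": {"easy": 0.9, "hard": 0.8},
    "B": {"easy": 0.6, "hard": 0.1},
}
prices = {"easy": 0.2, "hard": 0.0}    # "easy" jobs are the scarce ones here
belief = {"A": 0.5, "B": 0.5}

first_job = price_adjusted_choice(belief, payoffs, prices)
belief_after = posterior_update(belief, payoffs, "hard", success=True)
next_job = price_adjusted_choice(belief_after, payoffs, prices)
```

In this toy instance the positive price on "easy" jobs steers a worker who is probably of type "A" toward "hard" jobs, even though that worker's raw expected payoff is higher on "easy" jobs. That is the sense in which capacity constraints couple the per-worker decisions.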

Keywords: matching; learning; two-sided platform; multiarmed bandit; capacity constraints
Date: 2021

Downloads:
https://doi.org/10.1287/opre.2020.2013

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:69:y:2021:i:2:p:655-681
