Multi-Armed Bandits with Endogenous Learning Curves: An Application to Split Liver Transplantation

Tang, Yanhan (Savannah); Li, Andrew; Scheller-Wolf, Alan; Tayur, Sridhar

Yanhan (Savannah) Tang, Andrew Li, Alan Scheller-Wolf and Sridhar Tayur
Additional contact information
Yanhan (Savannah) Tang: Cox School of Business, Southern Methodist University, Dallas, Texas 75275
Andrew Li: Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Alan Scheller-Wolf: Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Sridhar Tayur: Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Manufacturing & Service Operations Management, 2025, vol. 27, issue 2, 640-658

Abstract: Problem Definition: Proficiency in many sophisticated tasks is attained through experience-based learning, in other words, learning by doing. For example, transplant centers’ surgical teams need to practice difficult surgeries to master the skills required. Meanwhile, this experience-based learning may affect other stakeholders, such as patients eligible for transplant surgeries, and require resources, including scarce organs and continual efforts. To ensure that patients have excellent outcomes and equitable access to organs, the organ allocation authority needs to quickly identify and develop medical teams with high aptitudes. This entails striking a balance between exploring surgical combinations with initially unknown full potential and exploiting existing knowledge based on observed outcomes. Methodology/results: We formulate a multi-armed bandit (MAB) model in which parametric learning curves are embedded in the reward functions to capture endogenous experience-based learning. In addition, our model includes provisions ensuring that the choices of arms are subject to fairness constraints to guarantee equity. To solve our MAB problem, we propose the L-UCB and FL-UCB algorithms, variants of the upper confidence bound (UCB) algorithm that attain the optimal O(log t) regret on problems enhanced with experience-based learning and fairness concerns. We demonstrate our model and algorithms on the split liver transplantation (SLT) allocation problem, showing that our algorithms have superior numerical performance compared with standard bandit algorithms in a setting where experience-based learning and fairness concerns exist. Managerial implications: From a methodological point of view, our proposed MAB model and algorithms are generic and have broad application prospects. From an application standpoint, our algorithms could be applied to help evaluate potential strategies to increase the proliferation of SLT and other technically difficult procedures.
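
Illustrative sketch (not from the article): the setting described in the abstract can be pictured with a toy simulation in which each arm's success probability rises with the number of times it has been pulled, and a standard UCB1 rule chooses the arms. This is not the authors' L-UCB or FL-UCB algorithm, whose details are not given here. The exponential curve shape, the parameter names `ceiling` and `rate`, and the function names are all assumptions made for illustration, and no fairness constraint is modeled.

```python
import math
import random


def learning_curve_mean(ceiling, rate, pulls):
    """Hypothetical parametric learning curve (an assumption, not the paper's).

    The expected reward rises toward `ceiling` as the arm gains experience.
    """
    return ceiling * (1.0 - math.exp(-rate * (pulls + 1)))


def ucb1_index(mean_reward, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arms, horizon, seed=0):
    """Run plain UCB1 on arms given as (ceiling, rate) pairs.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(
                range(len(arms)),
                key=lambda i: ucb1_index(sums[i] / counts[i], counts[i], t),
            )
        ceiling, rate = arms[arm]
        # Endogenous learning: success probability depends on past pulls.
        p = learning_curve_mean(ceiling, rate, counts[arm])
        reward = 1.0 if rng.random() < p else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Arm 0 learns fast but plateaus low; arm 1 learns slowly toward a higher ceiling.
    print(run_ucb1([(0.6, 0.5), (0.9, 0.05)], horizon=2000))
```

Plain UCB1 averages all past rewards of an arm, so early low outcomes from an inexperienced arm pull down its estimate; an algorithm for this setting would presumably need to account for the curve itself, which is the gap the abstract says the proposed variants address.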

Keywords: multi-armed bandit; upper confidence bound algorithms; endogenous learning curves; nonstationary reward curves; split liver transplantation; fairness
Date: 2025
Downloads: (external link)
http://dx.doi.org/10.1287/msom.2022.0412 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormsom:v:27:y:2025:i:2:p:640-658

More articles in Manufacturing & Service Operations Management from INFORMS.
Bibliographic data for series maintained by Chris Asher.