ROI maximization in stochastic online decision-making
Nicolo Cesa-Bianchi,
Tommaso Cesari,
Yishay Mansour and
Vianney Perchet
Additional contact information
Tommaso Cesari: TSE-R - Toulouse School of Economics - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement
Post-Print from HAL
Abstract:
We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive proposals for technological innovations and want to quickly decide whether they are worth implementing. We design an algorithm for learning ROI-maximizing decision-making policies over a sequence of innovation proposals. Our algorithm provably converges to an optimal policy in class $\Pi$ at a rate of order $\min\{1/(N\Delta^2),\, N^{-1/3}\}$, where $N$ is the number of innovations and $\Delta$ is the suboptimality gap in $\Pi$. A significant hurdle of our formulation, which sets it apart from other online learning problems such as bandits, is that running a policy does not provide an unbiased estimate of its performance.
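As a rough illustration of the rate quoted in the abstract, the sketch below evaluates min{1/(N∆²), N^(−1/3)} for given N and ∆. It is not taken from the paper: the function names `roi_rate_bound` and `crossover_gap` are hypothetical, and constants and any lower-order factors hidden in "of order" are ignored. The two terms are equal when ∆ = N^(−1/3), so the gap-dependent term is the smaller one only for gaps above that threshold.

```python
def roi_rate_bound(n: int, gap: float) -> float:
    """Order-of-magnitude rate min{1/(N*gap^2), N^(-1/3)}.

    Assumption: constants and lower-order factors are dropped, so the
    value is only meaningful for comparing regimes, not as a guarantee.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of innovations")
    gap_free = n ** (-1.0 / 3.0)           # N^(-1/3) term
    if gap <= 0:
        return gap_free                     # no usable gap: fall back to gap-free term
    gap_dependent = 1.0 / (n * gap ** 2)    # 1/(N * Delta^2) term
    return min(gap_dependent, gap_free)


def crossover_gap(n: int) -> float:
    """Gap at which both terms coincide: 1/(N*gap^2) = N^(-1/3) iff gap = N^(-1/3)."""
    if n <= 0:
        raise ValueError("n must be a positive number of innovations")
    return n ** (-1.0 / 3.0)
```

For example, with N = 1000 the threshold is ∆ ≈ 0.1: a gap of 0.5 gives the gap-dependent value 1/(1000 · 0.25) = 0.004, while a gap of 0.01 leaves the gap-free term N^(−1/3) ≈ 0.1 as the minimum.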
Date: 2021
Published in Ranzato, M; Beygelzimer, A; Dauphin, Y; Liang, P.S; Wortman Vaughan, J. Advances in Neural Information Processing Systems (Online), 34, Neural Information Processing Systems Foundation, 2021, 9781713845393
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-03880759