Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses
Rong Jin (),
David Simchi-Levi (),
Li Wang (),
Xinshang Wang () and
Sen Yang ()
Additional contact information
Rong Jin: Alibaba Group US, San Mateo, California 94402
David Simchi-Levi: Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Li Wang: Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Xinshang Wang: Alibaba Group US, San Mateo, California 94402; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200240, China
Sen Yang: Alibaba Group US, San Mateo, California 94402
Management Science, 2021, vol. 67, issue 8, 4756-4771
Abstract:
The recent rising popularity of ultrafast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for such online retailers: the number of stock keeping units (SKUs) they carry is no longer “the more, the better,” yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers’ ultrafast delivery platforms. We distill the product selection problem into a semibandit model with linear generalization. There are in total N arms corresponding to N products, each with a feature vector of dimension d . The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the number of total time periods T or the dimension of product features d . We first analyze a standard Upper Confidence Bound (UCB) algorithm and show its regret bound can be expressed as the sum of a T -independent part and a T -dependent part, which we refer to as “fixed cost” and “variable cost,” respectively. To reduce the fixed cost for large K values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show its fixed cost is reduced by a factor of d . Moreover, we test the algorithms on an industrial data set from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.
Keywords: sequential decision-making; product selection; online learning; online retailing; regret analysis (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.2020.3773 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:67:y:2021:i:8:p:4756-4771
Access Statistics for this article
More articles in Management Science from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().