An Asymptotically Tight Learning Algorithm for Mobile-Promotion Platforms
Zhichao Feng,
Milind Dawande,
Ganesh Janakiraman and
Anyan Qi
Additional contact information
Zhichao Feng: Department of Logistics and Maritime Studies, Faculty of Business, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
Milind Dawande: Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080
Ganesh Janakiraman: Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080
Anyan Qi: Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080
Management Science, 2023, vol. 69, issue 3, 1536-1554
Abstract:
Operating under both supply-side and demand-side uncertainties, a mobile-promotion platform conducts advertising campaigns for individual advertisers. Campaigns arrive dynamically over time, which is divided into seasons; each campaign requires the platform to deliver a target number of mobile impressions from a desired set of locations over a desired time interval. The platform fulfills these campaigns by procuring impressions from publishers, who supply advertising space on apps via real-time bidding on ad exchanges. Each location is characterized by its win curve, that is, the relationship between the bid price and the probability of winning an impression at that bid. The win curves at the various locations of interest are initially unknown to the platform, and it learns them on the fly based on the bids it places to win impressions and the realized outcomes. Each acquired impression is allocated to one of the ongoing campaigns. The platform’s objective is to minimize its total cost (the amount spent in procuring impressions and the penalty incurred due to unmet targets of the campaigns) over the time horizon of interest. Our main result is a bidding and allocation policy for this problem. We show that our policy is the best possible (asymptotically tight) for the problem using the notion of regret under a policy, namely the difference between the expected total cost under that policy and the optimal cost for the clairvoyant problem (i.e., one in which the platform has full information about the win curves at all the locations in advance): The lower bound on the regret under any policy is of the order of the square root of the number of seasons, and the regret under our policy matches this lower bound. We demonstrate the performance of our policy through numerical experiments on a test bed of instances whose input parameters are based on our observations at a real-world mobile-promotion platform.
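To make the regret notion concrete, the following is a minimal, self-contained Python sketch. It is not the authors' bidding-and-allocation policy: the logistic win curve, the target and penalty values, the single location, and the naive explore-then-exploit rule are all hypothetical stand-ins, chosen only to illustrate how regret compares a learning bidder's cumulative cost against a clairvoyant benchmark that knows the win curve in advance.

```python
# Illustrative sketch only (not the paper's policy). One location, an unknown
# logistic win curve, a crude explore-then-exploit bidder, and regret measured
# against a clairvoyant bidder that knows the curve. All parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def win_prob(bid, a=2.0, b=1.0):
    """Hypothetical win curve: probability of winning an impression at a given bid."""
    return 1.0 / (1.0 + np.exp(-(a * bid - b)))

def run_season(bid, target, penalty, n_auctions=500):
    """Bid a fixed amount in every auction of a season; return procurement cost plus shortfall penalty."""
    wins = rng.random(n_auctions) < win_prob(bid)
    cost = bid * wins.sum()
    shortfall = max(target - wins.sum(), 0)
    return cost + penalty * shortfall

target, penalty = 200, 0.9
bids_grid = np.linspace(0.1, 1.5, 15)

def expected_cost(bid, n_auctions=500):
    """Clairvoyant per-season cost: expected spend plus expected shortfall penalty under the true curve."""
    expected_wins = win_prob(bid) * n_auctions
    return bid * expected_wins + penalty * max(target - expected_wins, 0)

clairvoyant_bid = min(bids_grid, key=expected_cost)

def learning_policy(n_seasons):
    """Explore each grid bid once, then repeatedly use the bid with the lowest observed average cost."""
    total, observed = 0.0, {b: [] for b in bids_grid}
    for t in range(n_seasons):
        if t < len(bids_grid):  # exploration seasons
            bid = bids_grid[t]
        else:                   # exploitation seasons
            bid = min(observed, key=lambda b: np.mean(observed[b]) if observed[b] else np.inf)
        c = run_season(bid, target, penalty)
        observed[bid].append(c)
        total += c
    return total

n_seasons = 100
regret = learning_policy(n_seasons) - n_seasons * expected_cost(clairvoyant_bid)
print(f"Estimated regret over {n_seasons} seasons: {regret:.1f}")
```

In this toy setup, the gap between the learner's cumulative cost and the clairvoyant benchmark is the regret; the paper's result concerns how this quantity can be made to grow no faster than the square root of the number of seasons, which this naive rule does not guarantee.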
Keywords: online advertising; learning; regret minimization; stochastic dynamic programming
Date: 2023
Downloads: http://dx.doi.org/10.1287/mnsc.2022.4441 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:69:y:2023:i:3:p:1536-1554