Online non-monotone diminishing return submodular maximization in the bandit setting

Ju, Jiachen; Wang, Xiao; Xu, Dachuan

Online non-monotone diminishing return submodular maximization in the bandit setting

Jiachen Ju (), Xiao Wang () and Dachuan Xu ()
Additional contact information
Jiachen Ju: Beijing University of Technology
Xiao Wang: Peng Cheng Laboratory
Dachuan Xu: Beijing University of Technology

Journal of Global Optimization, 2024, vol. 90, issue 3, No 3, 619-649

Abstract: Abstract In this paper, we study online diminishing return submodular (DR-submodular for short) maximization in the bandit setting. Our focus is on problems where the reward functions can be non-monotone, and the constraint set is a general convex set. We first present the Single-sampling Non-monotone Frank-Wolfe algorithm. This algorithm only requires a single call to each reward function, and it computes the stochastic gradient to make it suitable for large-scale settings where full gradient information might not be available. We provide an analysis of the approximation ratio and regret bound of the proposed algorithm. We then propose the Bandit Online Non-monotone Frank-Wolfe algorithm to adjust for problems in the bandit setting, where each reward function returns the function value at a single point. We take advantage of smoothing approximations to reward functions to tackle the challenges posed by the bandit setting. Under mild assumptions, our proposed algorithm can reach $$\frac{1}{4} (1- \min _{x\in {\mathcal {P}}'}\Vert x\Vert _\infty )$$ 1 4 ( 1 - min x ∈ P ′ ‖ x ‖ ∞ ) -approximation with regret bounded by $$O (T^{\frac{5 \min \{1, \gamma \}+5 }{6 \min \{1, \gamma \}+5}})$$ O ( T 5 min { 1 , γ } + 5 6 min { 1 , γ } + 5 ) , where the positive parameter $$\gamma $$ γ is related to the “safety domain” $${\mathcal {P}}'$$ P ′ . To the best of our knowledge, this is the first work to address online non-monotone DR-submodular maximization over a general convex set in the bandit setting.

Keywords: Approximation algorithm; Bandit setting; DR submodular maximization; Non-monotone function; Regret bound; 90C30; 68W25; 68W27 (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s10898-024-01413-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:jglopt:v:90:y:2024:i:3:d:10.1007_s10898-024-01413-0

Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10898

DOI: 10.1007/s10898-024-01413-0

Access Statistics for this article

Journal of Global Optimization is currently edited by Sergiy Butenko

More articles in Journal of Global Optimization from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().