Online Learning with Sample Selection Bias

Singhvi, Divya; Singhvi, Somya

Online Learning with Sample Selection Bias

Divya Singhvi () and Somya Singhvi ()
Additional contact information
Divya Singhvi: Leonard N. Stern School of Business, New York University, New York, New York 10012
Somya Singhvi: Marshall School of Business, University of Southern California, Los Angeles, California 90005

Operations Research, 2025, vol. 73, issue 5, 2458-2476

Abstract: We consider the problem of personalized recommendations on online platforms, where user preferences are unknown, and users interact with the platform through a series of sequential decisions (such as clicking to watch on video platforms or clicking to donate on donation platforms). The platform aims to maximize the final outcome (e.g., viewing duration on video platforms or donations on donation platforms). However, the platform only observes the final outcome for users who complete the first stage (clicking on the recommendation). The final outcome for users who do not complete the first stage (not clicking on the recommendation) remains unobserved (also referred to as funneling ). This censoring of outcomes creates a selection bias issue, as the observed outcomes at different stages are often correlated. We demonstrate that failing to account for this selection bias results in biased estimates and suboptimal recommendations. In fact, well-performing personalized learning algorithms perform poorly and incur linear regret in this setting. Therefore, we propose the sample selection bandit (SSB) algorithm, which combines Heckman’s two-step estimator with the “optimism under uncertainty” principle to address the sample selection bias issue. We show that the SSB algorithm achieves a rate-optimal regret rate (up to logarithmic terms) of O ˜ ( T ) . Furthermore, we conduct extensive numerical experiments on both synthetic data and real donation data collected from GoFundMe (a crowdfunding platform), demonstrating significant improvements over benchmark state-of-the-art learning algorithms in this setting.

Keywords: Market; Analytics; and; Revenue; Management; bandits; donations; personalization; sample-selection bias; crowd-funding platforms; operations for social-good (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/opre.2023.0223 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:73:y:2025:i:5:p:2458-2476

Access Statistics for this article

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().