Active learning with biased non-response to label requests
Thomas Robinson, Niek Tax, Richard Mudd and Ido Guy
LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library
Abstract:
Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can reduce the effectiveness of active learning in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, and demonstrate that biased non-response is particularly detrimental to model performance. We argue that biased non-response is likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy, the Upper Confidence Bound of the Expected Utility (UCB-EU), which can plausibly be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for specific sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from an e-commerce platform and show that UCB-EU yields substantial performance improvements to conversion models trained on clicked impressions. More broadly, this research both clarifies the interplay between types of non-response and model improvements via active learning, and provides a practical, easy-to-implement correction that mitigates model degradation.
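The abstract describes UCB-EU only at a high level. The sketch below is one hypothetical reading of the idea, not the authors' implementation: it assumes the expected utility of a label request is an informativeness score multiplied by the probability that the request is answered, and it replaces that probability with a UCB1-style optimistic estimate computed per user segment, so that segments which rarely respond are down-weighted but not abandoned. The `Candidate` type, the `segment` grouping, the margin-based uncertainty score and the exploration constant `c` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    p_positive: float  # current model's predicted probability of the positive class
    segment: str       # hypothetical grouping used to pool response-rate estimates


def uncertainty(p: float) -> float:
    """Margin-based uncertainty for a binary classifier: 1 at p=0.5, 0 at p in {0, 1}."""
    return 1.0 - abs(2.0 * p - 1.0)


def response_ucb(responses: int, requests: int, total_requests: int, c: float = 1.0) -> float:
    """UCB1-style optimistic estimate of the probability that a label request is answered."""
    if requests == 0:
        return 1.0  # never asked in this segment: stay fully optimistic
    rate = responses / requests
    bonus = c * math.sqrt(math.log(max(total_requests, 2)) / requests)
    return min(1.0, rate + bonus)


def select_ucb_eu(candidates, history, k, c=1.0):
    """Rank candidates by uncertainty x optimistic response probability; return top-k ids.

    `history` maps segment -> (responses, requests) observed so far (assumed format).
    """
    total = sum(req for _, req in history.values())

    def score(cand: Candidate) -> float:
        resp, req = history.get(cand.segment, (0, 0))
        return uncertainty(cand.p_positive) * response_ucb(resp, req, total, c)

    ranked = sorted(candidates, key=score, reverse=True)
    return [cand.item_id for cand in ranked[:k]]


pool = [
    Candidate("a", 0.50, "mobile"),
    Candidate("b", 0.55, "desktop"),
    Candidate("c", 0.90, "desktop"),
]
history = {"mobile": (2, 100), "desktop": (60, 100)}
print(select_ucb_eu(pool, history, k=2))  # ['b', 'a']
```

Plain uncertainty sampling would request item `a` first, but its segment has answered only 2 of 100 requests; weighting by the optimistic response probability moves `b` ahead, while the exploration bonus still keeps `a` in the batch ahead of the confidently predicted `c`.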
Keywords: active learning; non-response; missing data; e-commerce; CTR prediction
JEL-codes: L81
Pages: 24
Date: 2024-07-01
New Economics Papers: this item is included in nep-ecm and nep-upt
Published in Data Mining and Knowledge Discovery, 38(4), pp. 2117-2140, 1 July 2024. ISSN: 1384-5810
Downloads: http://eprints.lse.ac.uk/123029/ (open access version, PDF)
Persistent link: https://EconPapers.repec.org/RePEc:ehl:lserod:123029
More papers in LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library, Portugal Street, London WC2A 2HD, U.K.
Bibliographic data for series maintained by LSERO Manager (lseresearchonline@lse.ac.uk).