Precise unbiased estimation in randomized experiments using auxiliary observational data

Gagnon-Bartsch Johann A., Sales Adam C., Wu Edward, Botelho Anthony F., Erickson John A., Miratrix Luke W. and Heffernan Neil T.
Additional contact information
Gagnon-Bartsch Johann A.: Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States
Sales Adam C.: Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, United States
Wu Edward: Biocomplexity Institute, Social and Decision Analytics Division, University of Virginia, Charlottesville, Virginia, United States
Botelho Anthony F.: College of Education, University of Florida, Gainesville, Florida, United States
Erickson John A.: Analytics and Information Systems, Western Kentucky University, Bowling Green, Kentucky, United States
Miratrix Luke W.: Graduate School of Education, Harvard University, Cambridge, Massachusetts, United States
Heffernan Neil T.: Department of Computer Science, Worcester Polytechnic Institute, Worcester, Massachusetts, United States

Journal of Causal Inference, 2023, vol. 11, issue 1, 27

Abstract: Randomized controlled trials (RCTs) admit unconfounded, design-based inference (randomization alone largely justifies the assumptions underlying statistical effect estimates) but often have limited sample sizes. However, researchers may have access to big observational data on covariates and outcomes from RCT nonparticipants. For example, data from A/B tests conducted within an educational technology platform exist alongside historical observational data drawn from student logs. We outline a design-based approach to using such observational data for variance reduction in RCTs. First, we use the observational data to train a machine learning algorithm that predicts potential outcomes from covariates, and we apply that algorithm to generate predictions for RCT participants. Then, we use those predictions, perhaps alongside other covariates, to adjust causal effect estimates with a flexible, design-based covariate-adjustment routine. In this way, there is no danger of biases from the observational data leaking into the experimental estimates, which are guaranteed to be exactly unbiased regardless of whether the machine learning models are “correct” in any sense or whether the observational samples closely resemble the RCT samples. We demonstrate the method by analyzing 33 randomized A/B tests and show that it decreases standard errors relative to other estimators, sometimes substantially.
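A minimal numerical sketch of the two-step recipe in the abstract follows (Python with simulated data; the gradient-boosting model, the simulated covariates, and all variable names are illustrative assumptions, not the authors' implementation). It fits a predictor only on the auxiliary observational data, treats the resulting predictions as a fixed covariate for the RCT sample, and subtracts them from the outcomes before taking a difference in means; this is the simplest such design-based adjustment, and the paper's actual covariate-adjustment routine is more flexible.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
beta = np.array([1.0, 0.5, 0.0, -0.5, 2.0])  # simulated outcome model

# Step 1: fit a predictor on auxiliary observational data only
# (simulated stand-ins for covariates/outcomes of RCT nonparticipants).
X_aux = rng.normal(size=(5000, 5))
y_aux = X_aux @ beta + rng.normal(size=5000)
model = GradientBoostingRegressor().fit(X_aux, y_aux)

# Step 2: predict outcomes for the RCT sample. The model never sees the
# RCT's outcomes or assignments, so yhat acts as a fixed covariate.
n = 200
X_rct = rng.normal(size=(n, 5))
z = np.zeros(n, dtype=int)
z[rng.choice(n, n // 2, replace=False)] = 1          # complete randomization
y = X_rct @ beta + 1.0 * z + rng.normal(size=n)      # true effect = 1.0
yhat = model.predict(X_rct)

# Step 3: design-based adjustment. Subtracting a fixed prediction from each
# outcome leaves the difference-in-means exactly unbiased over the
# randomization, however good or bad the model is.
resid = y - yhat
tau_adj = resid[z == 1].mean() - resid[z == 0].mean()
tau_raw = y[z == 1].mean() - y[z == 0].mean()
print(f"unadjusted: {tau_raw:.3f}  adjusted: {tau_adj:.3f}")
```

Because yhat is a function of the auxiliary data alone, the adjusted estimator is unbiased by design; any accuracy the model does have shows up purely as variance reduction.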

Keywords: education research; A/B testing; data integration
Date: 2023

Downloads: https://doi.org/10.1515/jci-2022-0011 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:bpj:causin:v:11:y:2023:i:1:p:27:n:1004

DOI: 10.1515/jci-2022-0011

Journal of Causal Inference is currently edited by Elias Bareinboim, Jin Tian and Iván Díaz

More articles in Journal of Causal Inference from De Gruyter
Bibliographic data for series maintained by Peter Golla.

Handle: RePEc:bpj:causin:v:11:y:2023:i:1:p:27:n:1004