Comparing the Performance of Statistical Adjustment Methods by Recovering the Experimental Benchmark from the REFLUX Trial

Luke Keele, Stephen O’Neill and Richard Grieve
Additional contact information
Luke Keele: University of Pennsylvania, Philadelphia, PA, USA
Stephen O’Neill: Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, London, UK
Richard Grieve: Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, London, UK

Medical Decision Making, 2021, vol. 41, issue 3, 340-353

Abstract: Much evidence in comparative effectiveness research is based on observational studies. Researchers who conduct observational studies typically assume that there are no unobservable differences between the treatment groups under comparison. Treatment effectiveness is estimated after adjusting for observed differences between comparison groups. However, estimates of treatment effectiveness may be biased because of misspecification of the statistical model. That is, if the method of treatment effect estimation imposes unduly strong functional form assumptions, treatment effect estimates may be inaccurate, leading to inappropriate recommendations about treatment decisions. We compare the performance of a wide variety of treatment effect estimation methods for the average treatment effect. We do so within the context of the REFLUX study from the United Kingdom. In REFLUX, participants were enrolled in either a randomized controlled trial (RCT) arm or an observational (patient preference) arm. In the RCT, patients were randomly assigned to either surgery or medical management. In the patient preference arm, participants chose either surgery or medical management. We attempt to recover the treatment effect estimate from the RCT using the data from the patient preference arm of the study. We vary the method of treatment effect estimation and record which methods are successful and which are not. We apply more than 20 different methods, including standard regression models as well as advanced machine learning methods. We find that simple propensity score matching methods provide the least accurate estimates versus the RCT benchmark. We find variation in performance across the other methods, with some, but not all, recovering the experimental benchmark. We conclude that future studies should use multiple methods of estimation to fully represent the uncertainty arising from the choice of estimation approach.
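The two broad adjustment strategies mentioned in the abstract (outcome regression and propensity score matching) can be illustrated with a minimal sketch. The code below is not the authors' code and uses simulated data with a hypothetical data-generating process; it only shows, under those assumptions, how the two estimators of the average treatment effect differ mechanically.

```python
# Illustrative sketch (not the REFLUX analysis): regression adjustment vs.
# 1:1 nearest-neighbor propensity score matching on simulated data with a
# known treatment effect. All variable names and the data-generating process
# are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, true_ate = 2000, 2.0

# Simulated confounders; treatment assignment depends on them (confounding)
X = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, propensity)
y = true_ate * t + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

# 1) Regression adjustment: linear outcome model in treatment and covariates
reg = LinearRegression().fit(np.column_stack([t, X]), y)
ate_regression = reg.coef_[0]

# 2) Propensity score matching: match each treated unit to the nearest
#    control on the estimated propensity score (this targets the ATT; with a
#    constant effect in this simulation, the ATT equals the ATE)
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
ate_matching = np.mean(y[treated] - y[control[idx.ravel()]])

print(f"True ATE: {true_ate:.2f}")
print(f"Regression adjustment estimate: {ate_regression:.2f}")
print(f"Propensity score matching estimate: {ate_matching:.2f}")
```

Both estimators rely on the no-unobserved-confounding assumption discussed in the abstract; the paper's contribution is to benchmark many such estimators against the RCT result rather than against simulated truth.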

Keywords: causal inference; observational study; research design
Date: 2021

Downloads: https://journals.sagepub.com/doi/10.1177/0272989X20986545 (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:sae:medema:v:41:y:2021:i:3:p:340-353

DOI: 10.1177/0272989X20986545

Handle: RePEc:sae:medema:v:41:y:2021:i:3:p:340-353