Cherry Picking
Megan Lang and Wenfeng Qiu
No as9zd, MetaArXiv from Center for Open Science
Abstract:
Measures like pre-analysis plans ask researchers to describe planned data collection and justify data exclusions, but they provide little enforceable oversight of primary data collection. We show that a simple algorithm can select large subsets of data that yield economically meaningful and statistically significant treatment effects. The subsets cannot be distinguished from a random sample of the original data, rendering the selection undetectable if peer reviewers are unaware of the size of the original dataset. Our results hold using simulated data and replication data from a well-known study. We show that there are few natural deterrents to dataset manipulation: the results in our selected subset are robust to a range of alternative specifications, our algorithm performs well under complex sampling strategies, and our subset can yield artificially high effects on multiple outcomes. We conclude by proposing a measure to prevent such manipulation in field experiments.
Date: 2021-08-24
New Economics Papers: this item is included in nep-ecm and nep-isf
Downloads:
https://osf.io/download/61256d816a7f6d001f47ab8a/
Persistent link: https://EconPapers.repec.org/RePEc:osf:metaar:as9zd
DOI: 10.31219/osf.io/as9zd