Raiders of the lost HARK: a reproducible inference framework for big data science
Mattia Prosperi,
Jiang Bian,
Iain E. Buchan,
James S. Koopman,
Matthew Sperrin and
Mo Wang
Additional contact information
Mattia Prosperi: University of Florida
Jiang Bian: University of Florida
Iain E. Buchan: University of Liverpool
James S. Koopman: University of Michigan
Matthew Sperrin: University of Manchester
Mo Wang: University of Florida
Palgrave Communications, 2019, vol. 5, issue 1, 1-12
Abstract:
Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources, from high-throughput genomics to social media streams. We here propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalization of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data or of the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representations of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.
Date: 2019
Downloads:
http://link.springer.com/10.1057/s41599-019-0340-8 Abstract (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:5:y:2019:i:1:d:10.1057_s41599-019-0340-8
Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about
DOI: 10.1057/s41599-019-0340-8
More articles in Palgrave Communications from Palgrave Macmillan