A machine-compiled database of genome-wide association studies
Volodymyr Kuleshov,
Jialin Ding,
Christopher Vo,
Braden Hancock,
Alexander Ratner,
Yang Li,
Christopher Ré,
Serafim Batzoglou and
Michael Snyder
Additional contact information
Volodymyr Kuleshov: Stanford University
Jialin Ding: Stanford University
Christopher Vo: Stanford University
Braden Hancock: Stanford University
Alexander Ratner: Stanford University
Yang Li: University of Chicago
Christopher Ré: Stanford University
Serafim Batzoglou: Stanford University
Michael Snyder: Stanford University School of Medicine
Nature Communications, 2019, vol. 10, issue 1, 1-8
Abstract:
Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60–80% and with an estimated precision of 78–94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.
Date: 2019
Downloads: https://www.nature.com/articles/s41467-019-11026-x (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:10:y:2019:i:1:d:10.1038_s41467-019-11026-x
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-019-11026-x
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.