Wide-Open: Accelerating public data release by automating detection of overdue datasets

Grechkin, Maxim; Poon, Hoifung; Howe, Bill

PLOS Biology, 2017, vol. 15, issue 6, 1-5

Abstract: Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
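The two steps named in the abstract (finding dataset references in article text, then checking whether the repository still reports the dataset as private) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The accession patterns (`GSE` plus digits for GEO series, `SRP`/`ERP`/`DRP` plus digits for SRA studies), the marker phrase `is currently private`, and the function names `find_accessions`, `looks_private`, and `overdue_candidates` are all assumptions made for illustration. Repository responses are passed in as plain strings, so the sketch makes no network calls.

```python
import re

# Assumed accession formats: GEO series IDs look like "GSE" + digits;
# SRA study IDs look like "SRP" (also "ERP"/"DRP") + digits.
GEO_PATTERN = re.compile(r"\bGSE\d+\b")
SRA_PATTERN = re.compile(r"\b[SED]RP\d+\b")

# Hypothetical marker phrase. The wording a repository actually returns
# for an unreleased dataset must be checked against real responses.
PRIVATE_MARKER = "is currently private"


def find_accessions(article_text):
    """Return the sorted, de-duplicated accession IDs mentioned in the text."""
    found = set(GEO_PATTERN.findall(article_text))
    found.update(SRA_PATTERN.findall(article_text))
    return sorted(found)


def looks_private(repository_response):
    """Heuristic: True if the response text contains the assumed marker."""
    return PRIVATE_MARKER in repository_response.lower()


def overdue_candidates(article_text, responses):
    """List accessions cited in a published article whose repository
    response (supplied by the caller as a dict) still looks private."""
    return [
        accession
        for accession in find_accessions(article_text)
        if looks_private(responses.get(accession, ""))
    ]
```

A caller would supply the text of a published article and a dictionary mapping each accession to the text returned by the repository query. Any accession that comes back from `overdue_candidates` is only a candidate for manual review, since the marker phrase is a heuristic.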

Date: 2017

Downloads: (external link)
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002477 (text/html)
https://journals.plos.org/plosbiology/article/file ... 02477&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pbio00:2002477

DOI: 10.1371/journal.pbio.2002477

Publisher: Public Library of Science