Developing a modern data workflow for regularly updated data
Glenda M Yenni,
Erica M Christensen,
Ellen K Bledsoe,
Sarah R Supp,
Renata M Diaz,
Ethan P White and
S K Morgan Ernest
PLOS Biology, 2019, vol. 17, issue 1, 1-12
Abstract:
Over the past decade, biology has undergone a data revolution in how researchers collect data and the amount of data being collected. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. Regularly updated data present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow leverages tools from software development, including version control and continuous integration, to create a modern data management system that automates the pipeline. This Community Page article describes a data management workflow that can be readily implemented by small research teams and which solves the core challenges of managing regularly updating data. It includes a template repository and tutorial to assist others in setting up their own regularly updating data management systems.
Date: 2019
Downloads:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000125 (text/html)
https://journals.plos.org/plosbiology/article/file ... 00125&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pbio00:3000125
DOI: 10.1371/journal.pbio.3000125
More articles in PLOS Biology from Public Library of Science