Preprocessing of Public RNA-Sequencing Datasets to Facilitate Downstream Analyses of Human Diseases
Naomi Rapier-Sharman,
John Krapohl,
Ethan J. Beausoleil,
Kennedy T. L. Gifford,
Benjamin R. Hinatsu,
Curtis S. Hoffmann,
Makayla Komer,
Tiana M. Scott and
Brett E. Pickett
Additional contact information
Naomi Rapier-Sharman: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
John Krapohl: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Ethan J. Beausoleil: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Kennedy T. L. Gifford: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Benjamin R. Hinatsu: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Curtis S. Hoffmann: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Makayla Komer: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Tiana M. Scott: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Brett E. Pickett: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Data, 2021, vol. 6, issue 7, 1-10
Abstract:
Publicly available RNA-sequencing (RNA-seq) data are a rich resource for elucidating the mechanisms of human disease; however, preprocessing these data requires considerable bioinformatic expertise and computational infrastructure. Analyzing multiple datasets with a consistent computational workflow increases the accuracy of downstream meta-analyses. This collection of datasets represents the human intracellular transcriptional response to disorders and diseases such as acute lymphoblastic leukemia (ALL), B-cell lymphomas, chronic obstructive pulmonary disease (COPD), colorectal cancer, and lupus erythematosus, as well as to infection with pathogens including Borrelia burgdorferi, hantavirus, influenza A virus, Middle East respiratory syndrome coronavirus (MERS-CoV), Streptococcus pneumoniae, respiratory syncytial virus (RSV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We calculated the statistically significant differentially expressed genes and Gene Ontology terms for all datasets. In addition, a subset of the datasets includes results from splice variant analyses and intracellular signaling pathway enrichments, as well as read mapping and quantification. All analyses were performed using well-established algorithms and are provided to facilitate future data mining activities and wet lab studies and to accelerate collaboration and discovery.
Keywords: transcriptomics; RNA-sequencing; autoimmune diseases; cancer; pathogens; bacteria; viruses; data preprocessing
JEL-codes: C8 C80 C81 C82 C83
Date: 2021
Downloads:
https://www.mdpi.com/2306-5729/6/7/75/pdf (application/pdf)
https://www.mdpi.com/2306-5729/6/7/75/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:6:y:2021:i:7:p:75-:d:594584
Data is currently edited by Ms. Cecilia Yang