Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development
Jian Zhou,
Ignacio E Schor,
Victoria Yao,
Chandra L Theesfeld,
Raquel Marco-Ferreres,
Alicja Tadych,
Eileen E M Furlong and
Olga G Troyanskaya
PLOS Genetics, 2019, vol. 15, issue 9, 1-20
Abstract:
Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. 
This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
Author summary:
When and where a gene is expressed is fundamental information for understanding embryonic development. Current knowledge of such expression patterns is typically far from complete. Even for the long-standing model organism Drosophila melanogaster, for which large-scale in situ projects have provided invaluable expression information for many genes, 40% of the genes still lack spatio-temporally resolved expression information. Such data are complemented by transcriptome datasets such as microarray and RNA-seq, which have whole-genome coverage and measure expression levels with greater dynamic range, but typically lack precise spatio-temporal resolution. To bridge this gap, we developed a machine learning approach that combines the spatio-temporal resolution of in situ data with the accurate quantification and whole-genome coverage of genomic experiments, integrating information from 6,378 expression and chromatin profiling data sets. With this new approach, we present a genome-wide resource of spatio-temporal gene expression predictions for over 200 tissue-developmental stages during Drosophila embryogenesis. This resource is experimentally validated to have high-quality predictions, can guide the discovery of new tissue-specific genes, and provides a new tool to perform genome-wide analyses of spatio-temporal specificity.
Date: 2019
Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008382 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 08382&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1008382
DOI: 10.1371/journal.pgen.1008382