Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen
Thomas Badet,
Simone Fouché,
Fanny E. Hartmann,
Marcello Zala and
Daniel Croll ()
Additional contact information
Thomas Badet: Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel
Simone Fouché: Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel
Fanny E. Hartmann: Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay
Marcello Zala: Plant Pathology, Institute of Integrative Biology, ETH Zurich
Daniel Croll: Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel
Nature Communications, 2021, vol. 12, issue 1, 1-14
Abstract:
Abstract Species harbor extensive structural variation underpinning recent adaptive evolution. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionally driven by rearrangements. We use machine learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Retrotransposons explain most inversion, indel and duplication events. We apply our model to Arabidopsis thaliana and show that our approach extends to more complex genomes. Finally, we analyze complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations are enriched for new rearrangements consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation accurately predicts the position of >74% of newly generated variants along the pedigree. The predictive power highlights causality between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that training sequence-derived models can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
Date: 2021
References: Add references at CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://www.nature.com/articles/s41467-021-23862-x Abstract (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-23862-x
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-021-23862-x
Access Statistics for this article
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().