Deep learning for genomics using Janggu

Kopp, Wolfgang; Monti, Remo; Tamburrini, Annalaura; Ohler, Uwe; Akalin, Altuna

Wolfgang Kopp, Remo Monti, Annalaura Tamburrini, Uwe Ohler and Altuna Akalin
Additional contact information
Wolfgang Kopp: Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine
Remo Monti: Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine
Annalaura Tamburrini: Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine
Uwe Ohler: Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine
Altuna Akalin: Berlin Institute for Medical Systems Biology, Max Delbrueck Center for Molecular Medicine

Nature Communications, 2020, vol. 11, issue 1, 1-7

Abstract: In recent years, numerous applications have demonstrated the potential of deep learning for an improved understanding of biological processes. However, most deep learning tools developed so far are designed to address a specific question on a fixed dataset and/or by a fixed model architecture. Here we present Janggu, a Python library that facilitates deep learning for genomics applications, aiming to ease data acquisition and model evaluation. Among its key features are special dataset objects, which form a unified and flexible data acquisition and pre-processing framework for genomics data that enables streamlining of future research applications through reusable components. Through a NumPy-like interface, these dataset objects are directly compatible with popular deep learning libraries, including Keras and PyTorch. Janggu offers the possibility to visualize predictions as genomic tracks or to export them to the bigWig format, and it provides utilities for Keras-based models. We illustrate the functionality of Janggu on several deep learning genomics applications. First, we evaluate different model topologies for the task of predicting binding sites for the transcription factor JunD. Second, we demonstrate the framework on published models for predicting chromatin effects. Third, we show that promoter usage measured by CAGE can be predicted using DNase hypersensitivity, histone modifications and DNA sequence features. We improve the performance of these models through a novel feature in Janggu that allows us to include higher-order sequence features. We believe that Janggu will help to significantly reduce repetitive programming overhead for deep learning applications in genomics, and will enable computational biologists to rapidly assess biological hypotheses.
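
The abstract does not spell out what a higher-order sequence feature looks like, so the following is only an illustrative sketch under stated assumptions, not Janggu's implementation or API. It assumes that "order" means one-hot encoding overlapping k-mers (for example dinucleotides for order 2) instead of single nucleotides. The function name `kmer_one_hot` and the choice to leave windows containing unknown bases as all-zero rows are hypothetical and chosen for illustration.

```python
from typing import List

ALPHABET = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def kmer_one_hot(seq: str, order: int = 1) -> List[List[int]]:
    """One-hot encode the overlapping k-mers (k = order) of a DNA sequence.

    Hypothetical helper, not part of Janggu. The result has
    len(seq) - order + 1 rows and 4 ** order columns. A window that
    contains a character outside ACGT (e.g. N) stays an all-zero row.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    seq = seq.upper()
    n_windows = len(seq) - order + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the requested order")

    width = len(ALPHABET) ** order
    encoded = []
    for pos in range(n_windows):
        row = [0] * width
        column = 0
        for base in seq[pos:pos + order]:
            if base not in BASE_INDEX:
                break  # unknown base: leave this row all zeros
            # Treat the k-mer as a base-4 number to get its column index.
            column = column * len(ALPHABET) + BASE_INDEX[base]
        else:
            row[column] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Order 1: one column per nucleotide. Order 2: one column per dinucleotide.
    for order in (1, 2):
        rows = kmer_one_hot("ACGT", order)
        print(order, len(rows), len(rows[0]))
```

With order 1 each position activates one of 4 columns; with order 2 each position activates one of 16 columns, so a model's first layer sees dinucleotide context directly at the cost of a wider input.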

Date: 2020

Downloads: (external link)
https://www.nature.com/articles/s41467-020-17155-y Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-17155-y

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-020-17155-y

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.