Deep Learning for the Classification of Genomic Signals
J. Alejandro Morales,
Román Saldaña,
Manuel H. Santana-Castolo,
Carlos E. Torres-Cerna,
Ernesto Borrayo,
Adriana P. Mendizabal-Ruiz,
Hugo A. Vélez-Pérez and
Gerardo Mendizabal-Ruiz
Mathematical Problems in Engineering, 2020, vol. 2020, 1-9
Abstract:
Genomic signal processing (GSP) applies digital signal processing methods to the analysis of genomic data. Convolutional neural networks (CNNs) are state-of-the-art machine learning classifiers that have been applied successfully to a wide range of complex problems. In this paper, we present a deep learning architecture and a method for classifying three functional types of genomic sequence: coding regions (CDS), long noncoding regions (LNC), and pseudogenes (PSD). The method uses GSP to convert each nucleotide sequence into a graphical representation of the information it contains. The model achieved accuracy scores of 83% and 84% when classifying CDS vs. LNC and CDS vs. PSD, respectively, but it was not able to differentiate between PSD and LNC. These results indicate the feasibility of employing CNNs with GSP for the classification of these types of DNA data.
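The abstract does not say which GSP mapping or transform the authors use. As a rough illustration of what "converting a nucleotide sequence into a graphical representation" can look like, the sketch below assumes one common GSP approach: Voss binary indicator sequences followed by a sliding-window Fourier power spectrum. The function names, window length, and step size are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # row order of the indicator matrix (an assumption of this sketch)


def voss_indicators(seq):
    """Map a DNA string to four binary indicator signals (rows: A, C, G, T)."""
    seq = seq.upper()
    return np.array(
        [[1.0 if base == n else 0.0 for base in seq] for n in NUCLEOTIDES]
    )


def spectrogram_image(seq, window=64, step=16):
    """Build a 2-D array (frequency bins x window positions) from a sequence.

    Each column is the summed power spectrum of the four indicator signals
    inside one Hann-tapered window. Window and step sizes are illustrative.
    """
    signals = voss_indicators(seq)
    length = signals.shape[1]
    if length < window:
        raise ValueError("sequence is shorter than the analysis window")

    taper = np.hanning(window)
    columns = []
    for start in range(0, length - window + 1, step):
        frame = signals[:, start:start + window] * taper
        power = np.abs(np.fft.rfft(frame, axis=1)) ** 2
        columns.append(power.sum(axis=0))  # combine the A, C, G, T spectra

    image = np.array(columns).T
    peak = image.max()
    # Scale to [0, 1] so the array can be treated like a grayscale image.
    return image / peak if peak > 0 else image


if __name__ == "__main__":
    example = "ACGT" * 64  # toy 256-base sequence, not real genomic data
    img = spectrogram_image(example)
    print(img.shape)
```

An array of this kind could then be passed to a CNN as a single-channel image; the network architecture itself is described in the full paper and is not reproduced here.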
Date: 2020
Downloads: (external link)
http://downloads.hindawi.com/journals/MPE/2020/7698590.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2020/7698590.xml (text/xml)
Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:7698590
DOI: 10.1155/2020/7698590