A novel artificial intelligence-based approach for identification of deoxynucleotide aptamers
Frances L Heredia,
Abiel Roche-Lima and
Elsie I Parés-Matos
PLOS Computational Biology, 2021, vol. 17, issue 8, 1-18
Abstract:
The selection of a DNA aptamer through the Systematic Evolution of Ligands by EXponential enrichment (SELEX) method involves multiple binding steps, in which a target and a library of randomized DNA sequences are mixed for selection of a single, nucleotide-specific molecule. Usually, 10 to 20 steps are required for SELEX to be completed. Throughout this process it is necessary to discriminate between true DNA aptamers and unspecified DNA-binding sequences. Thus, a novel machine learning-based approach was developed to support and simplify the early steps of the SELEX process, to help discriminate binding between DNA aptamers from those unspecified targets of DNA-binding sequences. An Artificial Intelligence (AI) approach to identify aptamers were implemented based on Natural Language Processing (NLP) and Machine Learning (ML). NLP method (CountVectorizer) was used to extract information from the nucleotide sequences. Four ML algorithms (Logistic Regression, Decision Tree, Gaussian Naïve Bayes, Support Vector Machines) were trained using data from the NLP method along with sequence information. The best performing model was Support Vector Machines because it had the best ability to discriminate between positive and negative classes. In our model, an Accuracy (A) of 0.995, the fraction of samples that the model correctly classified, and an Area Under the Receiving Operating Curve (AUROC) of 0.998, the degree by which a model is capable of distinguishing between classes, were observed. The developed AI approach is useful to identify potential DNA aptamers to reduce the amount of rounds in a SELEX selection. This new approach could be applied in the design of DNA libraries and result in a more efficient and faster process for DNA aptamers to be chosen during SELEX.Author summary: In this manuscript authors explain the development and validation of a novel artificial intelligence approach to support and simplify the early steps of the process from SELEX, to help discriminate binding between deoxynucleotide aptamers from those unspecified targets of DNA-binding sequences. The approach was implemented based on Natural Language Processing and Machine Learning. CountVectorizer, a Natural Language Processing method, was used to extract information from nucleotide sequences. Four Machine Learning algorithms (Logistic Regression, Decision Tree, Gaussian Naïve Bayes, and Support Vector Machines) were trained using data from the Natural Language Processing method along with sequence information. From these four trained machine learning algorithms, the best performance and selected model was Support Vectors Machines, because it had the best discriminatory metrics (i.e., Accuracy (A) = 0.995; AUROC (AU) = 0.998). In general, all models showed good metric results for predicting DNA aptamer sequences. The Machine Learning model complexity and difficult interpretation may hinder its application into the standard practice. For this reason, the development of a web-app is already taking place to facilitate the interpretation and application of the obtained results.
Date: 2021
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009247 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 09247&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1009247
DOI: 10.1371/journal.pcbi.1009247
Access Statistics for this article
More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().