Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction

Bahai, Akash; Kwoh, Chee Keong; Mu, Yuguang; Li, Yinghui

PLOS Computational Biology, 2024, vol. 20, issue 12, 1-44

Abstract: The 3D structure of RNA critically influences its function, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly. Despite advances, the accuracy of computational methods remains modest, especially compared with protein structure prediction. Deep learning methods, highly successful in protein structure prediction, have also shown some promise for RNA structure prediction, but they face unique challenges. This study systematically benchmarks state-of-the-art deep learning methods for RNA structure prediction across diverse datasets. Our aim is to identify factors that influence performance variation, such as RNA family diversity, sequence length, RNA type, multiple sequence alignment (MSA) quality, and deep learning model architecture. We show that ML-based methods generally perform much better than non-ML methods on most RNA targets, although the difference is not substantial for unseen novel or synthetic RNAs. Both MSA quality and secondary structure prediction play an important role, and most methods are unable to predict non-Watson-Crick pairs. Overall, among the automated 3D RNA structure prediction methods, DeepFoldRNA gives the best predictions, followed by DRFold. Finally, we suggest possible mitigations to improve prediction quality in future method development.

Author summary:
Systematic benchmarking of five recent deep-learning methods and two fragment-assembly-based methods on diverse datasets.
Compiled a new balanced dataset with the latest RNA structures for benchmarking.
Generally, the ML-based methods outperform the traditional fragment-assembly-based methods, with DeepFoldRNA producing the best predicted models overall.
On orphan RNAs, the ML-based methods are only slightly better than FA-based methods, and all methods generally perform poorly on orphan RNAs.
Method performance depends on MSA depth, RNA type, and secondary structure.

Date: 2024

Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012715 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12715&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012715

DOI: 10.1371/journal.pcbi.1012715

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.