Cache-efficient and vectorized parallel dynamic programming for RNA folding

Gruzewski, Mateusz; Palkowski, Marek

Cache-efficient and vectorized parallel dynamic programming for RNA folding

Mateusz Gruzewski and Marek Palkowski

PLOS ONE, 2026, vol. 21, issue 5, 1-16

Abstract: In this article, we present an efficient and concise OpenMP implementation of the Nussinov RNA folding algorithm, a well-known representative of non-serial polyadic dynamic programming (NPDP). Our goal is to develop an optimized implementation that can serve as a template for related dynamic programming applications. The proposed code is derived from a detailed analysis of manual implementations, emphasizing the separation of problematic and non-problematic instances and structuring computations in a way analogous to matrix multiplication. This design enables the semi-automatic extraction of data locality using tools based on Presburger arithmetic—techniques widely employed in classical loop transformations and advanced source-to-source compilers grounded in the polyhedral model. In the experimental evaluation, we assess the performance of our implementation on modern massively parallel AMD and Intel processors with 64, 128, and 192 threads. Our approach leverages cache-aware tiling, parallelism, and explicit vectorization to maximize computational efficiency, achieving performance that surpasses both automatically generated compiler-based solutions and manually tuned implementations on the evaluated platforms. Specifically, our implementation achieves execution times up to two orders of magnitude faster than polyhedral code, while also outperforming unvectorized manual approaches—being at least 30 × faster than array transposition–based methods and at least 5 × faster than the tiled sparsified Four Russians variant. Additionally, our results indicate that CPU implementations do not exhibit significantly worse performance compared to their corresponding GPU counterparts. These results demonstrate the importance of leveraging Advanced Vector Extensions (AVX) to fully exploit the capabilities of modern multi-core processors, particularly those in the AMD Epyc family.

Date: 2026
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0349146 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 49146&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0349146

DOI: 10.1371/journal.pone.0349146

Access Statistics for this article

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().