Scalable QR Factorisation of Ill-Conditioned Tall-and-Skinny Matrices on Distributed GPU Systems
Nenad Mijić,
Abhiram Kaushik,
Dario Živković and
Davor Davidović ()
Additional contact information
Nenad Mijić: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Abhiram Kaushik: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Dario Živković: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Davor Davidović: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Mathematics, 2025, vol. 13, issue 22, 1-21
Abstract:
The QR factorisation is a cornerstone of numerical linear algebra, essential for solving overdetermined linear systems, eigenvalue problems, and various scientific computing tasks. However, computing it for ill-conditioned tall-and-skinny (TS) matrices on large-scale distributed-memory systems, particularly those with multiple GPUs, presents significant challenges in balancing numerical stability, high performance, and efficient communication. Traditional Householder-based QR methods provide numerical stability but perform poorly on TS matrices due to their reliance on memory-bound kernels. This paper introduces a novel algorithm for computing the QR factorisation of ill-conditioned TS matrices based on CholeskyQR methods. Although CholeskyQR is fast, it typically fails due to severe loss of orthogonality for ill-conditioned inputs. To solve this, our new algorithm, mCQRGSI+, combines the speed of CholeskyQR with stabilising techniques from the Gram–Schmidt process. It is specifically optimised for distributed multi-GPU systems, using adaptive strategies to balance computation and communication. Our analysis shows the method achieves accuracy comparable to Householder QR, even for extremely ill-conditioned matrices (condition numbers up to 10 16 ). Scaling experiments demonstrate speedups of up to 12 × over ScaLAPACK and 16 × over SLATE’s CholeskyQR2. This work delivers a method that is both robust and highly parallel, advancing the state-of-the-art for this challenging class of problems.
Keywords: QR factorisation; CholeskyQR; Gram–Schmidt orthogonalisation; tall-and-skinny matrices; ill-conditioned matrices; graphic processing units; distributed systems; parallel algorithms (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/22/3608/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/22/3608/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:22:p:3608-:d:1791841
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().