EconPapers    
Economics at your fingertips  
 

Scalable QR Factorisation of Ill-Conditioned Tall-and-Skinny Matrices on Distributed GPU Systems

Nenad Mijić, Abhiram Kaushik, Dario Živković and Davor Davidović ()
Additional contact information
Nenad Mijić: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Abhiram Kaushik: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Dario Živković: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
Davor Davidović: Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia

Mathematics, 2025, vol. 13, issue 22, 1-21

Abstract: The QR factorisation is a cornerstone of numerical linear algebra, essential for solving overdetermined linear systems, eigenvalue problems, and various scientific computing tasks. However, computing it for ill-conditioned tall-and-skinny (TS) matrices on large-scale distributed-memory systems, particularly those with multiple GPUs, presents significant challenges in balancing numerical stability, high performance, and efficient communication. Traditional Householder-based QR methods provide numerical stability but perform poorly on TS matrices due to their reliance on memory-bound kernels. This paper introduces a novel algorithm for computing the QR factorisation of ill-conditioned TS matrices based on CholeskyQR methods. Although CholeskyQR is fast, it typically fails due to severe loss of orthogonality for ill-conditioned inputs. To solve this, our new algorithm, mCQRGSI+, combines the speed of CholeskyQR with stabilising techniques from the Gram–Schmidt process. It is specifically optimised for distributed multi-GPU systems, using adaptive strategies to balance computation and communication. Our analysis shows the method achieves accuracy comparable to Householder QR, even for extremely ill-conditioned matrices (condition numbers up to 10 16 ). Scaling experiments demonstrate speedups of up to 12 × over ScaLAPACK and 16 × over SLATE’s CholeskyQR2. This work delivers a method that is both robust and highly parallel, advancing the state-of-the-art for this challenging class of problems.

Keywords: QR factorisation; CholeskyQR; Gram–Schmidt orthogonalisation; tall-and-skinny matrices; ill-conditioned matrices; graphic processing units; distributed systems; parallel algorithms (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/22/3608/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/22/3608/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:22:p:3608-:d:1791841

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-11-20
Handle: RePEc:gam:jmathe:v:13:y:2025:i:22:p:3608-:d:1791841