High-Performance Computing in Deep Learning: Distributed Training Strategies for Transformer Models in Natural Language Processing
Xuchen Sun
Simen Owen Academic Proceedings Series, 2026, vol. 3, 188-197
Abstract:
Distributed training of large Transformer models is increasingly conducted on heterogeneous high-performance computing (HPC) clusters, where variability in compute capacity and network topology degrades efficiency and stability. Existing systems rely on static partitioning or uniform gradient compression, leading to communication bottlenecks, suboptimal convergence, and poor fault tolerance. To address these limitations, we propose an adaptive distributed training framework that integrates topology-aware model placement, layer-wise adaptive sparsification based on gradient variance, and error feedback with hybrid parallelism. Evaluated on a 1.3-billion-parameter Transformer across 32 GPUs (including RTX 4090 and V100), our method achieves a throughput of 2,268 ± 29 samples/sec (23.1% higher than Megatron-LM) and reduces time to target validation loss (
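The abstract names layer-wise sparsification driven by gradient variance, combined with error feedback, but does not give the exact rule. The following is a minimal sketch of the general idea only, not the paper's implementation. It assumes top-k magnitude selection as the sparsifier and a hypothetical variance-to-ratio rule; the function names `keep_ratio` and `sparsify_with_error_feedback` and the parameters `base`, `lo`, and `hi` are illustrative and do not come from the source.

```python
from statistics import pvariance


def keep_ratio(grad, base=0.1, lo=0.01, hi=0.5):
    """Hypothetical rule (an assumption, not from the paper):
    layers whose gradients have higher variance keep a larger fraction."""
    ratio = base * (1.0 + pvariance(grad))
    return max(lo, min(hi, ratio))


def sparsify_with_error_feedback(grad, residual, ratio):
    """Top-k magnitude sparsification with error feedback.

    The residual left over from earlier steps is added to the fresh
    gradient before selection; entries that are not sent are carried
    forward in the new residual, so no gradient mass is discarded.
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    k = max(1, int(round(ratio * len(corrected))))
    # Indices of the k largest-magnitude entries.
    order = sorted(range(len(corrected)),
                   key=lambda i: abs(corrected[i]), reverse=True)
    top = set(order[:k])
    sparse = [c if i in top else 0.0 for i, c in enumerate(corrected)]
    new_residual = [0.0 if i in top else c for i, c in enumerate(corrected)]
    return sparse, new_residual


# One flattened layer gradient and an empty residual buffer.
layer_grad = [0.5, -2.0, 0.25, 1.0]
layer_residual = [0.0, 0.0, 0.0, 0.0]
sent, layer_residual = sparsify_with_error_feedback(
    layer_grad, layer_residual, ratio=0.5)
```

In a real training loop, `sent` would be what each worker communicates for that layer, and `layer_residual` would be stored per layer and reused at the next step. The ratio would be recomputed per layer with something like `keep_ratio`, which is where the "layer-wise adaptive" part enters in this sketch.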
Keywords: distributed training; transformer models; heterogeneous clusters; gradient sparsification; fault tolerance
Date: 2026
Downloads: (external link)
https://soapubs.com/index.php/SOAPS/article/view/1596/1460 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:axf:soapsa:v:3:y:2026:i::p:188-197
More articles in Simen Owen Academic Proceedings Series from Scientific Open Access Publishing