On the Detection and Interpretation of Performance Variations of HPC Applications
Dennis Hoppe,
Li Zhong,
Stefan Andersson and
Diana Moise
Additional contact information
Dennis Hoppe: High Performance Computing Center Stuttgart
Li Zhong: High Performance Computing Center Stuttgart
Stefan Andersson: Amazon Web Services (AWS)
Diana Moise: Cray Inc.
A chapter in Sustained Simulation Performance 2018 and 2019, Springer, 2020, pp. 41-56
Abstract:
Supercomputers are synonymous with maximum performance, so one would expect each run of a parallel application to yield the same runtime, provided that the input parameters and data are unchanged. Practice, however, clearly demonstrates that this is not the case. Supercomputers are built with multi-user usage in mind, meaning that typically several hundred applications run simultaneously on a multitude of compute nodes. Although these compute nodes are assigned exclusively to users, network and data storage are shared among all, so interference between applications is inevitable. In this paper, we evaluate application runs on a Cray XC40 system. The objective is to identify so-called aggressor applications, which have a negative impact on the performance of simultaneously running applications and cause unpredictably longer runtimes. We discuss the characteristics of aggressors and victims and introduce several detection strategies to identify these victims, and thus also potential aggressors. Finally, a study demonstrates the effectiveness of the approach by identifying an aggressor and optimizing its source code, which resulted in less interference.
Date: 2020
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-030-39181-2_5
Ordering information: This item can be ordered from
http://www.springer.com/9783030391812
DOI: 10.1007/978-3-030-39181-2_5