Detecting Disaster Before It Strikes: On the Challenges of Automated Building and Testing in HPC Environments

Feld, Christian; Geimer, Markus; Hermanns, Marc-André; Saviankou, Pavel; Visser, Anke; Mohr, Bernd

Author affiliations:
Christian Feld: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH
Markus Geimer: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH
Marc-André Hermanns: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH
Pavel Saviankou: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH
Anke Visser: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH
Bernd Mohr: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH

A chapter in Tools for High Performance Computing 2018 / 2019, 2021, pp 3-26 from Springer

Abstract: Software reliability is one of the cornerstones of any successful user experience. Software needs to build up the users’ trust in its fitness for a specific purpose. Software failures undermine this trust and add to user frustration that will ultimately lead to a termination of usage. Even beyond user expectations of the robustness of a software package, today’s scientific software is more than a temporary research prototype. It also forms the bedrock for successful scientific research in the future. A well-defined software engineering process that includes automated builds and tests is a key enabler for keeping software reliable in an agile scientific environment and should be of vital interest for any scientific software development team. While automated builds and deployment as well as systematic software testing have become common practice when developing software in industry, they are rarely used for scientific software, including tools. Potential reasons are that (1) in contrast to computer scientists, domain scientists from other fields usually never get exposed to such techniques during their training, (2) building up the necessary infrastructures is often considered overhead that distracts from the real science, (3) interdisciplinary research teams are still rare, and (4) high-performance computing systems and their programming environments are less standardized, such that published recipes often cannot be applied without heavy modification. In this work, we will present the various challenges we encountered while setting up an automated building and testing infrastructure for the Score-P, Scalasca, and Cube projects. We will outline our current approaches, alternatives that have been considered, and the remaining open issues that still need to be addressed to further increase software quality and thus ultimately improve the user experience.

Date: 2021

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-030-66057-4_1

Ordering information: This item can be ordered from
http://www.springer.com/9783030660574

DOI: 10.1007/978-3-030-66057-4_1
