Fault Tolerant Molecular-Continuum Flow Simulation
Vahid Jafari,
Piet Jarmatz,
Helene Wittenberg,
Amartya Das Sharma,
Louis Viot,
Felix Maurer,
Niklas Wittmer and
Philipp Neumann
Additional contact information
Vahid Jafari: Helmut Schmidt University, Chair for High Performance Computing
Piet Jarmatz: Helmut Schmidt University, Chair for High Performance Computing
Helene Wittenberg: Helmut Schmidt University, Chair for High Performance Computing
Amartya Das Sharma: Helmut Schmidt University, Chair for High Performance Computing
Louis Viot: Helmut Schmidt University, Chair for High Performance Computing
Felix Maurer: Helmut Schmidt University, Chair for High Performance Computing
Niklas Wittmer: Helmut Schmidt University, Chair for High Performance Computing
Philipp Neumann: Helmut Schmidt University, Chair for High Performance Computing
A chapter in High Performance Computing in Science and Engineering '22, 2024, pp 463-475 from Springer
Abstract:
Molecular-continuum simulations couple molecular dynamics (MD) and computational fluid dynamics (CFD) simulations in a domain decomposition sense to study fluid flow at the nanoscale, e.g., in process engineering applications. When these simulations run on extreme-scale supercomputers, individual compute cores or nodes may fail due to hardware or software errors. Such failures challenge the robustness of numerical simulations in general and of molecular-continuum systems in particular. We introduce a fault tolerance method in our macro-micro-coupling tool (MaMiCo), a molecular-continuum simulation software we developed in previous work. Since MaMiCo already uses ensembles of MD simulations to reduce statistical errors in the MD solutions, we extended this ensemble approach to detect failing MPI processes and react to these failures: once a failure is encountered, the affected MD simulations are removed from the failing MPI processes and relaunched on healthy MPI process groups. We detail the approach and report scalability results obtained on the supercomputer HAWK at HLRS.
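The relaunch step described in the abstract can be pictured with a small, self-contained sketch. It is not MaMiCo's implementation: process groups are plain Python objects rather than MPI communicators, failure detection is reduced to a given set of failed group IDs (a real code needs MPI-level failure detection), and the names `ProcessGroup`, `distribute` and `handle_failures`, as well as the round-robin and least-loaded placement policies, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessGroup:
    """Hypothetical stand-in for one MPI process group running MD instances."""
    gid: int
    alive: bool = True
    instances: list[int] = field(default_factory=list)


def distribute(num_instances: int, num_groups: int) -> list[ProcessGroup]:
    """Assign ensemble members (MD instance IDs) to groups round-robin."""
    groups = [ProcessGroup(g) for g in range(num_groups)]
    for i in range(num_instances):
        groups[i % num_groups].instances.append(i)
    return groups


def handle_failures(groups: list[ProcessGroup], failed: set[int]) -> list[int]:
    """Retire failed groups and relaunch their MD instances on healthy groups.

    Returns the IDs of the relaunched instances. Each orphaned instance is
    placed on the currently least-loaded healthy group (ties: lowest ID).
    """
    orphaned: list[int] = []
    for g in groups:
        if g.gid in failed and g.alive:
            g.alive = False
            orphaned.extend(g.instances)  # these MD runs must restart elsewhere
            g.instances.clear()
    healthy = [g for g in groups if g.alive]
    if orphaned and not healthy:
        raise RuntimeError("no healthy process group left to relaunch on")
    for inst in orphaned:
        target = min(healthy, key=lambda g: (len(g.instances), g.gid))
        target.instances.append(inst)
    return orphaned


if __name__ == "__main__":
    groups = distribute(num_instances=8, num_groups=4)
    moved = handle_failures(groups, failed={1})
    print(moved)  # [1, 5]
    for g in groups:
        print(g.gid, g.alive, g.instances)
    # 0 True [0, 4, 1]
    # 1 False []
    # 2 True [2, 6, 5]
    # 3 True [3, 7]
```

Least-loaded placement is one plausible policy that keeps the per-group MD workload balanced after a failure while every ensemble member stays hosted exactly once; the chapter itself may use a different assignment rule.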
Date: 2024
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-031-46870-4_30
Ordering information: This item can be ordered from
http://www.springer.com/9783031468704
DOI: 10.1007/978-3-031-46870-4_30
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.