A Tool for Runtime Analysis of Performance and Energy Usage in NUMA Systems
M. L. Becoña,
O. G. Lorenzo,
T. F. Pena,
J. C. Cabaleiro,
F. F. Rivera and
J. A. Lorenzo
Additional contact information
M. L. Becoña: University of Santiago de Compostela, CiTIUS Centro de Investigación en Tecnoloxías Intelixentes
O. G. Lorenzo: University of Santiago de Compostela, CiTIUS Centro de Investigación en Tecnoloxías Intelixentes
T. F. Pena: University of Santiago de Compostela, CiTIUS Centro de Investigación en Tecnoloxías Intelixentes
J. C. Cabaleiro: University of Santiago de Compostela, CiTIUS Centro de Investigación en Tecnoloxías Intelixentes
F. F. Rivera: University of Santiago de Compostela, CiTIUS Centro de Investigación en Tecnoloxías Intelixentes
J. A. Lorenzo: ETIS Laboratory, CY Cergy Paris Université
A chapter in Tools for High Performance Computing 2018 / 2019, 2021, pp 85-99 from Springer
Abstract:
Multicore systems have on-board memory hierarchies and communication networks that influence performance when executing shared-memory parallel codes. Characterising this influence is complex, and understanding the effect of particular hardware configurations on different codes is of paramount importance. In this context, precise monitoring information can be extracted from hardware counters (HC) at runtime to characterise the behaviour of each thread of a parallel code. This technology provides high accuracy with low overhead. In particular, we introduce a new tool that obtains this information from hardware counters in terms of the number of floating-point operations per second, operational intensity, memory access latency, and energy consumption. The first two parameters define the well-known Roofline Model, an intuitive visual performance model used to provide performance estimates of applications running on multicore architectures. The third parameter quantifies data locality, and the fourth is related to the load of each node of the system. All this information is accessed through the perf_events interface provided by Linux, with the aid of the libpfm library. The monitoring information produced by the tool can be used to optimise execution efficiency in NUMA systems by balancing or scheduling workloads and by guiding thread and page migration strategies that increase locality and affinity. The chosen migrations are based on optimisation strategies supported by runtime information provided by hardware counters. Overall, the profiling application is launched from a terminal as a background process, does not require superuser permissions to run properly, and can lead to performance optimisation in multithreaded applications and power savings in NUMA systems.
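As an illustration of how the first two parameters combine in the Roofline Model, the following minimal sketch computes the attainable performance bound from an operational intensity. It is not taken from the chapter: the function names, the peak values, and the sample counter totals are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def operational_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def attainable_gflops(intensity: float, peak_gflops: float, peak_bw_gbs: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, peak_bw_gbs * intensity)


# Hypothetical machine limits (assumptions, not measurements from the chapter).
PEAK_GFLOPS = 500.0   # peak floating-point throughput, GFLOP/s
PEAK_BW_GBS = 100.0   # peak memory bandwidth, GB/s

# Hypothetical per-thread totals, as might be derived from hardware counters.
oi = operational_intensity(flops=2.0e9, bytes_moved=8.0e9)   # 0.25 FLOP/byte
bound = attainable_gflops(oi, PEAK_GFLOPS, PEAK_BW_GBS)      # memory-bound case

print(f"operational intensity: {oi} FLOP/byte")
print(f"attainable performance: {bound} GFLOP/s")
```

With these assumed numbers the thread sits under the memory roof, so its bound is set by bandwidth rather than by peak compute; a thread with a high enough operational intensity would instead be capped at the compute roof.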
Keywords: Roofline model; Performance; Hardware counters; PEBS; Energy usage
Date: 2021
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-030-66057-4_4
Ordering information: This item can be ordered from
http://www.springer.com/9783030660574
DOI: 10.1007/978-3-030-66057-4_4