Workshop: PMBS25: The 16th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems
Authors: Johannes Langguth (Simula Research Laboratory, University of Bergen); James Trotter (Simula Research Laboratory); and Xing Cai (University of Oslo, Simula Research Laboratory)
Abstract: Memory bandwidth has become the primary limiting factor of performance in many modern HPC applications, and it limits scalability because the achievable memory
bandwidth grows linearly only up to a small number of CPU cores. When the number of cores concurrently using the memory system exceeds a threshold, the aggregate memory bandwidth quickly saturates.
To estimate the time usage of a computation dominated by memory
traffic, the mainstream strategy is to divide the expected total memory
traffic volume by the maximum memory bandwidth. However, this
implicitly assumes homogeneous memory traffic, which is often not the case,
so the mainstream strategy can yield inaccurate time estimates.
In this paper, we present a new performance model that specifically
targets inhomogeneity in per-core memory traffic.
The new model requires only three hardware parameters. Using several cases of uneven per-core memory traffic, we demonstrate its advantage
over the mainstream strategy.
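
As a minimal illustration of the mainstream strategy described in the abstract, the Python sketch below divides the total memory traffic by the maximum aggregate bandwidth. For contrast, it also computes a simple bound based on the busiest core. This second function is not the model proposed in the paper; it is an illustrative assumption that a single core cannot exceed a fixed per-core bandwidth. All function names and numbers are hypothetical.

```python
def mainstream_estimate(per_core_bytes, max_bandwidth):
    """Mainstream strategy: total traffic volume / maximum aggregate bandwidth."""
    return sum(per_core_bytes) / max_bandwidth


def busiest_core_bound(per_core_bytes, single_core_bandwidth):
    """Illustrative assumption (not the paper's model): the busiest core
    needs at least its own traffic / the bandwidth one core can achieve."""
    return max(per_core_bytes) / single_core_bandwidth


# Hypothetical hardware numbers, in bytes per second.
MAX_BW = 100e9    # saturated aggregate bandwidth
CORE_BW = 12.5e9  # bandwidth achievable by a single core

# Two hypothetical workloads with the same total traffic (80 GB) on 8 cores.
even = [10e9] * 8
uneven = [66e9] + [2e9] * 7

for name, traffic in (("even", even), ("uneven", uneven)):
    t_main = mainstream_estimate(traffic, MAX_BW)
    t_core = busiest_core_bound(traffic, CORE_BW)
    print(f"{name}: mainstream {t_main:.2f} s, busiest-core bound {t_core:.2f} s")
```

With these made-up numbers, both workloads get the same mainstream estimate because their totals match, while the busiest-core bound for the uneven case is several times larger. That gap is the kind of inaccuracy the abstract attributes to assuming homogeneous traffic.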