Workshop: 2025 International Workshop on Performance, Portability, and Productivity in HPC (P3HPC)
Authors: Oscar Antepara, Leonid Oliker, and Samuel Williams (Lawrence Berkeley National Laboratory (LBNL))
Abstract: The introduction of tightly-coupled heterogeneous architectures, such as AMD's MI300A and NVIDIA's Grace-Hopper (GH200), addresses a bottleneck in accelerated computing, namely the CPU-GPU interface. Whereas the GH200 can be seen as a technological leap in CPU-GPU connectivity that greatly exceeds the PCIe cadence, the unified memory architecture of the MI300A APU enables seamless communication through coherent caches. When the CPU and GPU execute concurrently, they contend not only for finite bandwidth but also for power in a power-constrained environment. In this paper, we extend the well-established Roofline model to capture the performance implications of contention during concurrent execution on the MI300A and GH200. We enhance this model by noting the impact of different memory allocators, the randomness of data, and the host and device arithmetic intensities. We conclude with a discussion of the evolution of GPU architectures and the impact on performance, portability, and programmability that emerging tightly-coupled GPUs bring to the HPC landscape.
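For readers unfamiliar with it, the classic Roofline model bounds attainable performance by the minimum of peak compute and arithmetic intensity (FLOPs per byte) times memory bandwidth. The sketch below shows that standard bound and then a deliberately naive illustration of contention, assuming the CPU and GPU split a fixed shared memory bandwidth in a fixed proportion. The function names, the `dev_share` parameter, and all numbers are hypothetical and illustrative; this is not the authors' contention model and does not account for power limits, allocators, or data randomness.

```python
def roofline_gflops(ai_flops_per_byte: float, peak_gflops: float, bw_gbs: float) -> float:
    """Classic Roofline bound: min(peak compute, arithmetic intensity * bandwidth)."""
    return min(peak_gflops, ai_flops_per_byte * bw_gbs)


def concurrent_rooflines(ai_host: float, ai_dev: float,
                         peak_host: float, peak_dev: float,
                         shared_bw_gbs: float, dev_share: float) -> tuple[float, float]:
    """Hypothetical contention sketch: host and device split one shared bandwidth.

    dev_share is an assumed fraction of the shared bandwidth given to the device;
    the host receives the remainder. Power contention is ignored entirely.
    """
    if not 0.0 <= dev_share <= 1.0:
        raise ValueError("dev_share must lie in [0, 1]")
    host_bw = shared_bw_gbs * (1.0 - dev_share)  # bandwidth left for the CPU
    dev_bw = shared_bw_gbs * dev_share           # bandwidth granted to the GPU
    return (roofline_gflops(ai_host, peak_host, host_bw),
            roofline_gflops(ai_dev, peak_dev, dev_bw))


if __name__ == "__main__":
    # Illustrative numbers only, not measured MI300A or GH200 values.
    print(roofline_gflops(0.5, 1000.0, 100.0))
    print(concurrent_rooflines(1.0, 1.0, 1000.0, 10000.0, 100.0, 0.75))
```

In this toy setting, a memory-bound kernel on either device loses performance in direct proportion to the bandwidth the other device consumes, which is the kind of effect a contention-aware Roofline aims to expose.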