SC25 Proceedings

MPI Communication Performance on AMD MI300A: Microbenchmarks and Applications

Workshop: 7th International Workshop on Emerging Parallel Distributed Runtime Systems and Middleware

Authors: Goutham Kalikrishna Reddy Kuncham (The Ohio State University) and Siyuan Zhang, Shoaib Mohammad, Chen-Chun Chen, and Dhabaleswar K. Panda (Ohio State University)

Abstract: AMD’s MI300A integrates CPU and GPU chiplets around a shared HBM3 pool, removing the traditional host-device boundary and changing long-standing assumptions in GPU-aware MPI. Despite early deployments, there is little guidance on how mainstream MPI libraries behave on this architecture. This evaluation paper presents a comparative study of MVAPICH-Plus, Open MPI, MPICH, and Cray MPICH on MI300A APU nodes. We measure point-to-point performance on CPU and GPU buffers, reporting intra-node and inter-node latency, unidirectional bandwidth, and bidirectional bandwidth across a range of message sizes. We then examine collectives, covering both reduction-based and data-movement-based operations, and analyze their scaling behavior. Finally, we connect microbenchmark trends to application results using OpenFOAM and distributed training of a large language model (LLM) with PyTorch. The study distills practical guidance and highlights opportunities for MI300A-aware optimizations in MPI.

Back to 7th International Workshop on Emerging Parallel Distributed Runtime Systems and Middleware Archive Listing Back to Full Workshop Archive Listing