SC25 Proceedings

Workshops Archive

Accelerating Intra-Node GPU Communication: A Performance Model for Multi-Path Transfers

Workshop: ExaMPI25: Workshop on Extreme Scale MPI

Authors: Amirhossein Sojoodi, Mohammad Akbari, Hamed Sharifian, Ali Farazdaghi, Ryan E. Grant, and Ahmad Afsahi (Queen's University)

Abstract: Optimizing GPU-to-GPU communication is a key challenge for improving performance in MPI-based HPC applications, especially when utilizing multiple communication paths. This paper presents a novel performance model for intra-node multi-path GPU communication within the MPI+UCX framework, aimed at determining the optimal configuration for distributing a single P2P communication across multiple paths. By considering factors such as link bandwidth, pipeline overhead, and stream synchronization, the model identifies an efficient path distribution strategy, reducing communication overhead and maximizing throughput. Through extensive experiments on various topologies, we demonstrate that our model accurately finds theoretically optimal configurations, achieving significant performance improvements, with an average error of less than 6% in predicting the optimal configuration for very large messages.
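To make the idea of distributing one transfer across several paths concrete, the sketch below uses a deliberately simplified cost model that is not taken from the paper. It assumes each path has a fixed startup overhead plus a bandwidth-proportional transfer time, and it picks shares so that all active paths finish at the same moment. The function name `split_message`, the `(bandwidth, overhead)` tuple layout, and the example numbers are hypothetical. The pipeline and stream-synchronization costs named in the abstract are folded into the single overhead term here.

```python
def split_message(size_bytes, paths):
    """Split one transfer across paths so that all active paths finish together.

    Assumed (hypothetical) cost model: time_i = overhead_i + share_i / bandwidth_i.
    `paths` is a non-empty list of (bandwidth_bytes_per_s, overhead_s) tuples
    and `size_bytes` must be positive.
    Returns (shares, finish_time), with shares given in bytes per path.
    """
    if size_bytes <= 0 or not paths:
        raise ValueError("need a positive size and at least one path")

    active = list(range(len(paths)))
    while True:
        total_bw = sum(paths[i][0] for i in active)
        # Common finish time T solves: sum_i bandwidth_i * (T - overhead_i) = size.
        finish = (size_bytes + sum(paths[i][0] * paths[i][1] for i in active)) / total_bw
        # A path whose overhead alone reaches T cannot carry any useful data.
        useless = [i for i in active if finish <= paths[i][1]]
        if not useless:
            break
        # Drop the path with the largest overhead and re-solve.
        active.remove(max(useless, key=lambda i: paths[i][1]))

    shares = [0.0] * len(paths)
    for i in active:
        shares[i] = paths[i][0] * (finish - paths[i][1])
    return shares, finish


if __name__ == "__main__":
    # Hypothetical example: a direct link and a slower staged path, no overhead.
    shares, finish = split_message(75e9, [(50e9, 0.0), (25e9, 0.0)])
    print(shares, finish)
```

Under this simplified model, a small message gives zero bytes to paths with high overhead, while a very large message is split roughly in proportion to bandwidth. That is only an intuition for why the best configuration depends on message size. It makes no claim about the paper's actual model or results.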
