The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Multi-rail RoCE, Now with more BGP!


Workshop: HPC Systems Professionals Workshop (HPCSYSPROS25)

Authors: Benjamin Matthews (National Science Foundation National Center for Atmospheric Research (NSF NCAR))

Abstract: Heterogeneous compute nodes containing multiple accelerators and multiple Ethernet network injections have become common in recent years. Despite this, network injections beyond the first are often utilized only by application middleware, such as MPI or NCCL, that supports an RDMA API. We explain why traditional EtherChannel cannot support this use case. We then propose an alternative network configuration that allows these hardware resources to be utilized both by RDMA application middleware such as MPI and by other applications that use the OS-provided sockets API rather than a kernel-bypass API. This lets user applications built on less HPC-focused (but potentially more portable) APIs, as well as parallel filesystems and other tools, benefit from the additional networking hardware available in this type of compute node.

