Workshop: The 8th Annual Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM 2025)
Authors: Baodi Shan (Stony Brook University); Mauricio Araya-Polo (TotalEnergies EP Research & Technology US, LLC); and Barbara Chapman (Stony Brook University, Hewlett Packard Enterprise (HPE))
Abstract: High-performance computing faces rising core counts, increasing heterogeneity, and growing memory bandwidth. These trends complicate programmability, portability, and scalability, while the traditional MPI + OpenMP model struggles to manage distributed GPU memory and to deliver portable performance.
We present DiOMP-Offloading, a framework that unifies OpenMP target offloading with a Partitioned Global Address Space (PGAS) model. Built on LLVM-OpenMP and GASNet-EX, it centrally manages global memory and supports both symmetric and asymmetric GPU allocations, enabling remote put/get operations. DiOMP also integrates OMPCCL, a portable device-side collective layer that harmonizes allocation lifecycles and address translation across vendor backends.
By eliminating separate MPI + X stacks and abstracting replicated device memory and communication logic, DiOMP improves scalability and programmability. Experiments on large-scale NVIDIA A100, Grace Hopper, and AMD MI250X platforms show superior performance on both micro-benchmarks and applications, demonstrating that DiOMP-Offloading offers a more portable, scalable, and efficient path for heterogeneous supercomputing.
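The abstract does not describe DiOMP's programming interface, so the following is only a toy Python model of the memory concepts it names. It assumes the usual PGAS meaning of the terms: a symmetric allocation has the same size and offset on every rank, while an asymmetric allocation may differ in size per rank and therefore needs a per-rank table for address translation. All names (`ToyGlobalMemory`, `alloc_symmetric`, `alloc_asymmetric`, `put`, `get`) are hypothetical. Bytearrays stand in for per-GPU memory, and no real GASNet-EX or OpenMP communication takes place.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Allocation:
    """Hypothetical handle for one global allocation: per-rank offsets and sizes."""
    offsets: tuple
    sizes: tuple
    symmetric: bool


class ToyGlobalMemory:
    """Toy model (not DiOMP's API): one bytearray per rank stands in for its GPU memory."""

    def __init__(self, nranks: int, segment_bytes: int):
        self.segments = [bytearray(segment_bytes) for _ in range(nranks)]
        self.next_free = [0] * nranks  # per-rank bump-allocator pointer

    def _fits(self, rank: int, offset: int, size: int) -> bool:
        return offset + size <= len(self.segments[rank])

    def alloc_symmetric(self, size: int) -> Allocation:
        # Same size at the same offset on every rank, so one offset names
        # the buffer everywhere and translation needs no lookup table.
        n = len(self.segments)
        offset = max(self.next_free)
        if not all(self._fits(r, offset, size) for r in range(n)):
            raise MemoryError("symmetric allocation does not fit")
        self.next_free = [offset + size] * n
        return Allocation((offset,) * n, (size,) * n, True)

    def alloc_asymmetric(self, sizes: Sequence[int]) -> Allocation:
        # Each rank may request a different size, so offsets can diverge and
        # the handle carries a per-rank table used for address translation.
        if len(sizes) != len(self.segments):
            raise ValueError("need one size per rank")
        offsets = tuple(self.next_free)
        if not all(self._fits(r, offsets[r], s) for r, s in enumerate(sizes)):
            raise MemoryError("asymmetric allocation does not fit")
        self.next_free = [o + s for o, s in zip(offsets, sizes)]
        return Allocation(offsets, tuple(sizes), False)

    def translate(self, a: Allocation, rank: int, disp: int, nbytes: int) -> int:
        # Map (allocation, target rank, displacement) to an offset in the
        # target rank's segment, rejecting out-of-bounds accesses.
        if disp < 0 or disp + nbytes > a.sizes[rank]:
            raise IndexError("access outside the target rank's allocation")
        return a.offsets[rank] + disp

    def put(self, a: Allocation, target: int, data: bytes, disp: int = 0) -> None:
        # One-sided write into another rank's memory (a plain copy here).
        off = self.translate(a, target, disp, len(data))
        self.segments[target][off:off + len(data)] = data

    def get(self, a: Allocation, target: int, nbytes: int, disp: int = 0) -> bytes:
        # One-sided read from another rank's memory.
        off = self.translate(a, target, disp, nbytes)
        return bytes(self.segments[target][off:off + nbytes])


if __name__ == "__main__":
    mem = ToyGlobalMemory(nranks=4, segment_bytes=1024)
    a1 = mem.alloc_asymmetric([16, 32, 8, 64])
    a2 = mem.alloc_asymmetric([8, 8, 8, 8])  # offsets now differ per rank
    s = mem.alloc_symmetric(64)              # placed past the largest rank's usage
    mem.put(s, target=2, data=b"hello", disp=8)
    print(a2.offsets, s.offsets, mem.get(s, target=2, nbytes=5, disp=8))
    # prints: (16, 32, 8, 64) (72, 72, 72, 72) b'hello'
```

In this sketch the symmetric case trades some wasted space (its offset must clear the most heavily used rank) for trivial translation, while the asymmetric case saves space at the cost of a per-rank lookup on every remote access.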