Workshop: 2025 International Workshop on Performance, Portability, and Productivity in HPC (P3HPC)
Authors: Gabriel Suau (CEA Saclay), Thierry Gautier (Inria), Ansar Calloo and Rémi Baron (CEA Saclay), and Romain Le Tellier (CEA Cadarache)
Abstract: This paper describes the development of performance portable batched linear algebra kernels for SN-DG neutron transport sweeps using Kokkos. We establish a new sweep algorithm for GPUs that relies on batched linear algebra kernels. We implement an optimized batched gesv solver for small linear systems that builds upon state-of-the-art algorithms. Our implementation achieves high performance by minimizing global memory traffic and maximizing the amount of computations done at compile-time. We assess the performance of the batched gesv kernel on NVIDIA and AMD GPUs. We show that our custom implementation outperforms state-of-the-art linear algebra libraries on these architectures. The performance of the new GPU sweep implementation is assessed on the H100 and MI300A GPUs. We demonstrate that our GPU implementation is able to achieve high performance on both architectures, and is competitive with an optimized multithreaded CPU implementation on a 128-core CPU.
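Illustration: the abstract refers to a batched gesv solver for small systems in which as much work as possible is resolved at compile time. The sketch below shows that general idea only. It is plain, serial C++ for the CPU and is not the authors' Kokkos implementation; the names `gesv_small` and `batched_gesv`, the row-major storage, and the contiguous batch layout are all assumptions made for this example. The system size `N` is a template parameter, so every loop bound is known to the compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: solve one dense N x N system A x = b in place using
// Gaussian elimination with partial pivoting. A is row-major. On success,
// b holds the solution x. Returns false if a zero pivot is encountered.
template <std::size_t N>
bool gesv_small(double* A, double* b) {
    for (std::size_t k = 0; k < N; ++k) {
        // Partial pivoting: find the row with the largest |A[i][k]|, i >= k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < N; ++i)
            if (std::fabs(A[i * N + k]) > std::fabs(A[p * N + k])) p = i;
        if (A[p * N + k] == 0.0) return false;  // singular matrix
        if (p != k) {
            for (std::size_t j = 0; j < N; ++j)
                std::swap(A[k * N + j], A[p * N + j]);
            std::swap(b[k], b[p]);
        }
        // Eliminate column k from the rows below the pivot.
        for (std::size_t i = k + 1; i < N; ++i) {
            const double f = A[i * N + k] / A[k * N + k];
            for (std::size_t j = k; j < N; ++j)
                A[i * N + j] -= f * A[k * N + j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution on the resulting upper-triangular system.
    for (std::size_t i = N; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < N; ++j) s -= A[i * N + j] * b[j];
        b[i] = s / A[i * N + i];
    }
    return true;
}

// Hypothetical driver: solve `batch` independent N x N systems stored back
// to back in A (batch * N * N values) and B (batch * N values).
// Every system is attempted; the result is false if any one was singular.
template <std::size_t N>
bool batched_gesv(std::vector<double>& A, std::vector<double>& B,
                  std::size_t batch) {
    assert(A.size() == batch * N * N);
    assert(B.size() == batch * N);
    bool ok = true;
    for (std::size_t s = 0; s < batch; ++s)
        ok = gesv_small<N>(A.data() + s * N * N, B.data() + s * N) && ok;
    return ok;
}
```

For example, calling `batched_gesv<2>(A, B, 2)` with two 2 x 2 systems packed into `A` and `B` overwrites `B` with both solutions. In a GPU setting the outer loop over systems would typically be the parallel dimension, but how the paper maps the batch onto threads is not stated in the abstract.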