SC25 Proceedings

Workshops Archive

Extending RAJA Parallel Programming Abstractions with Just-In-Time Optimization

Workshop: 2025 International Workshop on Performance, Portability, and Productivity in HPC (P3HPC)

Authors: John Bowen, Konstantinos Parasyris, David Beckingsale, Tal Ben-Nun, Thomas Stitt, and Giorgis Georgakoudis (Lawrence Livermore National Laboratory, LLNL)

Abstract: The prevalence of heterogeneous computing systems, comprising both CPUs and GPUs, has led to the adoption of performance portability programming models such as RAJA. These models allow developers to write portable code that is compiled ahead of time (AOT), unmodified, for different backends, thus improving productivity and maintainability.

In this work, we explore the integration of just-in-time (JIT) optimization into portable programming models. Our work aims to improve performance with JIT optimization, without sacrificing portability or developer productivity.

We extend Proteus, a JIT compilation framework, to support indirect kernel launching through RAJA's abstractions. Our evaluation with the RAJAPerf benchmark suite demonstrates promising speedups on both AMD and NVIDIA GPUs, with no slowdowns recorded on either backend. Specifically, we record speedups ranging from $1.2\times$ to $23\times$ on AMD MI250X and from $1.1\times$ to $15\times$ on NVIDIA V100, while preserving the performance portability and ease-of-use benefits of RAJA.
