Workshop: LLVM-HPC2025: The 11th Workshop on the LLVM Compiler Infrastructure in HPC
Authors: Ivan R. Ivanov (Institute of Science Tokyo, RIKEN Center for Computational Science (R-CCS)); Jens Domke (RIKEN Center for Computational Science (R-CCS)); Toshio Endo (Institute of Science Tokyo); and Johannes Doerfert (Lawrence Livermore National Laboratory (LLNL))
Abstract: Thread coarsening is a well-known optimization technique for GPUs. It can increase instruction-level parallelism, reduce redundant computation, and improve memory access patterns. However, its effectiveness diminishes in the presence of divergent control flow, i.e., branches whose conditions cannot be proven uniform across threads at compile time. In this work, we implement multi-level thread coarsening for CPU and GPU OpenMP code as a generic thread coarsening transformation on LLVM IR. We introduce dynamic convergence, a new technique that generates both coarsened and non-coarsened versions of divergent regions in the code so that the uniformity check can happen at runtime instead of at compile time. We evaluated the approach on HeCBench for GPUs and LULESH for CPUs. Without dynamic convergence, the best-case speedup was 4.6% on GPUs and 2.9% on CPUs; with dynamic convergence enabled, it reached 7.5% on GPUs and 4.3% on CPUs.
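The idea of dynamic convergence can be illustrated with a toy simulation. The sketch below is a conceptual model written in Python, not the authors' LLVM IR transformation: the kernel body, the function names (`original_kernel`, `coarsened_kernel`, `run_coarsened`), and the coarsening factors are all hypothetical. Each coarsened thread covers `factor` logical threads. At runtime it checks whether the branch condition is identical across those lanes. If it is, a single converged branch is taken and the body runs for every lane. If it is not, the code falls back to the original, non-coarsened region one lane at a time. The runtime check here is a plain `all(...)`; the abstract does not describe how the actual transformation implements it.

```python
from typing import List, Sequence, Tuple


def original_kernel(tid: int, cond: Sequence[bool], data: Sequence[int], out: List[int]) -> None:
    """One logical thread of a hypothetical kernel with a possibly divergent branch."""
    if cond[tid]:
        out[tid] = data[tid] * 2
    else:
        out[tid] = data[tid] + 1


def run_reference(cond: Sequence[bool], data: Sequence[int]) -> List[int]:
    """Run the uncoarsened kernel once per logical thread."""
    out = [0] * len(data)
    for tid in range(len(data)):
        original_kernel(tid, cond, data, out)
    return out


def coarsened_kernel(ctid: int, factor: int, cond: Sequence[bool],
                     data: Sequence[int], out: List[int]) -> bool:
    """One coarsened thread covering lanes [ctid*factor, ctid*factor + factor).

    Returns True if the converged (coarsened) version ran, False if it fell back.
    """
    lanes = range(ctid * factor, min((ctid + 1) * factor, len(data)))
    # Runtime uniformity check: is the branch condition the same for every lane?
    first = cond[lanes[0]]
    if all(cond[t] == first for t in lanes):
        # Converged path: the branch is decided once and its body covers all lanes.
        if first:
            for t in lanes:
                out[t] = data[t] * 2
        else:
            for t in lanes:
                out[t] = data[t] + 1
        return True
    # Divergent path: fall back to the non-coarsened region, lane by lane.
    for t in lanes:
        original_kernel(t, cond, data, out)
    return False


def run_coarsened(cond: Sequence[bool], data: Sequence[int], factor: int) -> Tuple[List[int], int]:
    """Launch ceil(n / factor) coarsened threads; return outputs and converged-group count."""
    n = len(data)
    out = [0] * n
    num_coarse = (n + factor - 1) // factor
    converged = sum(coarsened_kernel(c, factor, cond, data, out) for c in range(num_coarse))
    return out, converged


if __name__ == "__main__":
    cond = [True, True, False, False, True, False, True, True]
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    out, converged = run_coarsened(cond, data, factor=2)
    assert out == run_reference(cond, data)
    print(out, converged)  # prints: [2, 4, 4, 5, 10, 7, 14, 16] 3
```

With `factor=2`, three of the four lane groups have uniform conditions and take the converged path. Only the group `(True, False)` falls back. Both paths produce the same results as the uncoarsened kernel. The fallback path is what lets coarsening still be applied when uniformity cannot be proven at compile time.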