SC25 Proceedings

Research and ACM SRC Posters Archive

Sync-Free GPU Parallelization of Sparse Kernels from Sequential Python Code

Poster Type: ACM Student Research Competition, Undergraduate

Author: Malko-Bani Somo (McMaster University)

Supervisor: Kazem Cheshmi (McMaster University)

Abstract: Sparse matrix kernels such as sparse matrix-vector multiplication (SpMV), sparse triangular solve (SpTRSV), and Gauss-Seidel are critical in scientific computing, AI, and engineering, but they remain difficult to parallelize because their memory access patterns are irregular. Traditional compiler techniques assume affine array accesses, an assumption that fails for sparse formats like CSR and CSC, where arrays are indexed indirectly through other arrays. As a result, existing compilers often leave sparse code under-optimized and miss significant opportunities for parallelism.
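To make the non-affine access pattern concrete, here is a minimal sequential sketch of a lower-triangular solve in CSR form. It is illustrative only and is not taken from the poster: the function name `sptrsv_csr`, the argument names `indptr`, `indices`, and `data`, and the assumption that each row stores its diagonal entry last are all choices made for this example.

```python
def sptrsv_csr(indptr, indices, data, b):
    """Solve L x = b for a lower-triangular L stored in CSR.

    Assumption for this sketch: the diagonal entry is the last
    stored entry of every row.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        # The column index j = indices[k] is loaded from memory, so the
        # read x[j] is not an affine function of the loop variables.
        # It also reads values written by earlier iterations of i,
        # which is the loop-carried dependence.
        for k in range(indptr[i], indptr[i + 1] - 1):
            acc -= data[k] * x[indices[k]]
        x[i] = acc / data[indptr[i + 1] - 1]
    return x


# L = [[2, 0, 0],
#      [1, 1, 0],
#      [0, 3, 4]]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 1.0, 3.0, 4.0]
b = [2.0, 3.0, 10.0]

x = sptrsv_csr(indptr, indices, data, b)
print(x)  # [1.0, 2.0, 1.0]
```

Which earlier rows iteration `i` depends on is only known once `indices` has been read, which is why a static affine analysis cannot prove the outer loop safe to parallelize.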

We present a sync-free, runtime-based transformation that automates loop parallelization for sparse kernels with loop-carried dependencies. Our approach traces memory reads and writes to construct dependence sets, then generates Triton kernels that use flag arrays to enforce correctness without global synchronization. This method generalizes across sparse kernels by leveraging properties such as associativity and affine simplifications, enabling efficient parallel execution.
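The two steps described above can be sketched on the CPU as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: it does not generate Triton code, the names `TracedVector`, `row_solve`, `trace`, and `flag_replay` are hypothetical, and the spin-wait a GPU thread would perform on a flag is simulated by retrying iterations until their flags are set.

```python
class TracedVector:
    """Hypothetical helper: logs which earlier iterations produced
    the values that the current iteration reads."""

    def __init__(self, n):
        self.values = [0.0] * n
        self.last_writer = [None] * n  # iteration that last wrote each slot
        self.current = None
        self.deps = {}

    def begin(self, iteration):
        self.current = iteration
        self.deps[iteration] = set()

    def __getitem__(self, j):
        writer = self.last_writer[j]
        if writer is not None and writer != self.current:
            self.deps[self.current].add(writer)  # read-after-write
        return self.values[j]

    def __setitem__(self, j, value):
        self.last_writer[j] = self.current
        self.values[j] = value


def row_solve(i, indptr, indices, data, b, x):
    """One SpTRSV iteration (diagonal assumed stored last in the row)."""
    acc = b[i]
    for k in range(indptr[i], indptr[i + 1] - 1):
        acc -= data[k] * x[indices[k]]
    x[i] = acc / data[indptr[i + 1] - 1]


def trace(indptr, indices, data, b):
    """Run the sequential loop once and return the dependence sets."""
    x = TracedVector(len(b))
    for i in range(len(b)):
        x.begin(i)
        row_solve(i, indptr, indices, data, b, x)
    return x.deps


def flag_replay(indptr, indices, data, b, deps, order):
    """Execute iterations in an arbitrary order, each one gated only by
    the flags of the iterations it depends on (no global barrier)."""
    x = [0.0] * len(b)
    done = [False] * len(b)  # the flag array
    pending = list(order)
    while pending:
        waiting = []
        for i in pending:
            if all(done[j] for j in deps[i]):
                row_solve(i, indptr, indices, data, b, x)
                done[i] = True  # publish the result by setting the flag
            else:
                waiting.append(i)  # stands in for a spinning thread
        if len(waiting) == len(pending):
            raise RuntimeError("no iteration can make progress")
        pending = waiting
    return x


# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]] in CSR
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 1.0, 3.0, 4.0]
b = [2.0, 3.0, 10.0]

deps = trace(indptr, indices, data, b)
x_replayed = flag_replay(indptr, indices, data, b, deps, order=[2, 1, 0])
print(deps)        # {0: set(), 1: {0}, 2: {1}}
print(x_replayed)  # [1.0, 2.0, 1.0]
```

Even when iterations are visited in reverse order, the flags alone force each row to wait for exactly the rows it reads, so the replay reproduces the sequential result. On a GPU the flag writes and reads would additionally need appropriate memory ordering, which this simulation does not model.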

We demonstrate our work with sparse triangular solves and related kernels, and will present performance results, methodology, and case studies in the poster session.

Best Poster Finalist (BP): no