The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research and ACM SRC Posters Archive

Accelerating Linear Solve with Mixed Precision Nested Recursive Subdivision on AI Hardware


Poster Type: ACM Student Research Competition, Undergraduate

Author: Vicki Carrica (Massachusetts Institute of Technology (MIT))

Supervisor: Rabab Alomairy (Massachusetts Institute of Technology (MIT))

Abstract: Cholesky decomposition is a critical performance bottleneck in engineering simulations. To accelerate these simulations, we present a novel nested recursive Cholesky algorithm implemented in Julia. The algorithm restructures the factorization into recursive TRSM (triangular solve) and SYRK (symmetric rank-k update) sub-problems, maximizing the use of GEMM (general matrix-matrix multiply) operations, which are highly parallel and run efficiently on GPUs. The approach relies on a custom recursive data structure that enables layered, mixed-precision arithmetic on modern NVIDIA H200 GPUs. By using fast, low-precision FP16 computations on the large off-diagonal matrix blocks via Tensor Cores while preserving high precision on the critical diagonal blocks, we achieve a 5.32x speedup over the standard cuSOLVER FP64 implementation. The method is 100x more accurate than a pure FP16 approach while retaining over 88% of that approach's speedup. Our work demonstrates a practical path to significantly reducing computation time for large-scale scientific problems with minimal accuracy loss.
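
The recursive structure described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' Julia/GPU implementation. It assumes the usual 2x2 block partition A = [[A11, A21^T], [A21, A22]], factors A11 recursively, obtains the off-diagonal block L21 by a triangular solve (TRSM), applies the update A22 - L21 L21^T (SYRK), and then recurses on the result. Mixed precision is only emulated here: the off-diagonal block is rounded to IEEE half precision with Python's `struct` module, while the diagonal blocks stay in double precision. The function names, the `base` cutoff, and the choice to round only the stored L21 block are all assumptions made for illustration.

```python
import math
import struct

def to_fp16(x):
    """Round a float to the nearest IEEE half-precision value (emulation only)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def cholesky_base(A):
    """Unblocked double-precision Cholesky: returns lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def trsm_right_lt(B, L):
    """TRSM: solve X L^T = B for X, where L is lower triangular."""
    X = [[0.0] * len(L) for _ in B]
    for i in range(len(B)):
        for j in range(len(L)):
            s = sum(X[i][k] * L[j][k] for k in range(j))
            X[i][j] = (B[i][j] - s) / L[j][j]
    return X

def syrk_update(C, X):
    """SYRK: return C - X X^T."""
    m, k = len(X), len(X[0])
    return [[C[i][j] - sum(X[i][p] * X[j][p] for p in range(k))
             for j in range(m)] for i in range(m)]

def recursive_cholesky(A, low_precision=True, base=2):
    """Recursive Cholesky; off-diagonal blocks are optionally rounded to FP16."""
    n = len(A)
    if n <= base:
        return cholesky_base(A)          # small diagonal block: full precision
    h = n // 2
    A11 = [row[:h] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]
    A22 = [row[h:] for row in A[h:]]
    L11 = recursive_cholesky(A11, low_precision, base)
    L21 = trsm_right_lt(A21, L11)
    if low_precision:
        # Assumption: only the stored off-diagonal block is demoted to FP16.
        L21 = [[to_fp16(v) for v in row] for row in L21]
    L22 = recursive_cholesky(syrk_update(A22, L21), low_precision, base)
    top = [row + [0.0] * (n - h) for row in L11]
    bottom = [r21 + r22 for r21, r22 in zip(L21, L22)]
    return top + bottom

def max_residual(A, L):
    """Largest absolute entry of A - L L^T."""
    n = len(A)
    return max(abs(A[i][j] - sum(L[i][k] * L[j][k] for k in range(n)))
               for i in range(n) for j in range(n))

if __name__ == "__main__":
    # Hypothetical test matrix: diagonally dominant, hence symmetric positive definite.
    n = 8
    A = [[(n if i == j else 0.0) + 1.0 / (1 + abs(i - j)) for j in range(n)]
         for i in range(n)]
    print("FP64 residual: ", max_residual(A, recursive_cholesky(A, low_precision=False)))
    print("mixed residual:", max_residual(A, recursive_cholesky(A, low_precision=True)))
```

Running the script prints the reconstruction residual for the full-precision and mixed-precision variants, which gives a rough feel for the accuracy trade-off. It says nothing about the GPU speedups reported in the abstract, which depend on Tensor Core GEMMs that this sketch does not model.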

Best Poster Finalist (BP): no