Poster Type: ACM Student Research Competition, Undergraduate
Author: Vicki Carrica (Massachusetts Institute of Technology (MIT))
Supervisor: Rabab Alomairy (Massachusetts Institute of Technology (MIT))
Abstract: The Cholesky decomposition is a critical performance bottleneck in engineering simulations. To accelerate these simulations, we present a novel, nested recursive Cholesky algorithm implemented in Julia. The algorithm restructures the problem into recursive TRSM (triangular solve) and SYRK (symmetric rank-k update) sub-problems, maximizing the share of work done in GEMM (general matrix-matrix multiply) operations, which parallelize well and run efficiently on GPUs. This approach leverages a custom recursive data structure that enables layered, mixed-precision arithmetic on modern NVIDIA H200 GPUs. By strategically using fast, low-precision FP16 computations on large, off-diagonal matrix blocks via Tensor Cores, while preserving high precision on the critical diagonal blocks, we achieve a speedup of 5.32x over the standard cuSOLVER FP64 implementation. This method is 100x more accurate than a pure FP16 approach while retaining over 88% of that approach's speedup. Our work demonstrates a practical path to significantly reducing computation time for large-scale scientific problems with minimal accuracy loss.
Best Poster Finalist (BP): no
Poster: PDF
Poster Summary: PDF
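
Illustrative sketch (not the authors' code): the abstract describes a Cholesky factorization that recurses into TRSM and SYRK sub-problems and pushes the off-diagonal work into low-precision GEMMs. The Python below is a minimal CPU-only sketch of that structure, under several assumptions that do not come from the poster: the function names (`cholesky`, `trsm`, `syrk`, `gemm_nt`) and the `base` block size are hypothetical, FP16 is emulated by rounding GEMM inputs with the standard library's half-precision `struct` format, and products are accumulated in float64 for simplicity. The real implementation is in Julia on GPUs and its layered precision scheme may differ.

```python
import math
import struct

def to_fp16(x):
    """Round a float to the nearest IEEE 754 half-precision value (emulated FP16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def block(M, r0, r1, c0, c1):
    """Copy the sub-matrix M[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in M[r0:r1]]

def sub(C, P):
    """Element-wise C - P."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(C, P)]

def gemm_nt(A, B, low):
    """Return A @ B^T. If low, inputs are first rounded to FP16 (assumption:
    accumulation stays in float64, a simplification of Tensor Core behaviour)."""
    if low:
        A = [[to_fp16(x) for x in row] for row in A]
        B = [[to_fp16(x) for x in row] for row in B]
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in B] for ra in A]

def potrf_base(A):
    """Unblocked Cholesky of a small diagonal block, in full precision."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def trsm(B, L, low, base):
    """Solve X @ L^T = B for X, with L lower triangular, by recursive splitting."""
    n, m = len(L), len(B)
    if n <= base:  # small triangle: plain substitution in full precision
        X = []
        for b in B:
            x = [0.0] * n
            for j in range(n):
                x[j] = (b[j] - sum(x[k] * L[j][k] for k in range(j))) / L[j][j]
            X.append(x)
        return X
    h = n // 2
    X1 = trsm(block(B, 0, m, 0, h), block(L, 0, h, 0, h), low, base)
    # Off-diagonal update B2 - X1 @ L21^T is a GEMM and may run in low precision.
    B2 = sub(block(B, 0, m, h, n), gemm_nt(X1, block(L, h, n, 0, h), low))
    X2 = trsm(B2, block(L, h, n, h, n), low, base)
    return [r1 + r2 for r1, r2 in zip(X1, X2)]

def syrk(C, A, low, base):
    """Return C - A @ A^T, recursing so only off-diagonal blocks use low precision."""
    m = len(C)
    if m <= base:  # diagonal block: keep full precision
        return sub(C, gemm_nt(A, A, False))
    h = m // 2
    A1, A2 = A[:h], A[h:]
    C11 = syrk(block(C, 0, h, 0, h), A1, low, base)
    C21 = sub(block(C, h, m, 0, h), gemm_nt(A2, A1, low))
    C22 = syrk(block(C, h, m, h, m), A2, low, base)
    C12 = [list(col) for col in zip(*C21)]  # mirror to keep the result symmetric
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]

def cholesky(A, low=True, base=2):
    """Recursive Cholesky: factor A11, solve for L21 (TRSM), update A22 (SYRK), recurse."""
    n = len(A)
    if n <= base:
        return potrf_base(A)
    h = n // 2
    L11 = cholesky(block(A, 0, h, 0, h), low, base)
    L21 = trsm(block(A, h, n, 0, h), L11, low, base)
    L22 = cholesky(syrk(block(A, h, n, h, n), L21, low, base), low, base)
    return [r + [0.0] * (n - h) for r in L11] + [a + b for a, b in zip(L21, L22)]

def residual(A, L):
    """Largest absolute entry of L @ L^T - A."""
    P = gemm_nt(L, L, False)
    return max(abs(p - a) for rp, ra in zip(P, A) for p, a in zip(rp, ra))

# Small symmetric, diagonally dominant (hence positive definite) test matrix.
n = 8
A = [[1.0 / (1 + abs(i - j)) + (n if i == j else 0.0) for j in range(n)] for i in range(n)]
L_hi = cholesky(A, low=False)  # every operation in float64
L_lo = cholesky(A, low=True)   # off-diagonal GEMMs with FP16-rounded inputs
print("float64 residual:", residual(A, L_hi))
print("mixed residual:  ", residual(A, L_lo))
```

Comparing the two residuals shows the trade-off the abstract refers to: the mixed run loses some accuracy in exchange for doing its GEMMs on FP16 inputs, while the diagonal blocks are still factored in full precision. The emulated rounding overflows for magnitudes above the FP16 range (about 65504), so inputs in this sketch should be scaled accordingly.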