Poster Type: Research Posters
Author: Sowmya Yellapragada (University of Utah), Jessica Imlau Dagostini (University of California, Santa Cruz), Kevin Gott (Lawrence Berkeley National Laboratory (LBNL)), Rebecca Hartman-Baker (Lawrence Berkeley National Laboratory (LBNL))
Supervisor:
Abstract: Modern supercomputing systems have heterogeneous node configurations: even nominally identical hardware can show significant performance variation because of memory capacity differences, manufacturing tolerances, and deployment conditions. This heterogeneity reduces the efficiency of scientific applications built on frameworks such as AMReX, causing substantial computational waste on leadership-class systems. We present performance-aware and relation-aware load balancing algorithms designed for scientific applications, such as those built on AMReX, running on heterogeneous HPC clusters. Our approach uses empirically measured node performance characteristics and a relative performance matrix to optimize task distribution across diverse computational resources.
Evaluation on NERSC Perlmutter with 14 representative AMReX computational kernels shows 99.9% scheduling efficiency. The algorithms improve performance by 4.4%-11.5% over traditional methods in moderate-heterogeneity scenarios (A100 40GB vs. 80GB) and by up to 300x in extreme mixed CPU-GPU configurations, where homogeneous methods fail to use CPU resources effectively. They handle million-task workloads with O(n log n + nm) complexity while remaining practical to deploy.
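The abstract does not spell out the algorithm, but an O(n log n + nm) bound for n tasks and m nodes is consistent with a familiar pattern: sort tasks by cost once, then scan every node for each task. The sketch below illustrates that pattern under stated assumptions and is not the authors' method. It assumes the relative performance matrix is indexed by node and kernel, with `rel[j][k]` = (reference-node time for kernel k) / (node j time for kernel k). It also assumes each task is a (kernel, cost-on-reference-node) pair. The function names `relative_performance_matrix` and `greedy_schedule` are hypothetical.

```python
from typing import Dict, List, Tuple


def relative_performance_matrix(
    times: List[Dict[str, float]], ref: int = 0
) -> List[Dict[str, float]]:
    """Hypothetical helper: times[j][k] is the measured runtime of kernel k on node j.

    Returns rel[j][k] = times[ref][k] / times[j][k]; a value > 1 means node j
    runs kernel k faster than the reference node (assumed normalization).
    """
    return [{k: times[ref][k] / t[k] for k in t} for t in times]


def greedy_schedule(
    tasks: List[Tuple[str, float]], rel: List[Dict[str, float]]
) -> Tuple[List[int], List[float]]:
    """Hypothetical performance-aware greedy assignment (LPT-style sketch).

    Each task is (kernel, cost on the reference node). Sorting costs
    O(n log n); scanning all m nodes for each of n tasks costs O(nm).
    """
    m = len(rel)
    loads = [0.0] * m
    assignment = [-1] * len(tasks)
    # Longest-processing-time first: place expensive tasks while nodes are still lightly loaded.
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][1], reverse=True)
    for i in order:
        kernel, cost = tasks[i]
        # Expected finish time on node j: current load plus cost scaled by node j's relative speed.
        best = min(range(m), key=lambda j: loads[j] + cost / rel[j][kernel])
        loads[best] += cost / rel[best][kernel]
        assignment[i] = best
    return assignment, loads


if __name__ == "__main__":
    # Illustrative numbers only: node 1 runs "stencil" twice as fast as node 0.
    times = [{"stencil": 2.0, "fft": 4.0}, {"stencil": 1.0, "fft": 4.0}]
    rel = relative_performance_matrix(times)
    tasks = [("stencil", 4.0), ("fft", 2.0), ("stencil", 2.0)]
    assignment, loads = greedy_schedule(tasks, rel)
    print(assignment, loads)
```

With these made-up inputs, both stencil tasks go to the faster node 1 and the FFT task goes to node 0, giving loads of 2.0 and 3.0. Dividing a task's reference cost by the node's relative speed is what lets a scheme like this give more work to faster nodes, and only suitably sized work to slower ones, which an even split across nodes cannot do.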
Best Poster Finalist (BP): no