The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Bridging the Gap: User-Centric Energy Monitoring for Policy-Driven Application Optimization in HPC Data Centers


Workshop: Sustainable Supercomputing

Authors: Woong Shin (Oak Ridge National Laboratory (ORNL)); Karl W. Schulz (Advanced Micro Devices, Inc. (AMD)); Arthur F. Lorenzon (Federal University of Rio Grande do Sul); Matthias Maiterth (Oak Ridge National Laboratory (ORNL)); Bruno Villasenor Alvarez and Jordà Polo (Advanced Micro Devices, Inc. (AMD)); Aditya Kashi, Hao Lu, Nicholson Koukpaizan, Antigoni Georgiadou, Matthew Norman, Wael Elwasif, Michael Matheson, and Feiyi Wang (Oak Ridge National Laboratory (ORNL)); Nicholas Frontiere (Argonne National Laboratory (ANL)); and Sarp Oral, Thomas Beck, and Bronson Messer (Oak Ridge National Laboratory (ORNL))

Abstract: Application energy optimization in HPC data centers faces two critical gaps: the absence of systematic methodologies that connect data center policies to application decisions, and the absence of accessible monitoring tools that enable data-driven optimization. We address both gaps through two complementary pillars. First, we present a methodology based on an extended weighted Energy Delay Product (EDP) that translates data center operational priorities into application-level decisions and integrates energy considerations into an optimization workflow spanning continuous monitoring through targeted optimization. Second, we present Omnistat, a user-space monitoring tool that enables this methodology by giving developers direct access to actionable energy telemetry. Through deployment on the Frontier supercomputer and case studies exploring performance-energy trade-offs, we show how these pillars help establish energy as an integral optimization target, with developers acting as active participants in data center efficiency.
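The abstract does not give the exact form of the extended weighted EDP. As a rough illustration only, the sketch below uses the common generic weighted form E * D^w (w = 1 gives plain EDP, w = 2 gives ED^2P) and assumes that a data center priority is expressed by choosing w: a larger w weights time to solution more heavily relative to energy. The names `Run`, `weighted_edp`, and `best_config` and all numbers are hypothetical, not the authors' formulation, Omnistat output, or Frontier measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str        # hypothetical configuration name (e.g. a GPU power setting)
    energy_j: float   # energy to solution in joules (hypothetical value)
    runtime_s: float  # time to solution in seconds (hypothetical value)


def weighted_edp(run: Run, w: float) -> float:
    """Generic weighted EDP, E * D**w (an assumption, not the paper's extension)."""
    return run.energy_j * run.runtime_s ** w


def best_config(runs: list[Run], w: float) -> Run:
    """Return the run that minimizes the weighted EDP for delay weight w."""
    return min(runs, key=lambda r: weighted_edp(r, w))


# Hypothetical measurements: the "capped" run saves energy but runs longer.
runs = [
    Run("default", energy_j=1000.0, runtime_s=100.0),
    Run("capped", energy_j=800.0, runtime_s=115.0),
]

for w in (0.0, 1.0, 2.0):
    choice = best_config(runs, w)
    print(f"w={w}: {choice.label} (score {weighted_edp(choice, w):.0f})")
```

With these made-up numbers, w = 0 and w = 1 both select the capped run (92,000 vs. 100,000 at w = 1), while w = 2 selects the default run (10,000,000 vs. 10,580,000). The same measurements can therefore lead to different application decisions depending on how strongly the site's policy prioritizes throughput over energy.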

