The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

On the Performance and Scalability of Cloud Supercomputers: Insights from Eagle and Reindeer


Workshop: PMBS25: The 16th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems

Authors: Amirreza Rastegari, Prabhat Ram, and Michael F. Ringenburg (Microsoft Corporation)

Abstract: The launch of Eagle, Azure’s hyper-scale supercomputer and Number 3 on the TOP500 list in November 2023, marked a new era in which cloud providers are at the forefront of supercomputing. Despite the rapid expansion of cloud-based supercomputing, public knowledge of its performance and scalability is limited, and numerous misconceptions persist about the performance implications of the virtualization layer in cloud-based systems. To address these gaps, we present a comparative analysis of two cloud-based supercomputers: Azure Eagle, the hyper-scale system noted above, and Azure Reindeer, a small-scale system ranked Number 32 on the TOP500 list in November 2024.

Using a comprehensive performance analysis, we highlight differences in the computational efficiency and scaling characteristics of these systems relative to their bare-metal, on-premises counterparts. We also quantify the overhead of Azure's virtualization layer and show that its performance impact on real-world HPC workloads is less than 4%, with typical values of 2–3%.

