Workshop: ExHetAI: Extreme Heterogeneity and AI Convergence in HPC
Authors: Sahil Tyagi (Oak Ridge National Laboratory (ORNL)); Andrei Cozma (University of Tennessee, Knoxville); and Olivera Kotevska and Feiyi Wang (Oak Ridge National Laboratory (ORNL))
Abstract: Federated Learning (FL) is critical for edge and High Performance Computing (HPC) environments, where data is not centralized and privacy is crucial. We present OmniFed, a modular framework built around a clear separation of concerns among configuration, orchestration, communication, and training logic. Its architecture supports both configuration-driven prototyping and code-level, override-what-you-need customization. OmniFed supports different topologies, mixed communication protocols within a single deployment, and popular training algorithms. It also offers optional privacy mechanisms, including Differential Privacy (DP), Homomorphic Encryption (HE), and Secure Aggregation (SA), as well as compression strategies. These capabilities are exposed through well-defined extension points, allowing users to customize topology and orchestration, learning logic, and privacy/compression plugins while preserving the integrity of the core system. We evaluate multiple models and algorithms to measure various performance metrics. By unifying topology configuration, mixed-protocol communication, and pluggable modules in one stack, OmniFed streamlines FL experimentation and deployment across heterogeneous environments.
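The idea of pluggable privacy modules layered over an unchanged aggregation core can be illustrated with a minimal sketch. This is not OmniFed's API: the names `UpdateHook`, `clip_and_noise`, and `federated_average` are hypothetical, and weighted averaging in the style of FedAvg is assumed here as one example of a popular training algorithm. The hook clips each client update and adds Gaussian noise, which shows only the shape of a DP-style plugin. A real DP mechanism would also need calibrated noise and privacy accounting.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical plugin type: transforms one client's update before aggregation.
UpdateHook = Callable[[Vector], Vector]


def clip_and_noise(max_norm: float, sigma: float, seed: int = 0) -> UpdateHook:
    """Build a DP-style hook: clip the L2 norm, then add Gaussian noise."""
    rng = random.Random(seed)

    def hook(update: Vector) -> Vector:
        norm = sum(x * x for x in update) ** 0.5
        # Scale down only when the update exceeds the clipping bound.
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        return [x * scale + rng.gauss(0.0, sigma) for x in update]

    return hook


def federated_average(
    updates: Sequence[Vector],
    weights: Sequence[float],
    hook: Optional[UpdateHook] = None,
) -> Vector:
    """Weighted average of client updates, with an optional per-update hook."""
    if len(updates) != len(weights) or not updates:
        raise ValueError("need one weight per update and at least one update")
    if hook is not None:
        # The plugin runs per client; the aggregation core stays unchanged.
        updates = [hook(u) for u in updates]
    total = float(sum(weights))
    dim = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]
```

Calling `federated_average(updates, weights)` gives the plain weighted mean, while passing `hook=clip_and_noise(1.0, 0.1)` applies the illustrative privacy step without touching the aggregation code. Weights would typically be the clients' local sample counts, which is an assumption of this sketch.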