Workshop: IA^3 2025 — 15th Workshop on Irregular Applications: Architectures and Algorithms
Authors: Oded Green (NVIDIA Corporation, Georgia Institute of Technology); Joe Eaton (NVIDIA Corporation); Alok Tripathy (UC Berkeley); and Corey Nolet and Justin Luitjens (NVIDIA Corporation)
Abstract: In this paper, we introduce a new algorithm for generating large-scale permutations on distributed systems. Permutations are used in many applications, including statistical analysis, machine learning, sampling, graph neural networks, matching, cryptanalysis, and bootstrapping. In data science, applying a random permutation is commonly called a shuffle operation, because it reorders the elements in an entirely random manner. Our algorithm is computationally efficient, easy to understand, and scales to large systems. We measure the performance of our new permutation generation scheme on a cluster of NVIDIA DGX-A100 systems, using up to 256 NVIDIA A100 GPUs. We show that we can generate a permutation of 137 billion values in approximately 1.1 seconds, a throughput of 124 billion elements per second.
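The abstract does not describe the new algorithm itself. To illustrate what "generating a permutation on a distributed system" involves, here is a minimal single-process sketch of a generic, well-known two-phase technique. It is not the authors' method. Each element is sent to a uniformly random worker, and each worker then shuffles its received elements locally. Concatenating the workers' outputs in order yields a uniformly random permutation of all inputs. The function name `distributed_shuffle` and the simulation of workers as Python lists are illustrative assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def distributed_shuffle(shards: Sequence[Sequence[T]], seed: int = 0) -> List[List[T]]:
    """Hypothetical sketch: two-phase random permutation over len(shards) workers.

    shards[i] holds the elements initially owned by worker i. The result holds
    each worker's output; concatenating them in order gives a uniformly random
    permutation of all input elements. This is a generic technique, not the
    algorithm from the paper.
    """
    rng = random.Random(seed)
    p = len(shards)
    buckets: List[List[T]] = [[] for _ in range(p)]

    # Phase 1: scatter. Every element picks a destination worker uniformly at
    # random. On a real cluster this would be an all-to-all exchange.
    for shard in shards:
        for x in shard:
            buckets[rng.randrange(p)].append(x)

    # Phase 2: each worker applies an independent local shuffle
    # (random.shuffle performs an in-place Fisher-Yates shuffle).
    for bucket in buckets:
        rng.shuffle(bucket)

    return buckets


if __name__ == "__main__":
    # Three simulated workers holding 20 values in total.
    shards = [list(range(0, 5)), list(range(5, 12)), list(range(12, 20))]
    out = distributed_shuffle(shards, seed=42)
    print([len(b) for b in out])
    print([x for b in out for x in b])
```

Uniformity follows because every specific output sequence is equally likely once all possible bucket sizes are summed over. A practical drawback of this sketch is that workers end up with unequal output sizes, which a real implementation would have to handle. On an actual cluster, the scatter phase would typically map to a collective such as MPI's `MPI_Alltoallv`.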