Workshop: IA^3 2025 — 15th Workshop on Irregular Applications: Architectures and Algorithms
Authors: Bishal Sharma and Martin Burtscher (Texas State University)
Abstract: With the availability of sophisticated GPU profiling tools such as NVIDIA’s Nsight Compute and Nsight Systems, programmers tend to overlook the insight that can be gained from simple profiling techniques. For instance, the basic approach of manually adding counters to the source code can expose important application-specific behavior that general-purpose profilers cannot capture. Analyzing global or thread-local counts of certain events helps developers reason about program behaviors that are crucial for detecting performance bottlenecks, validating key assumptions, and guiding effective optimizations. In this paper, we use five high-performance GPU graph-analytics codes to demonstrate how this profiling approach uncovered interesting application behaviors and how we developed performance optimizations based on some of them.
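The counter-based instrumentation described in the abstract can be sketched in CUDA. The block below is a minimal illustration under stated assumptions, not the authors' code. It instruments one top-down BFS step over a CSR graph with two kinds of counters: a global counter of successful level updates, incremented with `atomicAdd`, and a thread-local counter of examined edges, kept in a register and written to a per-thread array for later host-side analysis. The names `bfs_step_instrumented`, `g_updates`, and `bfs_with_counters`, as well as the tiny example graph, are hypothetical.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical global event counter: number of vertices newly reached by BFS.
__device__ unsigned long long g_updates;

// One top-down BFS step over a CSR graph, instrumented with two counters:
// a global count of successful level updates (g_updates) and a thread-local
// count of examined edges, stored in edges_per_thread[tid].
__global__ void bfs_step_instrumented(int n, const int* row_ptr, const int* col_idx,
                                      int* level, int cur, unsigned int* edges_per_thread)
{
  const int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid >= n) return;
  unsigned int local_edges = 0;    // thread-local counters live in registers
  unsigned int local_updates = 0;
  if (level[tid] == cur) {
    for (int e = row_ptr[tid]; e < row_ptr[tid + 1]; e++) {
      local_edges++;
      // Only the first thread to claim an unvisited neighbor counts an update.
      if (atomicCAS(&level[col_idx[e]], INT_MAX, cur + 1) == INT_MAX) local_updates++;
    }
  }
  edges_per_thread[tid] = local_edges;
  // At most one atomic per thread keeps the instrumentation overhead small.
  if (local_updates != 0) atomicAdd(&g_updates, (unsigned long long)local_updates);
}

// Hypothetical driver: runs BFS from vertex 0 on a 4-vertex undirected graph
// (edges 0-1, 0-2, 1-2, 2-3) and reports both counters. Returns false on a CUDA error.
bool bfs_with_counters(unsigned long long* updates, unsigned long long* edges)
{
  const int n = 4, m = 8;
  const int h_row_ptr[n + 1] = {0, 2, 4, 7, 8};
  const int h_col_idx[m] = {1, 2, 0, 2, 0, 1, 3, 2};
  int h_level[n] = {0, INT_MAX, INT_MAX, INT_MAX};  // source vertex 0
  unsigned int h_edges[n];

  int *d_row_ptr, *d_col_idx, *d_level;
  unsigned int* d_edges;
  if (cudaMalloc(&d_row_ptr, sizeof(h_row_ptr)) != cudaSuccess) return false;
  cudaMalloc(&d_col_idx, sizeof(h_col_idx));
  cudaMalloc(&d_level, sizeof(h_level));
  cudaMalloc(&d_edges, sizeof(h_edges));
  cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
  cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
  cudaMemcpy(d_level, h_level, sizeof(h_level), cudaMemcpyHostToDevice);

  // Reset the global counter so repeated calls start from zero.
  const unsigned long long zero = 0;
  cudaMemcpyToSymbol(g_updates, &zero, sizeof(zero));
  unsigned long long prev = 0, cur_updates = 0;
  *edges = 0;
  for (int cur = 0; ; cur++) {
    bfs_step_instrumented<<<(n + 127) / 128, 128>>>(n, d_row_ptr, d_col_idx, d_level, cur, d_edges);
    // Sum the thread-local counts on the host (their distribution can reveal load imbalance).
    cudaMemcpy(h_edges, d_edges, sizeof(h_edges), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) *edges += h_edges[i];
    cudaMemcpyFromSymbol(&cur_updates, g_updates, sizeof(cur_updates));
    if (cur_updates == prev) break;  // no vertex reached in this step: BFS is done
    prev = cur_updates;
  }
  *updates = cur_updates;
  const bool ok = (cudaGetLastError() == cudaSuccess);
  cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_level); cudaFree(d_edges);
  return ok;
}
```

On this small graph, `bfs_with_counters(&updates, &edges)` should report 3 updates (each non-source vertex is reached exactly once) and 8 examined edges (each directed CSR edge is scanned once, by the thread owning its source vertex). Accumulating counts in registers and issuing at most one `atomicAdd` per thread typically perturbs the measured kernel less than one atomic per event would.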