The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Scalable, High-Fidelity Monitoring of Application Communication Patterns in Vernier


Workshop: 7th Workshop on Programming and Performance Visualization Tools (ProTools)

Authors: Jered Dominguez-Trujillo (University of New Mexico, Los Alamos National Laboratory (LANL)); Derek Schafer (University of New Mexico); Riley Shipley (Tennessee Tech University); Ryan Marshall (Los Alamos National Laboratory (LANL)); Nicholas Bacon (University of New Mexico); Maxim Moraru and Galen Shipman (Los Alamos National Laboratory (LANL)); Anthony Skjellum (Tennessee Tech University); and Patrick Bridges (University of New Mexico)

Abstract: Understanding the irregular, dynamic communication patterns in HPC applications at scale is critical when evaluating potential software optimizations and hardware architectures. Current systems monitor communication behavior for entire applications as exhaustive traces or general-purpose aggregated statistics. These approaches often scale poorly, and the data they gather is frequently too generic or inflexible to guide specific hardware or software optimizations. This paper describes a new, configurable, histogram-based approach to gathering scalable, high-fidelity monitoring information about HPC communication, which we implemented in the Vernier communication monitoring system. This approach enables targeted collection of statistical data about annotated communication patterns for online or offline analysis, benchmarking, or network simulations. We assess these capabilities by collecting communication patterns from several production HPC applications at scale, showing that the resulting statistical representations accurately characterize the communication patterns in these applications and can provide new insights into the communication behavior of complex HPC applications.
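
To make the contrast with exhaustive tracing concrete, the following is a minimal Python sketch of histogram-based aggregation in general. It is not Vernier's implementation or API: the `SizeHistogram` class, the `on_send` hook, the power-of-two binning, and the region name `halo_exchange` are all assumptions made for illustration. The point it shows is that memory use stays fixed per annotated region no matter how many messages are observed.

```python
from collections import defaultdict


class SizeHistogram:
    """Power-of-two histogram of message sizes (hypothetical, not Vernier's API)."""

    def __init__(self, num_bins=32):
        self.counts = [0] * num_bins

    def record(self, nbytes):
        # Bin i holds sizes in [2**i, 2**(i+1)); sizes 0 and 1 land in bin 0,
        # and oversized messages are clamped into the last bin.
        index = max(nbytes, 1).bit_length() - 1
        self.counts[min(index, len(self.counts) - 1)] += 1


# One histogram per annotated communication region (region names are made up).
histograms = defaultdict(SizeHistogram)


def on_send(region, nbytes):
    """Hypothetical hook invoked once per observed message."""
    histograms[region].record(nbytes)


for size in (8, 8, 1024, 1500, 65536):
    on_send("halo_exchange", size)

# Keep only the populated bins for display.
nonzero = {i: c for i, c in enumerate(histograms["halo_exchange"].counts) if c}
print(nonzero)  # {3: 2, 10: 2, 16: 1}
```

A trace of the same five messages would grow with every additional send, whereas the histogram above always occupies 32 counters per region. The trade-off is that ordering and exact sizes within a bin are lost, which is why the bin layout would need to be configurable in practice.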

