Author: Ian Lumsden (University of Tennessee, Knoxville)
Advisor: Michela Taufer (University of Tennessee, Knoxville)
Abstract: As scientific applications tackle more complex problems, their data movement has grown in complexity to the point of increasing execution time and compromising time-to-solution, hindering the pace of scientific discovery. In this work, we claim that continuing to accelerate scientific discovery in the exascale era and beyond requires a general-purpose, adaptable analytic framework for optimizing data movement in both monolithic and modular workflow-based applications. To design this framework, we study data movement across three diverse HPC applications and derive three key lessons that guide the optimization of application I/O. First, profile-level performance analysis can be extended to reveal detailed data movement patterns. Second, middleware can substantially improve data movement efficiency for workflows by aligning I/O with workflow execution patterns. Third, matching I/O phases to targeted storage systems can yield substantial performance gains, but requires phase-aware monitoring and tuning. We use these lessons to design three features (fine-grained I/O filtering, middleware-level workflow analysis, and dynamic phase-to-storage mapping) that we integrate into the general-purpose Analytics4X (A4X) framework to optimize performance across a wide range of applications and I/O patterns.