Workshop: WORKS 2025: 20th Workshop on Workflows in Support of Large-Scale Science
Authors: Marco Edoardo Santimaria (University of Turin); Rosa Filgueira (Edinburgh Parallel Computing Centre (EPCC)); and Doriana Medić, Iacopo Colonnelli, and Marco Aldinucci (University of Turin)
Abstract: This work introduces a novel double-sided streaming methodology that combines control-plane and data-plane streaming. Our goal is to implement the long-advocated separation of concerns in workflow orchestration without introducing artificial boundaries in workflow execution. Our approach is exemplified by the integration of control-plane streaming provided by dispel4py and the transparent data-plane streaming provided by CAPIO. Our integration eliminates file synchronization barriers without requiring modifications to existing workflow logic. To support this, we extend CAPIO with a new commit rule that allows streaming over dynamically generated file sets, enabling hybrid workflows that blend in-memory dataflows with file-based communication. We validate our approach using a real-world seismic cross-correlation workflow, achieving performance improvements between 23% and 40%. Unlike previous solutions, our method supports streaming across the entire workflow, including phase boundaries where file I/O would typically enforce strict execution ordering. Therefore, our approach can be straightforwardly extended to other multi-stage streaming applications.