The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Stackless vs. Stackful Coroutines: A Comparative Study for RDMA-based Asynchronous Many-Task (AMT) Runtimes


Workshop: The 8th Annual Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM 2025)

Authors: Mia Reitz (University of Kassel) and Jonas Posner (Fulda University of Applied Sciences)

Abstract: Asynchronous Many-Task (AMT) runtimes manage parallelism by suspending and migrating tasks between processes, with their state captured in continuations. The efficiency of suspending, migrating, and resuming these continuations is critical to application performance.

This work directly compares stackful and stackless coroutines as continuation implementations in a cluster environment using RDMA-based coordinated work stealing. We implement and evaluate two functionally equivalent AMT runtimes for a fine-grained, recursive workload: one using traditional stackful coroutines, and another using C++20 stackless coroutines.

Our results show that both approaches yield nearly identical overall performance for small-state tasks. Stackful coroutines are created 2.4x faster, while stackless coroutines switch context 3.5x faster and have smaller frames. However, the smaller frame size of stackless coroutines does not significantly reduce communication time, which is dominated by network latency. We conclude that both coroutine types are viable, with stackless coroutines offering advantages as task state increases.

