Workshop: The 12th Annual International Workshop on Innovating the Network for Data-Intensive Science (INDIS)
Authors: Anna Giannakou, Jonathan Skone, Vinay Sawal, Ronal Kumar, Stephen Simms, Nicholas Wright, and Lavanya Ramakrishnan (Lawrence Berkeley National Laboratory (LBNL))
Abstract: High-performance computing (HPC) datacenters must simultaneously support real-time data streams with sub-millisecond latency and bulk transfers requiring sustained multi-gigabit throughput—demands that compete for the same network resources. End-to-end performance guarantees are therefore essential, typically delivered through Quality of Service (QoS) mechanisms that classify traffic, reserve bandwidth, and enforce priorities across all network hops. While backbone and wide-area network providers already implement QoS, the local Ethernet ingress “last-mile” inside HPC facilities generally remains best-effort, creating a critical blind spot where latency builds and time-sensitive workflows can suffer. We address this gap with a standards-based Differentiated Services Code Point (DSCP) QoS configuration on existing leaf–spine switches: packets are marked at the host, queued per traffic class, and shaped on every hop through to the high-speed network (HSN) gateway NIC. Experiments on both intra-domain and inter-domain traffic show up to 60 percent more stable throughput and 30 percent fewer retransmissions, without hardware upgrades.
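Illustrative sketch (not taken from the paper): the host-side marking step described in the abstract can be approximated by setting the DSCP bits on a socket through the standard IP_TOS option. The traffic-class-to-code-point mapping and the helper names dscp_to_tos and open_marked_udp_socket are assumptions made for illustration. The authors' actual marking policy and switch configuration are not given here.

```python
import socket

# Standard DSCP code points (RFC 3246 for EF, RFC 2597 for AF11).
# Which workflow maps to which class is an assumption for this sketch.
DSCP_EF = 46    # Expedited Forwarding, e.g. latency-sensitive streams
DSCP_AF11 = 10  # Assured Forwarding class 1, e.g. bulk transfers


def dscp_to_tos(dscp: int) -> int:
    """Return the IPv4 TOS byte for a 6-bit DSCP value.

    DSCP occupies the upper six bits of the byte; the lower two
    bits are the ECN field and are left at zero here.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry `dscp`.

    Hypothetical helper: switches along the path must be configured
    to trust and act on the marking for it to have any effect.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Marking alone does not reserve bandwidth. It only labels packets so that per-class queues and shapers on each hop, as the abstract describes, can treat them differently.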