SC25 Proceedings

Bridging FPGA and GPU over PCIe: A Low-Latency Communication Path using AVX-512

Workshop: 12th Workshop on Accelerator Programming and Directives (WACCPD 2025)

Authors: Michele Martinelli (National Institute for Nuclear Physics (INFN)); Carlotta Chiarini (National Institute for Nuclear Physics, Sapienza University of Rome); Andrea Biagioni (National Institute for Nuclear Physics); Paolo Cretaro (National Institute for Nuclear Physics (Currently Unaffiliated)); and Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pierpaolo Perticaroli, Francesco Simula, Luca Pontisso, Cristian Rossi, and Piero Vicini (National Institute for Nuclear Physics)

Abstract: We introduce a communication mechanism that bridges accelerators such as GPUs and PCIe-based FPGA devices, using Programmed I/O (PIO) as an alternative to Direct Memory Access (DMA) data transfers. When the FPGA operates as a Network Interface Card (NIC), the mechanism achieves a one-way latency below 2 microseconds for small messages. Our prototype employs APEnetX, a custom FPGA-based NIC, together with a CPU engine that atomically writes descriptors and payloads directly into the device's PCIe memory-mapped region using AVX-512 instructions. Additionally, a GPU peer-to-peer remapping technique enables the injection of data packets from GPU memory into the NIC's memory-mapped aperture, with no DMA data movement orchestrated by the CPU. Microbenchmarks show lower latency than traditional RDMA for small packets, with a simpler software stack. The method is not limited to APEnetX: it applies to any FPGA-based NIC or accelerator that exposes a PCIe-mapped control aperture, provided the device can read and transmit data from memory.
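The CPU-side PIO path described in the abstract can be sketched in C: a descriptor and a small inline payload are packed into whole 64-byte lines, and each line is pushed into the NIC aperture with a single AVX-512 store. This is a minimal sketch under stated assumptions, not the APEnetX driver. The descriptor layout (`pio_desc_t`), the 256-byte window (`MAX_LINES`) and the function names are hypothetical, and a 64-byte-aligned heap buffer stands in for the device's PCIe BAR. A real driver would instead map the aperture as write-combining, so that each aligned 64-byte store typically leaves the core as one PCIe write.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CL 64        /* bytes per store: one cache line / one WC buffer */
#define MAX_LINES 4  /* hypothetical cap: 256-byte PIO window per send */

/* Hypothetical descriptor layout (16 bytes); NOT the APEnetX format. */
typedef struct {
    uint32_t dest_id; /* hypothetical destination node id */
    uint32_t length;  /* payload length in bytes */
    uint64_t tag;     /* hypothetical message tag */
} pio_desc_t;

/* AVX-512 path: one aligned 64-byte non-temporal store per line, so each
   line fills a write-combining buffer with a single instruction. */
__attribute__((target("avx512f")))
static void push_lines_avx512(void *dst, const void *src, size_t lines)
{
    for (size_t i = 0; i < lines; i++) {
        __m512i v = _mm512_loadu_si512((const char *)src + i * CL);
        _mm512_stream_si512((void *)((char *)dst + i * CL), v);
    }
    _mm_sfence(); /* order the streamed lines before any later store */
}

/* Dispatch: AVX-512 when the CPU supports it, plain memcpy otherwise.
   The memcpy fallback gives no single-store guarantee; it only keeps
   the sketch runnable on CPUs without AVX-512. */
static void push_lines(void *dst, const void *src, size_t lines)
{
    assert((uintptr_t)dst % CL == 0); /* streaming stores need 64 B alignment */
    if (__builtin_cpu_supports("avx512f"))
        push_lines_avx512(dst, src, lines);
    else
        memcpy(dst, src, lines * CL);
}

/* Pack descriptor + payload into whole 64-byte lines and push them into
   the aperture. Returns the number of 64-byte stores issued, or 0 if the
   packet does not fit in the (hypothetical) PIO window. */
static size_t pio_send(void *aperture, const pio_desc_t *d, const void *payload)
{
    _Alignas(64) unsigned char staging[MAX_LINES * CL];
    size_t total = sizeof *d + d->length;
    size_t lines = (total + CL - 1) / CL; /* round up to whole lines */

    if (lines > MAX_LINES)
        return 0; /* too large for PIO; a real stack might use DMA instead */
    memset(staging, 0, lines * CL);                  /* zero-pad the last line */
    memcpy(staging, d, sizeof *d);                   /* descriptor first */
    memcpy(staging + sizeof *d, payload, d->length); /* then inline payload */
    push_lines(aperture, staging, lines);
    return lines;
}
```

With a 16-byte descriptor, a 40-byte payload fits in one 64-byte store and a 100-byte payload needs two. Rounding every packet up to full, aligned lines is the design choice that lets each store fill a write-combining buffer completely instead of leaving partial lines to be flushed piecemeal.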
