Poster Type: Research Posters
Author: Wenyi Wang (University of Chicago), Maxime Gonthier (University of Chicago), Haibin Lai (Southern University of Science and Technology), Poornima Nookala (Intel Corporation), Haochen Pan (University of Chicago), Ian Foster (University of Chicago), Ioan Raicu (Illinois Institute of Technology), Kyle Chard (University of Chicago)
Abstract: High synchronization overhead in frameworks like GNU OpenMP impedes fine-grained task parallelism on many-core architectures. We introduce three advances to GNU OpenMP: a lock-less concurrent queue (XQueue), a scalable distributed tree barrier, and two NUMA-aware, lock-less load-balancing strategies.
On the Barcelona OpenMP Tasks Suite (BOTS) benchmarks, XQueue and the tree barrier together improve performance by up to 1522.8× over the original GNU OpenMP. The load-balancing strategies add a further performance improvement of up to 4×.
We further apply these techniques to the TaskFlow runtime, demonstrating performance and scalability gains in selected applications, and we analyze the inherent limitations of the lock-less approach on x86 architectures.
Best Poster Finalist (BP): no