Workshop: AI4S: 6th Workshop on Artificial Intelligence and Machine Learning for Scientific Applications
Authors: Austin Yunker, Weijian Zheng, and Rajkumar Kettimuthu (Argonne National Laboratory (ANL))
Abstract: In this paper, we propose inferCT, an efficient framework that enables 3D deep learning for computed tomography (CT) during inference. Our baseline approach addresses GPU memory constraints by partitioning CT volumes into cubic sub-volumes that fit into GPU memory and distributing them across multiple GPUs. Building on this, we introduce further vendor-agnostic optimizations, including a lock-free shared memory data structure to reduce synchronization overhead, pipeline execution to hide data prefetching and post-processing latency, and a parallel data loader to improve I/O efficiency. Results on both AMD and NVIDIA GPUs show that our optimized framework achieves speedups of 1.97× and 2.32× over the baseline for the 1024³ and 4096³ datasets, respectively. Strong-scaling experiments on the 4096³ dataset show efficiencies of 89.25% when scaling from 1 to 4 GPUs within a single NUMA node and 75.75% when scaling from 1 to 8 GPUs across two NUMA nodes.
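The baseline partitioning described in the abstract can be illustrated with a minimal sketch. This is not the inferCT implementation: the function names `partition_cubic` and `assign_round_robin`, the block size of 256, and the round-robin assignment policy are all assumptions made for illustration, since the abstract does not specify how sub-volumes are sized or scheduled.

```python
from itertools import product


def partition_cubic(volume_size, block_size):
    """Yield (z, y, x) slice triples that tile a cubic volume.

    Hypothetical helper: assumes a volume of side `volume_size` and
    sub-volumes of side `block_size` (edge blocks are clipped).
    """
    starts = range(0, volume_size, block_size)
    for origin in product(starts, repeat=3):
        yield tuple(slice(s, min(s + block_size, volume_size)) for s in origin)


def assign_round_robin(blocks, num_gpus):
    """Distribute blocks across GPUs in round-robin order (assumed policy)."""
    assignment = [[] for _ in range(num_gpus)]
    for i, block in enumerate(blocks):
        assignment[i % num_gpus].append(block)
    return assignment


# A 1024^3 volume split into 256^3 sub-volumes, spread over 4 GPUs.
blocks = list(partition_cubic(1024, 256))
per_gpu = assign_round_robin(blocks, 4)
print(len(blocks), [len(b) for b in per_gpu])  # 64 [16, 16, 16, 16]
```

Each slice triple could index a NumPy or framework tensor directly (for example `volume[block]`), so only one sub-volume needs to reside in GPU memory at a time.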