SC25 Proceedings

Research and ACM SRC Posters Archive

Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference

Poster Type: Research Posters

Authors: Farhana Amin (Virginia Tech), Kanchon Gharami (Embry-Riddle Aeronautical University), Dimitrios Nikolopoulos (Virginia Tech)

Abstract: Diffusion models generate high-quality images but are slow because their denoising steps run sequentially. We present a hybrid parallel diffusion framework that speeds up generation on GPUs of mixed capacity while keeping images coherent. First, we split each image into patches sized according to each GPU’s memory (memory-aware partitioning), so that stronger devices handle more work and weaker ones are not overloaded. Second, we build a fast, low-resolution preview of the full image and use it to guide every patch, which prevents seams and preserves global structure. Third, we parallelize across time with a parareal strategy: a coarse pass provides initial guesses, fine solvers refine time segments in parallel, and corrections reconcile the results. While the GPUs compute, they exchange boundary pixels asynchronously to hide communication latency. Finally, cosine-weighted blending stitches the patches into a seamless output. Early tests show lower idle time, better scaling, and consistent image quality.
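The parareal step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it applies the standard parareal update to a toy scalar problem, dy/dt = -y, where a single Euler step stands in for the coarse pass and many small Euler steps stand in for a fine solver. The function names `coarse`, `fine`, and `parareal`, and the use of a thread pool in place of separate GPUs, are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def coarse(y, t0, t1):
    """One cheap explicit-Euler step for dy/dt = -y (toy coarse propagator)."""
    return y + (t1 - t0) * (-y)


def fine(y, t0, t1, substeps=100):
    """Many small Euler steps over the same segment (toy fine propagator)."""
    h = (t1 - t0) / substeps
    for _ in range(substeps):
        y = y + h * (-y)
    return y


def parareal(y0, times, iterations):
    """Return the solution at every point in `times` after `iterations` sweeps."""
    n = len(times) - 1

    # Coarse pass: sequential but cheap; gives an initial guess at each boundary.
    u = [y0]
    for i in range(n):
        u.append(coarse(u[i], times[i], times[i + 1]))

    # Threads only illustrate that the fine solves are independent;
    # a real system would dispatch each segment to its own device.
    with ThreadPoolExecutor() as pool:
        for _ in range(iterations):
            # Fine solves start from the previous iterate, one per segment.
            f_old = list(pool.map(fine, u[:-1], times[:-1], times[1:]))
            g_old = [coarse(u[i], times[i], times[i + 1]) for i in range(n)]

            # Sequential correction sweep:
            # new[i+1] = G(new[i]) + F(old[i]) - G(old[i])
            new = [y0]
            for i in range(n):
                g_new = coarse(new[i], times[i], times[i + 1])
                new.append(g_new + f_old[i] - g_old[i])
            u = new
    return u
```

With zero iterations the result is just the coarse guess. Each further iteration makes one more segment boundary agree with the sequential fine solution, so the approach pays off only when far fewer iterations than segments are needed.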

Best Poster Finalist (BP): no
Poster: PDF
Poster Summary: PDF
