The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research and ACM SRC Posters Archive

Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls


Poster Type: Research Posters

Author: Mona Moghadampanah (Virginia Tech), Adib Rezaei Shahmirzadi (Virginia Tech), Dimitrios S. Nikolopoulos (Virginia Tech)


Abstract: Multimodal large language models (MLLMs) extend text-only LLMs with image and video encoders, enabling new capabilities but introducing high and poorly understood energy costs. This work characterizes the energy footprint of MLLM inference at the stage level, decomposing serving into vision encoding, prefill, and decoding for image–text models. Using NVML-based measurements on an NVIDIA A100 with realistic workloads, we demonstrate how encoder design and input complexity (resolution, image count) increase the number of visual tokens and shift energy toward prefill. Our novel contribution links this token growth to serving inefficiency and demonstrates two practical controls, complexity-aware batching and stage-conditioned dynamic voltage and frequency scaling (DVFS), that reduce energy while meeting latency SLOs. Current results show disproportionate energy growth from multimodal inputs. The study presents stage-wise energy breakdowns, token-driven scaling curves, and prototype controls that motivate future input-aware scheduling policies for energy-efficient multimodal inference.
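The poster does not publish its measurement code, so the following is only a minimal sketch of how stage-level energy attribution can work, assuming GPU power has already been sampled as (timestamp in seconds, watts) pairs and that the serving loop logs start and end times for each stage. The names `Stage`, `stage_energy`, and `_power_at` are hypothetical. Power is linearly interpolated at stage boundaries and integrated with the trapezoidal rule, so energy in joules is attributed to vision encoding, prefill, and decoding windows separately.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical stage record logged by the serving loop (times in seconds).
@dataclass
class Stage:
    name: str
    start: float
    end: float


def _power_at(times: List[float], watts: List[float], t: float) -> float:
    """Linearly interpolate power (W) at time t from time-sorted samples."""
    if t <= times[0]:
        return watts[0]
    if t >= times[-1]:
        return watts[-1]
    # bisect_right guarantees times[i - 1] <= t < times[i].
    i = bisect.bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    p0, p1 = watts[i - 1], watts[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def stage_energy(samples: List[Tuple[float, float]],
                 stages: List[Stage]) -> Dict[str, float]:
    """Integrate sampled power over each stage window; returns joules per stage name."""
    times = [t for t, _ in samples]
    watts = [w for _, w in samples]
    energy: Dict[str, float] = {}
    for st in stages:
        # Breakpoints: the stage edges plus every sample strictly inside the window.
        inner = [t for t in times if st.start < t < st.end]
        pts = [st.start] + inner + [st.end]
        total = 0.0
        for a, b in zip(pts, pts[1:]):
            # Trapezoidal rule on each sub-interval.
            total += 0.5 * (_power_at(times, watts, a) + _power_at(times, watts, b)) * (b - a)
        # Accumulate, so repeated stages (e.g. per-request prefill) sum together.
        energy[st.name] = energy.get(st.name, 0.0) + total
    return energy


if __name__ == "__main__":
    samples = [(0.0, 100.0), (1.0, 200.0), (2.0, 200.0), (3.0, 100.0)]
    stages = [Stage("vision_encode", 0.0, 1.0),
              Stage("prefill", 1.0, 2.0),
              Stage("decode", 2.0, 3.0)]
    print(stage_energy(samples, stages))
```

With the toy samples above, the three stages receive 150 J, 200 J, and 150 J. On real hardware the samples could come from polling NVML's `nvmlDeviceGetPowerUsage` (reported in milliwatts) in a background thread. On Volta-class and newer GPUs such as the A100, NVML also exposes a cumulative counter, `nvmlDeviceGetTotalEnergyConsumption` (millijoules), whose difference across stage boundaries yields the same quantity without interpolation. The poster does not state which of these it uses, nor whether idle power is subtracted.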

Best Poster Finalist (BP): no
