Poster Type: ACM Student Research Competition, Undergraduate
Author: Ritesh Bhirud (University of Massachusetts Amherst, Massachusetts Institute of Technology (MIT))
Supervisor: Rabab Alomairy (Massachusetts Institute of Technology (MIT))
Abstract: Mixture-of-experts (MoE) architectures enable trillion-parameter models but face three obstacles: prohibitive memory scaling, compression methods with limited interpretability, and vendor-specific implementations that hinder deployment on heterogeneous HPC systems.
We present the first Julia-based MoE framework, which introduces CUR decomposition for interpretable expert compression (a novel application of CUR matrix factorization to MoE architectures) and has a hardware-agnostic design. SVD-based methods compress effectively; CUR-MoE offers comparable performance with better interpretability, because it preserves the column/row structure of the original weight matrices. It remains viable at high compression ratios (35.29 perplexity at 70% compression). A comprehensive evaluation of gating mechanisms shows that ExpertChoice achieves the best load balancing among those evaluated. Julia's LLVM-based compilation enables consistent 5-6× GPU acceleration across NVIDIA, AMD, Intel, and Apple hardware.
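To illustrate why the gating result above matters, here is a minimal sketch of expert-choice routing, assuming "ExpertChoice" refers to the usual scheme in which each expert selects its top-scoring tokens rather than each token selecting an expert. The function names, the `capacity` parameter, and the toy scores are hypothetical and are not taken from the authors' Julia framework; Python is used only for illustration.

```python
def expert_choice_route(scores, capacity):
    """scores[t][e] is the affinity of token t for expert e (illustrative values).

    Each expert keeps its `capacity` highest-scoring tokens, so every expert
    receives exactly the same number of tokens by construction.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = {}
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment[e] = sorted(ranked[:capacity])
    return assignment


def token_choice_loads(scores):
    """Top-1 token-choice routing for contrast: each token picks its best expert."""
    loads = [0] * len(scores[0])
    for row in scores:
        loads[row.index(max(row))] += 1
    return loads


# Toy scores in which every token prefers expert 0.
scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.6, 0.4],
]

routing = expert_choice_route(scores, capacity=2)
loads = token_choice_loads(scores)
```

In this toy case token-choice routing sends all four tokens to expert 0, while expert-choice routing gives each expert two tokens. In this scheme a token can also be chosen by several experts or by none, which the sketch does not address.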
Our core implementation is complete and has been validated on WikiText-2 across platforms. We are broadening platform support for Apple Metal and Intel Arc and extending the Transformers.jl and Flux.jl integrations. The poster will include visual comparisons, cross-vendor benchmarks, detailed oral explanations, and QR codes linking to live, interactive GitHub examples that demonstrate CUR structure preservation.
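As a rough illustration of the CUR structure preservation mentioned above, the sketch below factors a small matrix as A = C·U·R, where C holds actual columns of A, R holds actual rows of A, and U is the inverse of their intersection block. This is one textbook variant (the skeleton decomposition), not necessarily the selection or scaling scheme used in CUR-MoE. The row and column indices are chosen by hand and all helper names are hypothetical. For a matrix of exact rank r with an invertible r×r intersection, the reconstruction is exact.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("intersection block is singular; pick other rows/columns")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cur(A, rows, cols):
    """Return (C, U, R): selected columns, inverse intersection, selected rows."""
    C = [[A[i][j] for j in cols] for i in range(len(A))]
    R = [list(A[i]) for i in rows]
    W = [[A[i][j] for j in cols] for i in rows]
    return C, inverse(W), R


# Toy rank-2 "weight matrix": row 1 = 2 * row 0, row 3 = row 0 + row 2.
A = [[Fraction(v) for v in row] for row in [
    [1, 2, 3],
    [2, 4, 6],
    [0, 1, 1],
    [1, 3, 4],
]]

C, U, R = cur(A, rows=[0, 2], cols=[0, 1])
reconstructed = matmul(matmul(C, U), R)
```

Because C and R are literal slices of A, each retained column or row can be traced back to a specific position in the original matrix, which is the interpretability argument made for CUR over SVD. On a matrix this small the factors are not smaller than A; storage savings appear only when the matrix dimensions are much larger than the number of retained rows and columns.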
Best Poster Finalist (BP): no
Poster: PDF
Poster Summary: PDF