The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Bridging Speed and Optimality in Job Scheduling: A Hybrid Ant Colony Optimization Approach for Distributed Systems


Workshop: WORKS 2025: 20th Workshop on Workflows in Support of Large-Scale Science

Authors: Hongwei Jin (Argonne National Laboratory (ANL)); Pawel Zuk (University of Southern California (USC)); Krishnan Raghavan (Argonne National Laboratory (ANL)); Prachi Jadhav (University of Tennessee, Knoxville); Aiden Hamade (University of Kentucky); Ewa Deelman (University of Southern California (USC)); and Prasanna Balaprakash (Oak Ridge National Laboratory (ORNL))

Abstract: Efficient job scheduling in distributed systems faces exponential complexity growth as systems scale. While queue-based methods (e.g., FIFO) generate schedules rapidly but suboptimally, optimization tools achieve higher quality at significant computational cost. We propose a hybrid ant colony optimization (HACO) algorithm bridging this gap. HACO uses queue-based warm-start initialization for pheromone levels, constructs disjunctive graphs modeling precedence and resource constraints, and applies parallel local search on selected subgraphs to escape local optima. Our approach combines the speed of heuristics with optimization quality through strategic pheromone updates and OR-Tools integration. Experimental evaluation on job shop scheduling (JSSP), flexible job shop (FJSP), and synthetic large-scale problems demonstrates 3-5\% deviation from optimality with 5-10x speedup over state-of-the-art solvers. Results show consistent performance across varying problem scales, making HACO compelling for large-scale distributed scheduling where computational efficiency is critical.


Back to WORKS 2025: 20th Workshop on Workflows in Support of Large-Scale Science Archive Listing Back to Full Workshop Archive Listing