SC25 Proceedings

Workshops Archive

Evaluation of Test-Time Compute Constraints on Safety and Skill Large Reasoning Models

Workshop: Frontiers in Generative AI for HPC Science and Engineering: Foundations, Challenges, and Opportunities

Authors: Adarsha Balaji, Le Chen, Rajeev Thakur, Franck Cappello, and Sandeep Madireddy (Argonne National Laboratory (ANL))

Abstract: Test-time compute scaling has been shown to improve the performance of reasoning language models by generating longer chain-of-thought (CoT) sequences. However, this gain comes with a significant increase in computational cost. In this work, we investigate two compute-constraint strategies, (1) reasoning length constraints and (2) model quantization, and study their impact on the safety performance of reasoning models. Specifically, we explore two approaches to applying compute constraints to reasoning models: (1) fine-tuning them with a reinforcement learning method based on length-controlled policy optimization (LCPO) to satisfy a user-defined CoT reasoning length, and (2) applying quantization to maximize the generation of CoT sequences within a user-defined compute constraint. Furthermore, we study the trade-off between the computational efficiency and the safety of the model.
