The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research and ACM SRC Posters Archive

Leveraging Large Language Models for Property Prediction in Polymorphic Organic Semiconductors


Poster Type: ACM Student Research Competition, Graduate

Authors: Shreya Pagaria (Carnegie Mellon University, Pittsburgh Supercomputing Center), Mei-Yu Wang (Pittsburgh Supercomputing Center), Dana O’Connor (Pittsburgh Supercomputing Center), Julian Uran (Pittsburgh Supercomputing Center), Paola Buitrago (Pittsburgh Supercomputing Center)

Supervisor: Paola Buitrago (Pittsburgh Supercomputing Center)

Abstract: Organic semiconductors (OSCs) are promising for next-generation electronics, but polymorphism complicates accurate property prediction and makes traditional methods costly. We investigate transformer-based large language models (LLMs) for predicting energy gaps in polymorphic OSC crystals. A Pegasus-managed workflow is deployed across heterogeneous hardware (PSC Bridges-2 and the Neocortex Cerebras CS-2) to evaluate three crystal text encodings (Materials String, SLICES, and SLICES-PLUS) against a baseline XGBoost regressor. The results show that LLMs using the Materials String encoding achieve the highest accuracy, particularly on polymorph-rich datasets, outperforming the other representations in both pretraining efficiency and downstream tasks and also exceeding the XGBoost baseline. These findings highlight the potential of LLM-driven crystal encodings to accelerate materials discovery and enable the scalable, data-driven design of organic semiconductors.

Best Poster Finalist (BP): no
Poster: PDF
Poster Summary: PDF

