Poster Type: ACM Student Research Competition, Graduate
Authors: Gunwoo Kim (University of California, Davis), Alex Sim (ESnet; Lawrence Berkeley National Laboratory (LBNL)), Kesheng Wu (ESnet; Lawrence Berkeley National Laboratory (LBNL))
Supervisor: Alex Sim (ESnet; Lawrence Berkeley National Laboratory (LBNL))
Abstract: In high energy physics (HEP), large-scale experiments produce enormous data volumes that are distributed across global storage systems. To reduce redundant transfers and improve efficiency, disk caching systems such as XCache are deployed, but their effectiveness depends on a good caching policy. Our research asks: can we find patterns in dataset accesses and reliably predict dataset popularity? This work investigates dataset-level “pinning,” where sets of files are retained in cache to improve hit rates. We explore the use of Hawkes processes, a statistical model, to capture bursty, event-driven dataset popularity, an approach that is novel compared to previous efforts. Preliminary results suggest that this framework improves the predictability of future access patterns, thereby guiding more effective caching strategies. The poster will present our methodology, experimental setup, and early evaluation results, highlighting both the promise and current limitations of this approach.
Best Poster Finalist (BP): no
Poster: PDF
Poster Summary: PDF
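Illustrative sketch: The abstract describes using Hawkes processes to model bursty dataset popularity but gives no formulation. The snippet below is a minimal sketch of the general idea, not the authors' method. It assumes a univariate Hawkes process with an exponential kernel, where the intensity at time t is mu + sum(alpha * exp(-beta * (t - t_i))) over past access times t_i. The function names, the parameter values (mu, alpha, beta), and the idea of ranking datasets by current intensity to choose pinning candidates are all hypothetical choices made for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity of a univariate Hawkes process with an exponential kernel.

    mu    : baseline access rate (assumed constant)
    alpha : jump in intensity caused by each past access
    beta  : decay rate of that excitation
    Only events strictly before t contribute.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation


def rank_datasets(access_log, now, mu=0.1, alpha=0.5, beta=1.0):
    """Rank datasets by current intensity, highest first.

    access_log maps a (hypothetical) dataset name to its list of access
    timestamps. Parameters are illustrative defaults, not fitted values.
    """
    scores = {
        name: hawkes_intensity(now, times, mu, alpha, beta)
        for name, times in access_log.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy log: dataset "a" has a recent burst, "b" was accessed long ago.
    log = {"a": [9.5, 9.8, 9.9], "b": [1.0, 2.0]}
    print(rank_datasets(log, now=10.0))
```

In this toy setting, a dataset with a recent burst of accesses scores higher than one accessed long ago, which is the behavior a pinning policy driven by predicted popularity would rely on. In practice the parameters would have to be estimated from access logs rather than fixed by hand.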