Workshop: PDSW'25: The 10th International Parallel Data Systems Workshop
Authors: Youjia Li (Northwestern University); Robert Latham and Robert Ross (Argonne National Laboratory (ANL)); and Ankit Agrawal, Alok Choudhary, and Wei-keng Liao (Northwestern University)
Abstract: High-level I/O libraries, such as PnetCDF and HDF5, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata of data objects in files along with their raw data. To ensure metadata consistency during parallel data object creation, they require applications to call the metadata APIs collectively with consistent metadata. This requirement can result in an expensive consistency check, whose cost increases with the metadata volume and the number of processes. To address this limitation, we propose a new file header format that uses partitioned metadata blocks to enable independent data object creation and reduce the number of objects that require a consistency check. Our performance evaluation shows that the new design achieves scalable performance, cutting data object creation times by up to 196x when running on 4096 MPI processes to create 5,684,800 data objects in parallel.