SC25 Proceedings

Birds of a Feather Archive

Building AI Data Commons and AI Data Meshes: Collaborative Approaches for Scalable, Responsible, Distributed, and Federated AI

Authors: Robert Grossman (University of Chicago, Open Commons Consortium), Christine Kirkpatrick (San Diego Supercomputer Center (SDSC)), Michael Lukowski (University of Chicago)

Abstract: This BoF is a collaborative discussion on architecting and deploying AI data commons and AI data meshes to support scalable, responsible, and federated AI. Focusing on minimal, interoperable architectures, it aims to support approaches for building small to midscale AI models, highlight challenges and opportunities in federating public and private data commons, and accelerate community adoption of best practices. Key topics include core services, embedding architectures, secure federation, and agentic orchestration. The session seeks to foster a roadmap for the community, exchange best practices, and explore the potential for establishing a working group to advance AI data infrastructure standards.

Long Description: This Birds of a Feather session invites the SC-25 community to engage in a collaborative discussion on architecting and deploying AI data commons and AI data meshes. A data commons is a data platform for managing, analyzing, and sharing data with a research community. A data mesh is created when two or more data commons or cloud-based computing resources can interoperate. Building on recent advances and deployments of data commons and data meshes, our session aims to:

• Illuminate core concepts in AI data commons and AI data meshes, highlighting minimal, interoperable architectures that enable small to midscale AI training and inference over data commons and the federation of multiple AI commons into AI meshes.
• Identify real-world challenges and opportunities in building and federating AI data commons, both public and private, for research and enterprise applications.
• Foster a broader community dialogue to accelerate the adoption of AI commons and meshes.

Context & Relevance to SC-25:

While large-scale AI models (“frontier models”) have dominated headlines, much of the innovation—and need—lies in supporting small to midscale AI: models trained on 4–12 GPUs, which are more accessible to scientific, commercial, and government organizations. AI commons and meshes empower organizations and user communities to:

• Build and share high-quality, domain-specific weights & models without sharing data.
• Federate secure, compliant data & models across organizations when the data often cannot leave the organization that holds it.
• Outperform “frontier” AI through richer, better-curated, interoperable resources.

This is particularly timely for the SC-25 audience, as organizations increasingly need to support both large, centralized workloads and distributed, federated AI research, often over sensitive or proprietary data.
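As a rough illustration of sharing weights without sharing data, the sketch below averages model weights contributed by several commons, weighted by each site's local sample count. This is a generic federated-averaging step, not a method described by the session organizers; the function name `federated_average`, the dictionary layout of the weights, and the two example sites are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average per-site weights, weighting each site by its local sample count.

    Only weights and sample counts cross organizational boundaries;
    the training data itself stays at each site.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: two commons train the same small model locally.
site_a = {"layer": [1.0, 2.0]}   # trained on 100 local samples
site_b = {"layer": [3.0, 6.0]}   # trained on 300 local samples
global_weights = federated_average([site_a, site_b], [100, 300])
```

A production mesh would add authentication, secure aggregation, and compliance checks around this step; the sketch covers only the arithmetic.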

Key Topics & Discussion Areas:

• Minimal, Interoperable Core Services for AI Commons: What is the smallest set of APIs/services (e.g., FAIR data, embedding APIs, model training/inference, compute orchestration) needed for sustainable and evolvable AI commons?
• Embedding Architectures: How can vector stores and pre-computed or multi-modal embeddings accelerate research and reduce compute demands?
• Federation and Mesh Architectures: Lessons from scientific data commons, frameworks for securely federating AI commons, and community-driven pillars for building data meshes.
• Supporting Public vs. Private Commons: Technical and compliance challenges in bridging public research data and in-house/private organizational data.
• Agentic Interfaces and AI Orchestration: The role of agent-driven workflows and orchestration protocols (e.g., Model Context Protocol) in AI commons and meshes.
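To make the embedding topic concrete, here is a minimal sketch of how pre-computed embeddings might reduce compute: vectors are normalized once when stored, so each later query costs only a dot product per record rather than a fresh model call. The `EmbeddingStore` class, its methods, and the record identifiers are hypothetical and do not correspond to any particular vector store or commons API.

```python
import math
from typing import Dict, List, Tuple


def _normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class EmbeddingStore:
    """Hypothetical in-memory store of pre-computed embeddings."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, record_id: str, vector: List[float]) -> None:
        # Normalize once at ingest time; queries reuse the stored result.
        self._vectors[record_id] = _normalize(vector)

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k records most similar to the query by cosine similarity."""
        q = _normalize(query)
        scored = [
            (record_id, sum(a * b for a, b in zip(q, v)))
            for record_id, v in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Hypothetical two-dimensional embeddings for three records.
store = EmbeddingStore()
store.add("record-1", [1.0, 0.0])
store.add("record-2", [0.0, 1.0])
store.add("record-3", [1.0, 1.0])
top = store.search([0.9, 0.1], k=2)
```

A real deployment would likely use an approximate nearest-neighbor index instead of this linear scan, but the division of labor is the same: embed once, query many times.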

Expected Outcomes:

• Community Roadmap: Capture key priorities, pain points, and opportunities for collaborative development, standard-setting, and shared infrastructure.
• Best Practices Exchange: Identify actionable lessons learned around data curation, embedding generation, interoperability, and compliance from both practitioners and platform developers.
• Explore Working Group Formation: Gauge interest in forming an ongoing community of practice to drive progress and influence standards for AI-focused data commons and data meshes.

Who Should Attend:

• Data commons and AI architects and operators interested in AI research infrastructure.
• AI/ML researchers and practitioners with responsibility for building AI models over organizational data.
• Academic, enterprise & public sector leaders developing AI models over organizational data that cannot leave the organization.
