PyTorch Symmetric Memory: A New Programming Paradigm for Distributed AI
Ke Wen & Chien-Chin Huang, Meta

Recent advances in models such as DeepSeek have highlighted the need for customized communication. In response, PyTorch introduces Symmetric Memory, a new distributed programming model that creates a global address space spanning the memory of multiple GPUs, making fine-grained, GPU-initiated remote access possible.

In this talk, we will demonstrate how developers can author their own communication kernels at the device level. We will also show how to interleave communication and computation within the same kernel using popular languages such as Triton, achieving the finest-grained fusion, and we will discuss how these capabilities integrate with the torch.compile ecosystem. We will provide concrete examples based on the all-to-all-v collective used in MoE models, fused communication + layer norm, and mask-aware communication driven by FlexAttention.
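To give a flavor of the programming model, here is a minimal sketch based on the experimental torch.distributed._symmetric_memory API (present in recent PyTorch releases; exact names and signatures may change). It allocates a tensor from the symmetric heap, exchanges addresses across ranks via rendezvous, and then reads a peer's buffer directly from this GPU:

```python
# Minimal sketch of the Symmetric Memory programming model.
# Assumes the experimental torch.distributed._symmetric_memory API;
# names may differ between PyTorch versions.
# Launch with: torchrun --nproc-per-node 2 symm_mem_demo.py
import os

import torch
import torch.distributed as dist
import torch.distributed._symmetric_memory as symm_mem

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
device = torch.device(f"cuda:{rank}")
torch.cuda.set_device(device)
dist.init_process_group("nccl")

# Allocate from the symmetric heap instead of the regular caching allocator.
t = symm_mem.empty(4096, dtype=torch.float32, device=device)
t.fill_(rank)

# Rendezvous exchanges buffer addresses so every rank can map its peers'
# buffers into its own address space.
hdl = symm_mem.rendezvous(t, group=dist.group.WORLD)

# get_buffer returns a tensor view of a *peer's* buffer; loads and stores
# on it are fine-grained remote accesses issued by this GPU.
peer = (rank + 1) % world_size
peer_buf = hdl.get_buffer(peer, t.shape, t.dtype)

hdl.barrier()              # make sure the peer has finished writing
local_copy = peer_buf + 1  # GPU-initiated remote read, fused with compute
hdl.barrier()

print(f"rank {rank} read {local_copy[0].item()} from rank {peer}")
dist.destroy_process_group()
```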
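The same handle can feed a Triton kernel, putting communication and computation in a single launch. Below is a hypothetical sketch of the fused communication + layer norm idea: each program remote-reads one row from a peer's symmetric buffer and normalizes it in the same kernel. The kernel name pull_and_norm is invented for illustration, it assumes the view returned by get_buffer can be passed to Triton like any CUDA tensor, and inter-rank synchronization (e.g., hdl.barrier()) is elided:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def pull_and_norm(peer_ptr, out_ptr, n_cols, eps, BLOCK: tl.constexpr):
    # One program per row: the load below is a remote read from the peer's
    # symmetric buffer; the normalization happens in the same kernel.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    x = tl.load(peer_ptr + row * n_cols + cols, mask=mask, other=0.0)
    mean = tl.sum(x, axis=0) / n_cols
    diff = tl.where(mask, x - mean, 0.0)
    var = tl.sum(diff * diff, axis=0) / n_cols
    y = diff / tl.sqrt(var + eps)
    tl.store(out_ptr + row * n_cols + cols, y, mask=mask)


# Usage, given a (rows, n_cols) peer buffer mapped via hdl.get_buffer:
#   out = torch.empty(rows, n_cols, device="cuda")
#   pull_and_norm[(rows,)](peer_buf, out, n_cols, 1e-5,
#                          BLOCK=triton.next_power_of_2(n_cols))
```

Because the remote data never round-trips through a separate collective call, the compiler and scheduler see one kernel, which is what enables the finest-grained overlap of communication and computation described above.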