FlexCP: FlexAttention for Long Sequence Training Using Context Parallel
Lightning Talk: FlexCP: FlexAttention for Long Sequence Training Using Context Parallel - Xilun Wu & Chien-Chin Huang, Meta

FlexAttention is a versatile solution that enables arbitrary attention customization without sacrificing performance: users define a customized attention function, and FlexAttention generates a corresponding FlashAttention kernel with performance competitive with handwritten kernels. In this talk, we will share our experience integrating FlexAttention with context parallelism for long sequence training, covering API design, user experience, and performance tuning. Our goal is to empower the audience to easily experiment with custom attention settings on long sequences without compromising on performance.
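
For readers unfamiliar with the API the talk builds on, below is a minimal sketch of FlexAttention usage via torch.nn.attention.flex_attention (available in recent PyTorch releases). The causal mask and relative-position bias are illustrative choices of ours, not necessarily the talk's examples, and the shapes are arbitrary:

    import torch
    from torch.nn.attention.flex_attention import flex_attention, create_block_mask

    # mask_mod: returns True where attention is allowed (causal masking here).
    def causal(b, h, q_idx, kv_idx):
        return q_idx >= kv_idx

    # score_mod: rewrites each attention score; here, a simple relative-position bias.
    def rel_bias(score, b, h, q_idx, kv_idx):
        return score + (kv_idx - q_idx)

    B, H, S, D = 2, 8, 4096, 64
    q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16)
               for _ in range(3))

    # B=None/H=None broadcast the mask across batch and heads.
    block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

    # torch.compile fuses the mods into a single generated attention kernel.
    compiled_flex = torch.compile(flex_attention)
    out = compiled_flex(q, k, v, score_mod=rel_bias, block_mask=block_mask)

For the context-parallel side, PyTorch exposes an experimental context_parallel context manager under torch.distributed.tensor.experimental that shards attention inputs along the sequence dimension across a DeviceMesh; how FlexCP combines that mechanism with FlexAttention's block masks is the subject of the talk itself.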