Our Journey With TorchTitan
Linsong Chu & Garrett Goon, IBM Research

In this session, we will share our journey with TorchTitan over the past year and a half, starting in early 2024. During that time, we went from using TorchTitan as a secondary codebase solely for throughput benchmarking to relying on it for several internal production training runs, and from being end users to becoming active contributors within the TorchTitan community.

Our story will cover why we adopted TorchTitan for production training, what we've accomplished with it, and what lies ahead. Highlights include training an in-house 70B model earlier this year that matches the performance of the LLaMA 3 family while requiring significantly fewer GPU hours, thanks to recent features such as FP8 training. We'll also discuss our current work with TorchTitan, including ongoing MoE training enabled by integrating our fast MoE kernel into TorchTitan, and our exploration of additional MoE kernels with FP8 row-wise and MXFP8 quantization, which are currently being developed within the TorchTitan community. Finally, we'll share key lessons learned along the way and explain why we think this is a great community for everyone to explore and contribute to.