An Open Source Post-Training Stack: Kubernetes + Ray + PyTorch + vLLM
Robert Nishihara, Anyscale

AI workloads require increasing scale for both compute and data, as well as significant heterogeneity across workloads, models, data types, and hardware accelerators. As a consequence, the software stack for running compute-intensive AI workloads is fragmented and rapidly evolving. Companies that productionize AI end up building large AI platform teams to manage these workloads. However, within the fragmented landscape, common patterns are beginning to emerge. This talk describes a popular software stack combining Kubernetes, Ray, PyTorch, and vLLM. It describes the role of each of these frameworks and how they operate together, and illustrates the combination with case studies from Pinterest, Uber, and Roblox as well as from today’s most popular post-training frameworks.
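As a rough illustration of how two of these pieces can compose, the sketch below wraps a vLLM engine in a Ray actor so that generation can be scheduled alongside PyTorch training workers on a Ray cluster (which may itself run on Kubernetes via KubeRay). This is a minimal sketch of a common pattern, not the specific architecture presented in the talk; the model name, GPU count, and prompts are placeholder assumptions.

```python
# Illustrative sketch only: a Ray actor hosting a vLLM engine for rollout
# generation, a pattern used by several open source post-training frameworks.
import ray
from vllm import LLM, SamplingParams


@ray.remote(num_gpus=1)  # assumes one GPU is reserved for the generator
class VLLMGenerator:
    def __init__(self, model_name: str):
        # vLLM handles batching and KV-cache management on the assigned GPU.
        self.llm = LLM(model=model_name)

    def generate(self, prompts, max_tokens: int = 128):
        params = SamplingParams(temperature=0.8, max_tokens=max_tokens)
        outputs = self.llm.generate(prompts, params)
        return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    ray.init()  # connects to an existing Ray cluster if one is available
    generator = VLLMGenerator.remote("facebook/opt-125m")  # placeholder model
    completions = ray.get(generator.generate.remote(["Ray and vLLM can"]))
    print(completions)
```

In a post-training loop, actors like this typically serve rollout generation while separate PyTorch workers perform gradient updates, with Ray coordinating placement and data movement across the cluster.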