Blazing Fast GenAI Inference With Torch.compile
Richard Zou, Meta

This talk dives into Generative AI (GenAI) inference, highlighting the key features of torch.compile that make it well-suited to this rapidly evolving field. We'll explore how torch.compile enables efficient and scalable inference for large language models through features such as precompilation, multigraph dynamic shapes, CUDA Graphs, and FlexAttention, and discuss our progress on integrations with leading open-source (OSS) GenAI frameworks (e.g., vLLM, SGLang, HuggingFace).
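As a minimal sketch of the dynamic-shapes feature mentioned above: `torch.compile(dynamic=True)` lets one compiled artifact serve the varying sequence lengths that arise during autoregressive decoding, instead of recompiling per shape. The toy `decode_step` function below is an illustration of mine, not code from the talk; it uses `backend="eager"` so the sketch runs without a C++ toolchain, whereas a real deployment would use the default Inductor backend (optionally with `mode="reduce-overhead"` to enable CUDA Graphs on GPU).

```python
import torch

# A toy "decode step": one matmul + activation, standing in for a
# transformer layer during autoregressive decoding.
def decode_step(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return torch.relu(x @ w)

# dynamic=True asks the compiler to trace with symbolic shapes, so
# calls with different sequence lengths reuse one compiled graph.
# backend="eager" keeps this sketch dependency-free; production code
# would typically use the default Inductor backend.
compiled_step = torch.compile(decode_step, dynamic=True, backend="eager")

w = torch.randn(8, 8)
for seq_len in (1, 4, 16):  # varying lengths, as in token-by-token decoding
    x = torch.randn(seq_len, 8)
    out = compiled_step(x, w)
    # The compiled function must match eager-mode results.
    assert torch.allclose(out, decode_step(x, w))
```

The same idea is what lets a serving framework handle arbitrary batch sizes and prompt lengths without paying a recompilation cost on each new shape.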