Optimizing Model Inference with PyTorch 2.0
Sponsored Session (Lightning Talk): Devansh Ghatak, Simplismart

This session explores how to maximize inference performance in PyTorch 2.0 by combining dynamic compilation with CUDA graph capture. We will cover practical strategies, including quantization, ahead-of-time (AOT) compilation, and custom fused operators, all of which are essential tools for achieving low-latency, production-grade deployments.