AI · Oct 22, 2025

Accelerating GenAI Inference: From AWS Deep Learning Containers to Scaling Amazon Rufus on Trainium

Sponsored Session: Accelerating GenAI Inference: From AWS Deep Learning Containers to Scaling Amazon Rufus on Trainium - Phi Nguyen & Adam Zhao, AWS

The rapid evolution of LLMs and reasoning models demands an equally sophisticated infrastructure stack. In this technical deep dive, we’ll explore how AWS’s infrastructure building blocks, spanning compute, networking, and storage, empower open-source innovation and enable production-scale AI deployments.

We’ll begin by examining the modern AI infrastructure stack on AWS and how its modular components support the latest innovations in large language models and reasoning systems. You’ll learn how to leverage AWS’s purpose-built infrastructure to accelerate open-source AI development, from training to inference, while maintaining flexibility and cost-efficiency.

Then, we’ll ground these concepts in a real-world case study: Amazon’s Rufus shopping assistant, which operates one of the largest production LLM inference deployments in the world, serving hundreds of millions of customers during peak events like Prime Day. This session reveals the architectural decisions and engineering practices behind Rufus’s highly scalable inference cluster, including: