🎥 Recorded live at the MLOps World | GenAI Summit 2025 in Austin, TX (October 8, 2025).

Session Title: How to Train Your Agent: Building Reliable Agents with Reinforcement Learning
Speaker: Kyle Corbitt, Co-Founder & CEO, OpenPipe
Talk Track: Agents in Production

Abstract: Have you ever launched an impressive agentic demo, only to find that no amount of prompting could make it reliable enough for production? You’re not alone: agent reliability remains one of the toughest challenges in deploying AI systems that actually work.

In this session, Kyle Corbitt shares how Group Relative Policy Optimization (GRPO) can dramatically improve agent reliability by teaching models to learn from both successes and failures over time. Through real-world case studies, including an email assistant whose success rate jumped from 74% to 94% after RL optimization, Kyle shows how reinforcement learning can bridge the gap between prototype and production-ready agents. He’ll also share lessons from deploying these techniques at DoorDash and other companies, covering what works and which pitfalls to avoid, all using open-source tools rather than proprietary platforms.

What you’ll learn:
• How to apply reinforcement learning (GRPO) to improve agent reliability
• Real-world lessons from deploying RL-powered agents to production
• Open-source tools and frameworks for fine-tuning models with RL
• Common pitfalls and how to avoid them when scaling agent training

How to Train Your Agent: Building Reliable Agents with RL