Maximizing Luck in Reinforcement Learning
Daniel Han, Unsloth

How do we maximize “luck” in reinforcement learning? Will RL finally take us to AGI and super-intelligence? What are RLVR, PPO, GRPO, Dr. GRPO, GAPO, and DAPO? What are good and bad reward functions, and how do we design them? How do we make RL training fast and memory-efficient? How can quantization and infra optimizations help speed up RL? Will open source + RL win over closed-source models? Come join me in this session to hear the answers to these questions, and more.