Olmo-Thinking: Training a Fully Open Reasoning Model
Keynote speaker: Nathan Lambert, Senior Research Scientist, Ai2

This talk covers the crucial details of training Olmo-Thinking, a 7B-parameter, fully open reasoning model that rivals Qwen 3, highlighting fresh results, trade-offs, and methods across midtraining, distillation with high-quality thinking SFT data, and reinforcement learning with verifiable rewards. It focuses on aspects of the training process, such as model architecture decisions, data sourcing, and training code design, that are often not shared by leading model developers and that can enable a resurgence of research on reinforcement learning, tool use, and inference-time scaling.
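As a minimal illustration of the "verifiable rewards" idea mentioned above (not Ai2's actual verifier, whose details are the subject of the talk), a reward function can simply check a model completion's final answer against a known reference. The function name and `\boxed{}` answer format here are illustrative assumptions; production verifiers typically also normalize math expressions or execute unit tests.

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the answer inside \\boxed{...} matches the
    reference exactly, else 0.0. Illustrative sketch only."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# Scoring two sampled rollouts against a reference answer:
print(verifiable_reward(r"... so the total is \boxed{42}", "42"))  # 1.0
print(verifiable_reward(r"... the total is \boxed{41}", "42"))     # 0.0
```

Because the reward is computed programmatically rather than by a learned reward model, it cannot be gamed by stylistic tricks, which is the core appeal of RL with verifiable rewards for reasoning tasks.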